Data Engineer
- BS in Computer Science or related field
- 4+ years of experience in the data and analytics space
- Certification preferred – AWS Certified Big Data or an equivalent certification on another cloud or big data platform
- 2+ years of experience developing and implementing enterprise-level data solutions using Python, Java, Scala, Spark, Airflow, and Hive
- 2+ years in key aspects of software engineering such as parallel data processing, data flows, REST APIs, JSON, and microservice architectures
- 2+ years of experience working with big data processing frameworks and tools – MapReduce, YARN, Hive, Pig, Oozie, and Sqoop – and good knowledge of common big data file formats (e.g., Parquet, ORC)
- 4+ years of experience with RDBMS concepts, including strong data analysis and SQL skills
- 3+ years of proficiency with Linux command-line tools and Bash scripting
- Solid programming experience in Python – expert level (4/5) required
Knowledge, Skills and Abilities:
- A passion for technology and data analytics with a strong desire to constantly be learning and honing skills
- Ability to work in a team environment
- Flexibility to work in matrix reporting structure
- Strong understanding of Hadoop fundamentals, with experience working with big data processing frameworks and tools – MapReduce, YARN, Hive, Pig, Oozie, and Sqoop – and good knowledge of common big data file formats (e.g., Parquet, ORC)
- Ability to develop large-scale, event-based streaming architectures
- Strong communication and documentation skills
- Ability to mentor other team members and participate in cross-training
- Working knowledge of NoSQL, in-memory databases
- Working knowledge of developing data ingestion and data transformation capabilities using Hive, Python, Spark, Scala, and Airflow
- Background in all aspects of software engineering, with strong skills in parallel data processing, data flows, REST APIs, JSON, XML, and microservice architectures
- Experience working in a scrum/agile environment and associated tools (Jira)
- Experience with large data sets and associated job performance tuning and troubleshooting
- Able to collaborate with cross-functional IT teams and global delivery teams
- Kubernetes and Docker experience a plus
- Prior working experience with a data science workbench
- Cloud data warehouse experience – Snowflake is a plus
- Data Modeling experience a plus
- Knowledge of data engineering aspects within machine learning pipelines (e.g., train/test splitting, scoring process, etc.)