Principal Engineer, Data Engineering
Responsibilities
- Architect, design, develop, and engineer end-to-end data pipelines across multiple data sources and systems of record.
- Ensure data quality, integrity, security, and completeness throughout the data lifecycle.
- Design and develop data models, data structures, and ETL jobs for data acquisition and manipulation.
- Develop a deep understanding of the data sources; implement data standards, maintain data quality, and support master data management.
- Manage and maintain the cloud-based data and analytics platform.
- Maintain a deep understanding of cloud offerings; build quick proofs of concept and proofs of value to prototype data and analytics solutions and assess their viability.
- Interact with business stakeholders to understand requirements and translate them into technology solutions.
Requirements
- Experience with the AWS cloud platform and ecosystem.
- Data Engineering/Development experience with SQL (Snowflake, Oracle, SQL Server, MySQL).
- Strong development background building pipelines and complex data transformations and manipulations in at least one of Python, Java, R, or Scala.
- Experience with NoSQL databases and big data technologies, including Hadoop, MongoDB, and Cassandra.
- Experience with API / RESTful data services.
- Experience with real-time data capture, processing, and storage using technologies such as Kafka and AWS Kinesis.
- Understanding of common data formats, including Parquet, Avro, ORC, and CSV.
- Experience working in a fast-paced, agile environment.
- Experience performing ongoing monitoring, automation, and refinement of data engineering solutions.
- Experience leading high-visibility transformation projects that interact with multiple business lines.
- Experience working in an onshore/offshore model consisting of data architects and visualization engineers.
- Ability to collaborate and communicate with key business lines, technology partners, vendors, and architects.
Qualifications
- BS in Computer Science or a related field
- 10+ years of experience in the data and analytics space
- Certification preferred: AWS Certified Big Data or an equivalent certification on another cloud or big data platform
- 6+ years' experience developing and implementing enterprise-level data solutions using Python (scikit-learn, SciPy, pandas, NumPy, TensorFlow), Java, Scala, Spark, Airflow, and Hive
- 4+ years in key aspects of software engineering such as parallel data processing, data flows, REST APIs, JSON, XML, and microservice architectures
- 4+ years of experience with big data processing frameworks and tools (MapReduce, YARN, Hive, Pig, Oozie, Sqoop) and good knowledge of common big data file formats (e.g., Parquet, ORC)
- 6+ years of experience with RDBMS concepts, with strong data analysis and SQL skills
- 6+ years of proficiency with Linux command-line tools and bash scripting
Knowledge, Skills and Abilities
- A passion for technology and data analytics with a strong desire to constantly be learning and honing skills
- Ability to deliver independently without oversight
- Productive even with ambiguity and highly fluid requirements during the initial stages of a project
- Flexibility to work in a matrix reporting structure
- Experienced in implementing large-scale, event-based streaming architectures
- Strong communication and documentation skills
- Working knowledge of NoSQL and in-memory databases
- Background in all aspects of software engineering, with strong skills in parallel data processing, data flows, REST APIs, JSON, XML, and microservice architecture
- Experienced in collaborating with cross-functional IT teams and global delivery teams
- Solid programming experience in Python; expert-level proficiency expected (4 on a 5-point scale)
- Working knowledge of data engineering aspects within machine learning pipelines (e.g., train/test splitting, scoring process, etc.)
- Experience working in a scrum/agile environment and with associated tools (e.g., Jira)