Senior Engineer, Data
Job Responsibilities
- Understand and analyze data from multiple data sources and develop technology to integrate the enterprise data layer
- Create robust, automated pipelines that ingest and process structured and unstructured data from source systems into analytical platforms, using batch and streaming mechanisms and cloud-native tools
- Process complex, disparate data sets using the appropriate technologies, and identify the correlations and patterns that exist between them
- Orchestrate data pipelines and environments using Airflow
- Implement custom applications using Kinesis, Lambda, and other AWS tools as required to address streaming use cases
- Implement automation to optimize data platform compute and storage resources
- Develop and enhance end-to-end monitoring of cloud data platforms
- Help educate and cross-train other team members
- Provide regular updates to all relevant stakeholders
- Participate in daily Scrum calls and provide clear visibility into work products
Job Requirements
- BS in Computer Science or a related field
- 6+ years of experience in the data engineering and analytics space
- 5+ years of Python experience; expert-level proficiency (4 out of 5) required, including lambdas and Airflow DAG processing
- 5+ years of experience with RDBMS concepts, with strong data analysis and SQL skills
- 4+ years of proficiency with Linux command-line tools and bash scripting
- 1+ year of experience with Big Data processing frameworks and tools
- Exposure to software engineering concepts such as parallel data processing, data flows, REST APIs, JSON, XML, and microservice architectures
- Certification preferred: AWS Certified Big Data, or a certification in another cloud data platform or big data platform
Nice to have:
- Kubernetes and Docker experience
- Prior experience working with a data science workbench
- Knowledge of machine learning pipelines (e.g., train/test splitting, scoring processes)