- Leading a team of 4 data engineers handling all data ingestion requirements for a pharmaceutical client, enabling data-driven decision-making on its clinical trials.
- Developed a generic ETL framework that ingests data from disparate sources (JDBC, REST API, SharePoint) into an S3 data lake, driven by configurations in a metastore (MySQL RDS) and Airflow variables.
- Enabled onboarding of new sources entirely through configurations in RDS and Airflow, with zero code changes, using AWS cloud services and Spark; this substantially increased the team's bandwidth.
- Spearheaded the development of a dynamic Airflow DAG creation framework driven entirely by configurations captured in Airflow variables and the MySQL RDS metastore, requiring no code changes.
- Designed a generic framework for provisioning transient EMR clusters to balance data loads, enabling on-demand cluster creation by updating only the cluster configuration variables.