Demonstrated history of evaluating systems, identifying improvement opportunities, and developing innovative processes that optimize the flow and storage of information, expand capabilities, and enhance user experience. Track record of effective collaboration with cross-functional teams, stakeholders, and users to align data strategies and solutions with business requirements. Established ability to design, deploy, and maintain Big Data, MSSQL, Oracle, and NoSQL infrastructures supporting large-volume, complex data transactions. Equally comfortable in team environments and independently driven roles.

Big Data Integration / CDP Data Lake / Relational & NoSQL Database Design / Cross-Platform Programming / Data Modeling / Software Development Life Cycle / Cloud Services / Process Optimization & Automation

Key Skills

AWS
Python
PySpark
Airflow
SQL
ETL
Software development lifecycle

Professional Experience

Aug 2020 – Present
Lead Data Engineer
Ernst & Young, Bengaluru, IN
  • Spearheading a team of four data engineers handling all data ingestion requirements for a pharmaceutical client, enabling data-driven decision making on their clinical trials.
  • Developed a generic ETL framework capable of ingesting data from disparate sources (JDBC, REST API, SharePoint) into an S3 data lake, driven by configurations in a metastore (MySQL RDS) and Airflow Variables.
  • Enabled the addition of new sources entirely through configurations in RDS and Airflow, with zero code changes, leveraging AWS cloud services and Spark; this substantially increased the team's bandwidth.
  • Spearheaded the development of a dynamic Airflow DAG creation framework driven entirely by configurations captured in Airflow Variables and the MySQL RDS metastore, with no per-pipeline coding (see the first sketch below).
  • Designed a generic framework for transient EMR clusters to balance data loads, enabling on-demand cluster creation simply by updating cluster configuration variables (see the second sketch below).
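
A minimal sketch of the config-driven DAG factory described above (first sketch). The Variable key, config fields, and ingest callable are hypothetical stand-ins; the production framework also reads richer configurations from the MySQL RDS metastore.

# Config-driven Airflow DAG factory: one DAG per entry in an Airflow Variable.
import json
from datetime import datetime

from airflow import DAG
from airflow.models import Variable
from airflow.operators.python import PythonOperator


def ingest_source(source_conf: dict) -> None:
    # Placeholder ingest step; the real task would use source_conf
    # (connection details, target S3 prefix) to launch a Spark job.
    print(f"Ingesting {source_conf['name']} -> {source_conf['s3_prefix']}")


# Pipeline definitions live in a JSON Airflow Variable, e.g.
# [{"name": "crm_orders", "schedule": "@daily", "s3_prefix": "s3://lake/crm/"}]
pipelines = json.loads(Variable.get("ingest_pipelines", default_var="[]"))

for conf in pipelines:
    dag_id = f"ingest_{conf['name']}"
    with DAG(
        dag_id=dag_id,
        start_date=datetime(2021, 1, 1),
        schedule_interval=conf.get("schedule", "@daily"),
        catchup=False,
    ) as dag:
        PythonOperator(
            task_id="ingest",
            python_callable=ingest_source,
            op_kwargs={"source_conf": conf},
        )
    # Register the generated DAG in the module namespace so the
    # Airflow scheduler discovers it.
    globals()[dag_id] = dag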
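
The transient-EMR framework (second sketch) might look roughly like the following boto3 call; the cluster name, instance types, IAM roles, and bucket paths are illustrative stand-ins for values read from the cluster config variables.

# On-demand (transient) EMR cluster creation driven by a config dict.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

# In practice this dict is loaded from Airflow Variables / the RDS metastore.
cluster_conf = {
    "name": "transient-ingest",
    "release": "emr-6.4.0",
    "master_type": "m5.xlarge",
    "core_type": "m5.xlarge",
    "core_count": 2,
    "log_uri": "s3://my-logs-bucket/emr/",
}

response = emr.run_job_flow(
    Name=cluster_conf["name"],
    ReleaseLabel=cluster_conf["release"],
    LogUri=cluster_conf["log_uri"],
    Applications=[{"Name": "Spark"}],
    Instances={
        "InstanceGroups": [
            {"Name": "Master", "InstanceRole": "MASTER",
             "InstanceType": cluster_conf["master_type"], "InstanceCount": 1},
            {"Name": "Core", "InstanceRole": "CORE",
             "InstanceType": cluster_conf["core_type"],
             "InstanceCount": cluster_conf["core_count"]},
        ],
        # Cluster terminates automatically once its steps finish (transient).
        "KeepJobFlowAliveWhenNoSteps": False,
    },
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
)
print("Launched cluster:", response["JobFlowId"])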
Apr 2019 – Sep 2020
Senior Data Engineer
Quaero, Bengaluru, IN
  • Built a robust data lake on Hive with HDFS backed by AWS S3 for a pioneering OTT platform client, providing seamless data flow from disparate sources and a unified 360° customer view that enabled data science, analytics, and marketing teams to make sound business decisions.
  • Designed a generic class of workflow packages in Python that extract data from any external database (MySQL, SQL Server, MongoDB) to any file system (S3, UNC, SFTP).
  • Migrated an on-premise Hive database on HDFS to Snowflake on AWS, backed by S3, using PySpark scripts developed in Zeppelin notebooks.
  • Designed a flattening mechanism in PySpark to convert nested JSON files into tabular structures before staging them to Hive tables (see the first sketch after this list).
  • Designed data pipelines using PySpark on EMR to automate loading data from various file formats into Hive tables stored as Parquet on S3, consumed downstream by Sisense (BI tool) for visualization.
  • Designed and automated EMR cluster creation and termination scripts in Python, including bootstrap actions that install modules during cluster creation, reducing AWS billing costs by 20%.
  • Implemented an autoscaling policy on the EMR cluster in Python to scale capacity up and down with data volume, favoring Spot Instances over On-Demand, improving workflow efficiency by 35% (see the second sketch after this list).
  • Set up CloudWatch alarms and SNS alerts for EMR cluster and S3 storage monitoring, improving cluster efficiency by a significant margin.
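
The JSON-flattening mechanism above (first sketch) can be illustrated with one common approach: recursively expanding struct columns and exploding arrays until only atomic columns remain. The S3 paths are hypothetical.

# Flatten nested JSON into a tabular structure with PySpark.
from pyspark.sql import DataFrame, SparkSession
from pyspark.sql import functions as F
from pyspark.sql import types as T

spark = SparkSession.builder.appName("flatten-json").getOrCreate()


def flatten(df: DataFrame) -> DataFrame:
    # Repeatedly expand struct columns and explode array columns
    # until the schema contains only flat (atomic) fields.
    while True:
        complex_cols = [
            (f.name, f.dataType) for f in df.schema.fields
            if isinstance(f.dataType, (T.StructType, T.ArrayType))
        ]
        if not complex_cols:
            return df
        name, dtype = complex_cols[0]
        if isinstance(dtype, T.StructType):
            # Promote each struct field to a top-level column.
            expanded = [F.col(f"{name}.{c}").alias(f"{name}_{c}")
                        for c in dtype.names]
            df = df.select([c for c in df.columns if c != name] + expanded)
        else:
            # One output row per array element (nulls preserved).
            df = df.withColumn(name, F.explode_outer(name))


raw = spark.read.json("s3://my-bucket/raw/events/")  # hypothetical path
flatten(raw).write.mode("overwrite").parquet("s3://my-bucket/staged/events/")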
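
An autoscaling rule of the kind described above (second sketch) might be attached via boto3 as follows. Cluster and instance-group IDs, thresholds, and capacities are placeholders, and only the scale-out rule is shown; the Spot preference itself is set when the task instance group is created (Market="SPOT"), which this sketch omits.

# Attach a scale-out rule to an EMR task instance group.
import boto3

emr = boto3.client("emr", region_name="us-east-1")

emr.put_auto_scaling_policy(
    ClusterId="j-XXXXXXXXXXXXX",        # placeholder cluster ID
    InstanceGroupId="ig-XXXXXXXXXXXX",  # placeholder task group ID
    AutoScalingPolicy={
        "Constraints": {"MinCapacity": 2, "MaxCapacity": 10},
        "Rules": [{
            "Name": "ScaleOutOnLowYarnMemory",
            "Action": {"SimpleScalingPolicyConfiguration": {
                "AdjustmentType": "CHANGE_IN_CAPACITY",
                "ScalingAdjustment": 2,  # add two instances per trigger
                "CoolDown": 300,
            }},
            "Trigger": {"CloudWatchAlarmDefinition": {
                "ComparisonOperator": "LESS_THAN",
                "MetricName": "YARNMemoryAvailablePercentage",
                "Namespace": "AWS/ElasticMapReduce",
                "Period": 300,
                "Statistic": "AVERAGE",
                "Threshold": 15.0,
                "Unit": "PERCENT",
            }},
        }],
    },
)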

Oct 2017 – May 2019
Database Engineer
Moxie Marketing Services LLC, Bengaluru, IN
Data-driven, digital-first advertising and CRM agency
  • Transitioned over one hundred custom legacy surround-code routines to SSIS packages and SSRS reports.
  • Automated deployment of millions of email programs for Verizon CRM by developing SSIS packages and stored procedures for standard ETL processing, minimizing manual intervention and ensuring high productivity and framework quality.
  • Authored, redesigned, and automated the production of Parquet files from log files in S3 using crawlers and AWS Glue, a stepping-stone project that won additional business (crawler setup sketched below).
  • Automated the ETL process by writing Airflow DAGs (Directed Acyclic Graphs) in Python to ingest data into Google BigQuery datasets and export it to Salesforce Marketing Cloud for email deployment (DAG sketched below).
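
The Glue crawler setup mentioned above might look like this boto3 sketch; the crawler name, IAM role, catalog database, and S3 path are hypothetical.

# Catalog raw log files in S3 so a Glue ETL job can convert them to Parquet.
import boto3

glue = boto3.client("glue", region_name="us-east-1")

glue.create_crawler(
    Name="s3-logs-crawler",
    Role="GlueServiceRole",  # IAM role with S3 and Glue permissions
    DatabaseName="logs_db",
    Targets={"S3Targets": [{"Path": "s3://my-log-bucket/raw-logs/"}]},
    SchemaChangePolicy={
        "UpdateBehavior": "UPDATE_IN_DATABASE",
        "DeleteBehavior": "LOG",
    },
)
glue.start_crawler(Name="s3-logs-crawler")

# A separate Glue job would then read the cataloged tables and write
# them back out to S3 as Parquet for downstream consumers.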
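
A DAG of the kind described in the last bullet might be structured as below. The BigQuery-to-GCS operator comes from the Airflow Google provider package; the Salesforce Marketing Cloud push is a placeholder callable, since that integration isn't detailed here, and all table, bucket, and task names are illustrative.

# Daily export of a BigQuery audience table, handed off to SFMC.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator
from airflow.providers.google.cloud.transfers.bigquery_to_gcs import (
    BigQueryToGCSOperator,
)


def push_to_sfmc(**context) -> None:
    # Placeholder: the real task calls the SFMC REST API to load
    # the exported file into a Data Extension for email deployment.
    print("Pushing extract to Salesforce Marketing Cloud")


with DAG(
    dag_id="bq_to_sfmc",
    start_date=datetime(2019, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    export = BigQueryToGCSOperator(
        task_id="export_bq_table",
        source_project_dataset_table="my-project.crm.email_audience",
        destination_cloud_storage_uris=["gs://my-bucket/exports/audience-*.csv"],
        export_format="CSV",
    )
    deliver = PythonOperator(task_id="push_to_sfmc",
                             python_callable=push_to_sfmc)
    export >> deliver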
Sep 2012 – May 2015
Application Developer
Oracle Financial Services Software Ltd, Bengaluru, IN
  • Oracle Automated Testing Suite: built Oracle's first endeavor in automation testing, a milestone project that reduced incident count by 75% and increased the efficiency of fixes by 35%.
  • Oracle FLEXCUBE for Microfinance: created a microfinance module in which loans are sanctioned to groups of people, helping grow the customer base by 15%.
  • Supported End-of-Day operations for branches during integration testing and resolved real-time issues throughout implementation. 
  • Developed grace-period functionality for Islamic banks, which proved a precursor to Oracle's venture into Islamic banking.

Education

Jun 2015 – Aug 2017
Master's Degree in Computer Science
University of Alabama in Huntsville
Aug 2008 – Apr 2012
Bachelor of Computer Science and Engineering, Vivekananda Institute of Technology
Visvesvaraya Technological University

Certifications

2022
Hands On Essentials - Data Warehouse
Snowflake

Hobbies & Interests

  • Badminton
  • Camping
  • Hiking
  • Music

Languages

English (Fluent)
Hindi (Fluent)
Kannada (Native)
Telugu (Fluent)
