Aug 2018
Present
Data Engineer
Tata Consultancy Services
Kolkata, IN
• Developed a data flow in Spark using Scala that processes application logs in batch mode; the output is used for reconciliation against production data.
• Scheduled Spark jobs using Oozie and developed workflow models to control the data processing flow.
• Worked extensively with the team to tune Spark jobs and to troubleshoot and fix errors using Resource Manager, Hue, and Autosys logs.
• Developed around 100 Sqoop import and export jobs to extract data from Oracle, SQL Server, Netezza, and MySQL into HDFS and populate Impala tables, and vice versa.
• Working on a POC involving Kafka streaming of frequent transactional data, which is saved to HDFS in batch mode, alongside real-time updates of infrequently changed rows. Currently developing the Kafka consumer that receives and stores the data.
• Implemented Kerberos authentication to secure the cluster. Added token-based authentication to an existing module to connect to Impala and other databases from HDFS.
• Worked on a project migrating specific Impala views to AWS Redshift. Because there was no direct AWS Glue connectivity, developed jobs that first copy data from HDFS to S3 and then load it from S3 into Redshift via an AWS Glue job.
• Worked with various AWS services, including Athena, EC2, and EMR.