- Implemented Spark jobs in Scala for faster testing and processing of data into the Impala database.
- Parsed and split raw files arriving from around the globe using Java and Spark with Scala, converted them to Parquet, and inserted them into the Impala database.
- Loaded data into the Hadoop cluster from multiple existing relational database systems.
- Good understanding of partitioning concepts and the different file formats supported in Impala.
- Maintained, installed, and upgraded Hadoop components such as HDFS, YARN, Spark, ZooKeeper, Impala, Oozie, Sqoop, and Arcadia on the Hadoop cluster.
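The raw-file-to-Parquet pipeline described above could be sketched roughly as follows; this is a minimal illustration, and the paths, delimiter, and table directory are hypothetical placeholders, not details from the original text:

```scala
import org.apache.spark.sql.SparkSession

object RawToParquet {
  def main(args: Array[String]): Unit = {
    // Hypothetical sketch: read delimited raw files and write Parquet
    // into a directory backing an (assumed) Impala external table.
    val spark = SparkSession.builder()
      .appName("raw-to-parquet")
      .getOrCreate()

    // Input path and delimiter are placeholders.
    val raw = spark.read
      .option("header", "true")
      .option("delimiter", "|")
      .csv("hdfs:///data/incoming/raw/")

    // Write Parquet; Impala would pick up the new files after a
    // REFRESH (or INVALIDATE METADATA) on the corresponding table.
    raw.write
      .mode("overwrite")
      .parquet("hdfs:///warehouse/mydb.db/events_parquet/")

    spark.stop()
  }
}
```

In practice the actual jobs would also handle schema validation and partitioning of the output directories, as alluded to in the bullets above.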
Bachelor’s degree in Computer Science or a closely related field