Job Description
<p><strong>Purpose of the job</strong></p><p>As a Data Engineer, you will work as a Big Data Developer on data ingestion activities that bring large volumes of data into our Big Data Lake. You will play a vital role in building new data pipelines from various structured and unstructured sources into Hadoop, and you will work closely with data consumers and source owners to create the foundation for data analytics and machine learning activities.</p><p><strong>Duties and responsibilities</strong></p><ul><li>Identify data ingestion patterns and build frameworks to ingest data into our Data Lake efficiently</li><li>Tune the performance of ingestion jobs to improve throughput</li><li>Improve the CI/CD process by automating the build, test, and deployment framework</li><li>Build high-performance algorithms, prototypes, and proofs of concept</li><li>Research opportunities for data acquisition and new uses for existing data</li><li>Develop data set processes for data modeling, mining, and production</li><li>Integrate new data management technologies and software engineering tools into the existing framework</li><li>Work with in-memory database tools (Redis, Riak)</li><li>Collaborate with data architects, modelers, and IT team members on project goals</li></ul><p><strong>Job specification</strong></p><p><strong>Education</strong></p><ul><li>Bachelor's degree in Computer Systems Engineering or Computer Science.</li></ul><p><strong>Experience</strong></p><ul><li>6+ years of data engineering experience building data pipelines and systems.</li><li>Experience working with Hadoop technologies such as Spark.</li><li>Experience working with data flow tools such as NiFi and Airflow.</li><li>Prior experience with cloud technologies such as GCP or AWS is a plus.</li><li>Prior experience with relational databases such as MySQL.</li><li>Knowledge and understanding of the SDLC, Agile/Scrum practices, CI/CD, and automation is required.</li><li>Experience with containerization tools such as Docker or Kubernetes is a plus.</li><li>Ability to write SQL queries and use tools such as Hadoop, Tableau, QlikView, and other data reporting tools. Experience in transactional and data warehouse environments using MySQL, Hive, or other database systems. Must have a deep understanding of joins, subqueries, window functions, etc.</li><li>Strong background in designing relational databases such as PostgreSQL and NoSQL databases such as MongoDB or Cassandra.</li></ul><p><strong>Skills and abilities</strong></p><ul><li>Strong ability to drive complex technical solutions deployed at an enterprise level; ability to drive big data technology adoption and change through education and partnership with stakeholders</li><li>Demonstrated experience working with vendors and user communities to research and test new technologies that enhance the technical capabilities of the existing Hadoop cluster</li><li>Ability to negotiate, resolve, and prioritize complex issues, provide explanations and information to others on difficult issues, assess alternatives, and implement long-term solutions</li><li>Self-starter who can work with minimal guidance</li><li>Strong communication skills</li><li>Very good English, both written and spoken</li></ul>