Job Description
The Data Management and Governance section has ingested 20 TB of transaction data for the services provided.<p><br></p>More than 200 data models have been exposed for reporting, business intelligence, and Artificial Intelligence.<p><br></p>The objectives are as follows:<p><br></p><ul><li><strong>Data Acquisition</strong><ul><li>The vendor should manage the existing data pipelines built for data ingestion.</li><li>Create and manage new data pipelines for newly ingested data, following best practices.</li><li>Continuously monitor data ingestion through Change Data Capture (CDC) for incremental loads.</li><li>Analyse and fix any failed batch job schedule so that the affected data is captured.</li><li>Maintain and continuously update the technical documentation for ingested data, along with a centralized data dictionary carrying the necessary data classifications.</li></ul></li><li><strong>Extraction and Cleaning</strong><ul><li>Extract data from the source systems to be cleaned and ingested into the big data platform.</li><li>Define automated data cleaning before ingestion.</li><li>Data cleaning must handle missing data, remove outliers, and resolve inconsistencies.</li><li>Perform data quality checks for accuracy, completeness, consistency, timeliness, believability, and interpretability.</li></ul></li><li><strong>Data Integration, Aggregation and Representation</strong><ul><li>Expose data views or data models to reporting and source systems using Hive, Impala, or similar tools provided by us.</li><li>Expose cleansed data to the Artificial Intelligence team for building data science models.</li></ul></li><li><strong>Informatica Data Catalog</strong><ul><li>Implement and configure the Informatica Enterprise Data Catalog (EDC) solution to discover and catalog data assets across the organization.</li><li>Develop and maintain custom metadata scanners, resource configurations, and lineage extraction processes.</li><li>Integrate EDC with other Informatica tools, such as Data Quality (IDQ), Master Data Management (MDM), and Axon Data Governance.</li><li>Define and implement data classification, data profiling, and data quality rules to improve data visibility, accuracy, and trustworthiness.</li><li>Collaborate with data stewards, data owners, and data governance teams to identify, document, and maintain business glossaries, data dictionaries, and data lineage information.</li><li>Establish and maintain data governance policies, standards, and procedures within the EDC environment.</li><li>Monitor and troubleshoot EDC performance issues, ensuring optimal performance and data availability.</li><li>Train and support end-users in effectively utilizing the data catalog for data discovery and analysis.</li><li>Keep up to date with industry best practices and trends, continuously improving the organization’s data catalog implementation.</li><li>Collaborate with cross-functional teams to drive data catalog adoption and ensure data governance compliance across the organization.</li></ul></li></ul><p><br></p><strong>Skill Set:</strong><p><br></p><ul><li>Certified Big Data Engineer from Cloudera/AWS/Azure</li><li>Expertise with big data products in the Cloudera stack</li><li>Expertise in big data querying tools such as Hive, HBase, and Impala</li><li>Expertise in SQL, including writing complex queries/views, partitions, and bucketing</li><li>Strong experience in Spark using Python/Scala</li><li>Expertise in messaging systems such as Kafka or RabbitMQ</li><li>Hands-on experience in managing a Hadoop cluster with all included services</li><li>Experience implementing ETL processes using Sqoop/Spark, including loading from disparate data sets and pre-processing using Hive</li><li>Ability to design solutions independently based on high-level architecture</li><li>Ability to collaborate with other development teams</li><li>Expertise in building stream-processing systems using solutions such as Spark Streaming, Apache NiFi, and Kafka</li><li>Expertise with NoSQL databases such as HBase</li><li>Experience with
Informatica Enterprise Data Catalog (EDC) implementation and administration.</li><li>Strong knowledge of data management, data governance, and metadata management concepts.</li><li>Proficiency in SQL and experience with various databases (e.g., Oracle, SQL Server, PostgreSQL) and data formats (e.g., XML, JSON, CSV).</li><li>Experience with data integration, ETL/ELT processes, and Informatica Data Integration.</li><li>Familiarity with data quality and data profiling tools, such as Informatica Data Quality (IDQ).</li></ul>
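<p>The cleaning stage described under Extraction and Cleaning (handle missing data, remove outliers, resolve inconsistencies) can be sketched in plain Python. In production this logic would typically run as a Spark job on the cluster; the field names (<code>txn_id</code>, <code>amount</code>) and the 3-sigma outlier threshold below are illustrative assumptions, not part of the specification:</p>

```python
# Minimal sketch of the cleaning stage, assuming hypothetical
# transaction records with "txn_id" and "amount" fields.
from statistics import mean, stdev

def clean_transactions(rows):
    """Drop records with missing key fields, remove amount outliers
    (beyond 3 standard deviations), and de-duplicate on txn_id."""
    # 1. Handle missing data: require the key fields to be present.
    complete = [r for r in rows
                if r.get("txn_id") and r.get("amount") is not None]

    # 2. Remove outliers on the amount column (3-sigma rule).
    amounts = [r["amount"] for r in complete]
    if len(amounts) > 1:
        mu, sigma = mean(amounts), stdev(amounts)
        complete = [r for r in complete
                    if sigma == 0 or abs(r["amount"] - mu) <= 3 * sigma]

    # 3. Resolve inconsistencies: keep the first record per txn_id.
    seen, deduped = set(), []
    for r in complete:
        if r["txn_id"] not in seen:
            seen.add(r["txn_id"])
            deduped.append(r)
    return deduped
```

<p>In a Spark pipeline the same steps would map to <code>DataFrame.dropna</code>, a filter on a computed threshold, and <code>dropDuplicates</code>, applied before the cleansed data is exposed through Hive or Impala views.</p>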