Job Description
– The HPC cluster systems engineer is responsible for managing and supporting all HPC and Grid systems in the University data center and at distributed locations.
– Solves HPC and Grid related problems on a daily basis.
– In support of change management within the data center, provides the CSC with information about the HPC systems.
– Verifies all HPC systems daily using the monitoring tools and proactively intervenes to solve problems.
– Analyzes solution components, understands systems integration challenges, and identifies technology gaps.
– Resolves the identified gaps, or proposes solutions to them, to reach future performance targets and functionality requirements.
– Prototypes features, performs integration checkout of various software components, and collaborates with component developers and solution architects.
– Develops and drives validation test content and evaluates system components.
– Engages with industry partners as required to identify and investigate best-known methods used in the HPC community, and applies those methods.
– Collaborates with architects and developers to define architectural requirements for high-end HPC clusters.
– Responsible for system integration and validation of UAEU HPC clusters.
– Responsible for monitoring all HPC and Grid services.
– Coordinates work with vendors for support.
– Tests and deploys HPC systems.
– Knowledge of IT Service Management frameworks.
– Maintains accurate and comprehensive documentation and diagrams of the enterprise HPC system, backup infrastructure, communications flow, and routing.
– Other duties as assigned.