Database Reliability Engineer (DBRE), Cloud Platform

Job Description

Leading the future in luxury electric and mobility
At Lucid, we set out to introduce the most captivating, luxury electric vehicles that elevate the human experience and transcend the perceived limitations of space, performance, and intelligence. Vehicles that are intuitive, liberating, and designed for the future of mobility.
We plan to lead in this new era of luxury electric by returning to the fundamentals of great design – where every decision we make is in service of the individual and environment. Because when you are no longer bound by convention, you are free to define your own experience.
Come work alongside some of the most accomplished minds in the industry. Beyond providing competitive salaries, we’re providing a community for innovators who want to make an immediate and significant impact. If you are driven to create a better, more sustainable future, then this is the right place for you.
Summary:
The Cloud Platform team at Lucid is seeking a Database Reliability Engineer. In this position, the individual will be responsible for ensuring the reliability of the various databases offered as services on several public and private cloud infrastructures.
Our ideal candidate exhibits a can-do attitude and approaches the work with vigor and determination. We are looking for a hands-on software engineer who will collaborate with other engineers to build, automate, and maintain the database solutions provided by the Cloud Platform while meeting SLAs.

Responsibilities

  • Apply reliability engineering practices to various cloud-based databases and services
  • Maintain the uptime, scalability, and availability of several types of databases, including but not limited to Postgres, MySQL, TimescaleDB, InfluxDB, Cassandra, Memcached, and MongoDB
  • Ensure the reliability of services such as Prometheus, Grafana, and Elasticsearch with Logstash, Vector, or Fluentd, among others
  • Build highly available big-data systems and manage scalable databases that handle high volumes of data ingestion with low-latency query responses
  • Implement continuous delivery (CI/CD) using Argo CD, Jenkins, Maven, Artifactory, and Docker
  • Build tools and frameworks to automate the monitoring systems to ensure the highest level of uptime in various production-grade environments
  • Set up infrastructure as code using Terraform
  • Set up and operate code repositories with GitLab or Bitbucket
  • Deploy, configure, and maintain tools such as Kafka, Spark, Presto, Airflow, MQTT, and Microservices
  • Collaborate with service owners to define SLOs, build SLIs, and ensure database services meet their SLAs
  • Follow incident management processes and continually look for ways to improve operational efficiency
  • Swiftly navigate incidents, perform impact analysis, and take appropriate action
  • Create a 24/7 service reliability model to proactively monitor systems across geographical locations
  • Participate in an on-call rotation to uphold service SLAs per business needs
  • Understand customer impact and prioritize workload between feature development and customer support

Qualifications:

  • B.S. or M.S. degree in Computer Science or Engineering, or equivalent work experience.
  • Fluent in English to communicate with teams across geographical regions.
  • 5+ years of experience in SRE or DevOps Engineering.
  • 1-3 years of experience in the management and operational administration of databases such as Postgres, Cassandra, InfluxDB, MongoDB, Elasticsearch, Prometheus, or similar.
  • 1-3 years of experience deploying and maintaining applications built with Docker and orchestrated on Kubernetes on public or private cloud providers.
  • 1-3 years of experience using cloud automation tools such as Terraform, Pulumi, Cluster API, or other frameworks.
  • 1-3 years of experience with programming or scripting languages such as Python, Go, Bash/Shell, or others.
  • 1-3 years of experience with tools such as Jenkins, Argo CD, and Artifactory to build automation, CI/CD, and self-service pipelines.
  • Experience running large-scale distributed computing infrastructure for data platforms using Spark, Hive, Presto, ZooKeeper, and Kafka.
  • Experienced with various debugging tools and troubleshooting performance bottlenecks at the infrastructure or the application tier.
  • Experience with configuration management and automation using Ansible, Chef, Puppet, or others is a plus.
  • Familiarity with REST-based APIs and the ability to triage request-response issues is a plus.
  • Detail-oriented, collaborative, skilled in time management, and dedicated to quality.