Site Reliability Engineer (SRE)

Responsibilities and Duties

  • Design, implement, and maintain AWS infrastructure using Terraform for Infrastructure as Code (IaC) management.
  • Develop and manage serverless architectures, optimizing performance and scalability.
  • Configure and manage distributed, event-driven systems using Apache Kafka.
  • Monitor and ensure high availability and reliability of production services and applications.
  • Implement DevOps and SRE practices to improve deployment and operational processes.
  • Automate operational and deployment processes to minimize downtime and human error.
  • Collaborate with development teams to ensure applications are scalable and operable.
  • Identify and resolve performance and reliability issues in production environments.
  • Develop and maintain technical documentation and disaster recovery procedures.
  • Stay up to date with the latest cloud operations and SRE trends and technologies.

Required Experience, Skills, and Qualifications

Education:

  • Bachelor’s degree in Business, Computer Science, Engineering, or a related field.

Experience:

  • 2 to 4 years of experience in SRE, DevOps, or Systems Engineering roles.
  • Proven experience implementing and managing AWS infrastructure using Terraform.
  • Experience automating deployments and managing large-scale infrastructure.

Skills and Competencies:

  • Strong experience with AWS services such as Lambda, Fargate, S3, RDS, SNS, SQS, MSK, and Kinesis.
  • Proficiency in Terraform for Infrastructure as Code management.
  • Deep understanding of serverless architectures and best practices.
  • Experience with Apache Kafka for distributed, event-driven systems (preferred).
  • Familiarity with containers and orchestration tools like Docker and Kubernetes.
  • Knowledge of monitoring and logging tools such as Prometheus, Grafana, Jaeger, and Loki.
  • Understanding of DevOps practices and CI/CD tools.
  • Knowledge of networking protocols, security, and best practices in the cloud.
  • Excellent problem-solving skills and analytical thinking.
  • Ability to work in high-pressure environments and respond effectively to incidents.
  • Strong communication and collaboration skills with cross-functional teams.
  • Attention to detail and a continuous improvement mindset.
  • Proactivity and adaptability in dynamic environments.

Please also send a copy of your resume to hr@liberintechnologies.com!