Responsibilities and Duties
- Design, implement, and maintain AWS infrastructure using Terraform for Infrastructure as Code (IaC) management.
- Develop and manage serverless architectures, optimizing performance and scalability.
- Configure and manage distributed, event-driven systems using Apache Kafka.
- Monitor and ensure high availability and reliability of production services and applications.
- Implement DevOps and SRE practices to improve deployment and operational processes.
- Automate operational and deployment processes to minimize downtime and human error.
- Collaborate with development teams to ensure applications are scalable and operable.
- Identify and resolve performance and reliability issues in production environments.
- Develop and maintain technical documentation and disaster recovery procedures.
- Stay up to date with the latest cloud operations and SRE trends and technologies.
Required Experience, Skills, and Qualifications
Education:
- Bachelor’s degree in Business, Computer Science, Engineering, or a related field.
Experience:
- 2 to 4 years of experience in SRE, DevOps, or Systems Engineering roles.
- Proven experience implementing and managing AWS infrastructure using Terraform.
- Experience automating deployments and managing large-scale infrastructure.
Skills and Competencies:
- Strong experience with AWS services such as Lambda, Fargate, S3, RDS, SNS, SQS, MSK, and Kinesis.
- Proficiency in Terraform for Infrastructure as Code management.
- Deep understanding of serverless architectures and best practices.
- Experience with Apache Kafka for distributed, event-driven systems (preferred).
- Familiarity with containers and orchestration tools such as Docker and Kubernetes.
- Knowledge of monitoring, logging, and tracing tools such as Prometheus, Grafana, Jaeger, and Loki.
- Understanding of DevOps practices and CI/CD tools.
- Knowledge of networking protocols, cloud security, and cloud best practices.
- Excellent problem-solving skills and analytical thinking.
- Ability to work in high-pressure environments and respond effectively to incidents.
- Strong communication and collaboration skills with cross-functional teams.
- Attention to detail and a continuous improvement mindset.
- Proactivity and adaptability in dynamic environments.