Role Summary
Some engineers deploy code.
Some engineers manage infrastructure.
The best engineers ensure systems run reliably — even when no one is watching.
At Techdome, DevOps is not just about pipelines — it’s about building resilient, scalable, and secure systems that power real-world applications.
As a DevOps Engineer, you will play a critical role in managing production infrastructure during night-shift operations. You will ensure system reliability, monitor live environments, respond to incidents, and continuously improve deployment and observability practices.
This role is ideal for someone who thrives in high-responsibility environments, enjoys solving production challenges, and takes ownership of system reliability.
About Techdome
Techdome is a technology-driven engineering organization focused on solving real-world business challenges through modern software engineering, cloud infrastructure, and AI-powered innovation.
Our philosophy:
Build responsibly. Improve continuously. Solve deeply.
At Techdome, engineers are system owners, responsible for the reliability, scalability, and performance of production systems.
What This Role Entails
As a DevOps Engineer, you will be responsible for maintaining production stability, improving deployment systems, and ensuring infrastructure observability.
You will:
- Manage and monitor production cloud infrastructure
- Own CI/CD pipelines and deployment processes
- Respond to incidents and ensure quick resolution
- Improve system reliability, monitoring, and alerting
- Work closely with engineering teams to keep operations running smoothly
Key Responsibilities
Cloud Infrastructure Management
Design, deploy, and maintain scalable cloud infrastructure with a focus on reliability, security, and performance.
CI/CD Pipeline Management
Build, manage, and optimize CI/CD pipelines to support efficient and reliable deployments across services.
Infrastructure as Code (IaC)
Define and maintain infrastructure as code using Terraform, keeping environments consistent, version-controlled, and scalable.
Monitoring & Observability
Implement and manage monitoring systems, dashboards, alerts, and logging pipelines to ensure system visibility.
Incident Response & Debugging
Monitor production systems, respond to alerts, troubleshoot issues across infrastructure and applications, and ensure timely resolution.
Scripting & Automation
Develop scripts and automation tools to streamline operations, deployments, and repetitive tasks.
Containerization & Orchestration
Manage containerized applications using Docker and support orchestration platforms such as Kubernetes where applicable.
Documentation & Handover
Maintain clear documentation of incidents, changes, and processes, ensuring smooth handover between shifts.
Required Skills & Experience
- 2–5 years of experience in DevOps / SRE / Infrastructure Engineering
Cloud Infrastructure
- Hands-on experience with at least one cloud platform (Azure preferred)
- Strong understanding of networking, IAM, and cloud security
CI/CD
- Experience with GitHub Actions (workflow creation, secrets, runners)
- Familiarity with Jenkins / GitLab CI / Azure DevOps Pipelines
- Understanding of deployment strategies (blue-green, canary, rolling)
Infrastructure as Code
- Strong expertise in Terraform (modules, state management, workspaces)
- Experience with remote state backends (Azure Blob Storage, Amazon S3)
Monitoring & Observability
- Experience with tools like Azure Monitor, Grafana, Prometheus, or Datadog
- Knowledge of SLOs, SLIs, alerting systems, and logging practices
Debugging & Incident Handling
- Strong troubleshooting skills across infrastructure, networking, and application layers
- Experience handling production incidents
Scripting
- Proficiency in Python, Bash, or PowerShell
- Ability to automate workflows and operational tasks
Containers
- Strong understanding of Docker
- Kubernetes knowledge is a plus
What Will Make You Stand Out
- Experience working on production-scale systems
- Hands-on ownership of monitoring and alerting systems
- Experience handling incidents end-to-end
- Familiarity with Cloudflare, RabbitMQ, or similar tools
- Knowledge of PostgreSQL or database operations
- Experience in startup or high-growth environments
- Exposure to regulated or high-availability systems
The Mindset We Value
At Techdome, skills matter — but mindset matters more.
- Ownership Driven: You take responsibility for production systems
- Reliability First: You prioritize system uptime and stability
- Proactive Thinking: You prevent issues before they occur
- Calm Under Pressure: You handle incidents methodically
- Clear Communicator: You document and communicate effectively, especially in asynchronous environments
Life at Techdome
- Knowledge-sharing sessions
- Cross-team collaboration
- Innovation initiatives (Techdome Garage)
- Leadership connect programs
Why Join Techdome?
Accelerated Growth
- Work on real production systems
- Exposure to modern DevOps and cloud practices
Continuous Learning
- Access to training and certifications
- Hands-on experience with advanced tools
Rewards & Recognition
- Performance-based incentives
- Recognition programs
Employee Wellbeing
- Health insurance
- Wellness initiatives
- Supportive work culture
Engaging Work Culture
- Team events and celebrations
- Fun Fridays & engagement activities
- Annual events like TPL (Techdome Premier League)