Databricks
Sr. IT Site Reliability Software Engineer
GAQ127R40
Team: IT Infrastructure and Operations

About the Role

At Databricks Information Technology, we are a product-led organization transforming how we work, from the ease of using our IT services to the applications we develop to scale seamlessly during rapid growth.

As a Site Reliability Engineer (SRE), you will bridge the gap between software engineering and systems architecture. You will be a core contributor to the IT Infrastructure team, owning the evolution of core infrastructure and observability platforms. This role requires a strong software engineering mindset and deep technical breadth to deliver high-quality, scalable solutions for "immature" system problems. Your focus will be on building resilient, automated infrastructure that empowers development teams and ensures our cloud environment is cost-optimized, secure, and highly available.

The Impact You Will Have

Architect and Automate: Design and deploy production-grade infrastructure on cloud platforms (AWS/Azure) using Infrastructure as Code (IaC) tools like Terraform or Pulumi.

Reliability and Performance Engineering: Optimize system performance, architecture, and scaling to ensure maximum uptime and minimal latency for critical IT services.

CI/CD Excellence: Architect robust deployment pipelines (e.g., GitHub Actions), managing both hosted and self-hosted runners for specialized build requirements.

Observable by Default: Create underlying infrastructure to ensure new internal applications are secure