Senior Manager, Site Reliability

1 day ago

Full-time

On-site

Bengaluru, Karnataka, India

Company Description

LinkedIn is the world’s largest professional network, built to create economic opportunity for every member of the global workforce. Our products help people make powerful connections, discover exciting opportunities, build necessary skills, and gain valuable insights every day. We’re also committed to providing transformational opportunities for our own employees by investing in their growth. We aspire to create a culture that’s built on trust, care, inclusion, and fun – where everyone can succeed.

Join us to transform the way the world works.

Job Description

This role will be based in Bangalore, India.

At LinkedIn, our approach to flexible work is centered on trust and optimized for culture, connection, clarity, and the evolving needs of our business. The work location of this role is hybrid, meaning it will be performed both from home and from a LinkedIn office on select days, as determined by the business needs of the team.

At LinkedIn, the Productivity Engineering Site Reliability Engineering (SRE) team plays a critical role in ensuring our enterprise business applications are reliable, scalable, secure, and highly automated.

We are seeking a Senior Manager, Site Reliability Engineering to lead a high-performing team of SREs, software engineers, enterprise engineers, and test automation engineers responsible for system health, observability, and operational excellence across both development and production environments.

In this role, you will partner closely with Development and Test Automation teams from early design through production, driving improvements in reliability, performance, and scalability across complex application ecosystems. You will also collaborate with cross-functional infrastructure teams to scale and modernize financial systems infrastructure.

You will lead strategic initiatives across application, database, and middleware platforms, including performance optimization and the transformation of systems from on-premises environments to modern multi-cloud architectures.

This is a key leadership opportunity for someone passionate about building high-performing teams, driving automation at scale, and delivering resilient, efficient platforms that power mission-critical business operations.

Responsibilities:

Build, lead, and scale a high-performing SRE organization, including hiring, mentoring, and organizational development

Act as a role model and coach with a strong bias for action, engineering craftsmanship, and operational excellence

Partner with senior leadership to define and drive the long-term technology vision, strategy, and roadmap aligned with business priorities

Establish and foster a culture of ownership, accountability, continuous improvement, and high operational standards

Collaborate closely with cross-functional partners across development, infrastructure, testing, and business teams to drive impactful roadmaps

Influence and align senior stakeholders across engineering, infrastructure, and business domains

Own availability, reliability, performance, and scalability of enterprise business applications and financial systems

Define and implement SRE best practices including SLOs, SLAs, error budgets, incident management, and operational frameworks

Lead end-to-end incident response, root cause analysis, and long-term remediation strategies to improve system resilience

Drive operational maturity through metrics, observability, automation, and continuous improvement initiatives

Oversee application, database, and middleware platform performance, reliability, and capacity planning

Lead modernization efforts, including migration from legacy environments to modern infrastructure

Evaluate and implement new technologies and architectural patterns to improve scalability, resilience, and efficiency

Define and drive observability strategy across monitoring, logging, tracing, and alerting systems

Champion an automation-first mindset to eliminate manual processes and improve operational efficiency

Drive development of internal tools and self-service platforms to enhance engineering productivity and reduce operational overhead

Improve deployment, release, and operational workflows through engineering-led automation and standardization

Own infrastructure cost management, capacity planning, and financial forecasting for Financial Systems

Optimize infrastructure and licensing investments (e.g., Oracle ecosystem) aligned with business and financial goals

Qualifications

Basic Qualifications:

BA/BS degree in Computer Science, Engineering, or a related technical field, or equivalent practical experience

12+ years of experience in Site Reliability Engineering, Production Engineering, or related disciplines

4+ years of experience leading and scaling high-performing engineering teams

Experience in Oracle ERP (EBS/Fusion) and Oracle Database technologies

Hands-on experience with Oracle 19c database administration, including high availability and disaster recovery (Oracle RAC, Grid Infrastructure, Data Guard)

Experience operating in SOX-compliant environments with a focus on controls, audit, and governance

Preferred Qualifications:

4+ years of hands-on experience troubleshooting complex issues across Unix/Linux, networking, and Windows environments

Proven experience working within an SRE organization, managing and operating large-scale production systems and applications

Strong understanding and practical application of SRE principles, including designing systems aligned with SLA/SLO/SLI objectives and building resilient, highly available platforms

2+ years of programming experience in Python, Shell scripting, or similar languages for automation and tooling

Experience with configuration management and automation tools such as Ansible and Chef

Experience with infrastructure-as-code and cloud orchestration tools such as Terraform

Familiarity with observability and telemetry platforms such as Oracle Enterprise Manager, Azure Log Analytics, or similar

Experience with containerization and orchestration technologies such as Docker and Kubernetes

Experience with Microsoft SQL Server and IIS is a plus

Strong understanding of distributed systems fundamentals, including data structures, relational and non-relational databases, networking, Linux internals, filesystems, storage systems, and web architectures

Experience working with a broad range of open-source technologies and cloud services

Solid understanding of Agile development methodologies and modern software development practices

Strong interpersonal and communication skills, with the ability to collaborate effectively across diverse, cross-functional teams

Suggested Skills:

Technical Leadership

People Management

Stakeholder Management

Additional Information

You Will Benefit from Our Culture

We strongly believe in the well-being of our employees and their families. That is why we offer generous health and wellness programs and time away for employees at all levels.

India Disability Policy

LinkedIn is an equal employment opportunity employer offering opportunities to all job seekers, including individuals with disabilities. For more information on our equal opportunity policy, please visit https://legal.linkedin.com/content/dam/legal/Policy_India_EqualOppPWD_9-12-2023.pdf

Global Data Privacy Notice for Job Candidates

Please follow this link to access the document that provides transparency around the way in which LinkedIn handles personal data of employees and job applicants: https://legal.linkedin.com/candidate-portal.
