Job Description
We are seeking an experienced Service Delivery Manager (SDM) with deep expertise in IT operations, cloud infrastructure management, and a forward-thinking vision for AI-driven modernisation. The SDM will serve as the single point of accountability for end-to-end service delivery governance, operational excellence, and strategic leadership of a cloud-hosted infrastructure support environment, ensuring high availability, reliability, and performance across all managed services. The ideal candidate brings 12 years of progressive experience in IT service delivery and operations, experience managing a production support team for a medium-sized enterprise, a hands-on understanding of cloud platforms (preferably Azure or AWS), a proven ability to manage 24x7 infrastructure support teams, and a strategic mindset to drive AI-led modernisation of support operations. This person must be proactive, data-driven, commercially aware, an excellent communicator, and capable of managing senior stakeholder relationships with confidence.
Job Description / Duties and Responsibilities
Service Delivery Governance
Own end-to-end service delivery for cloud infrastructure and managed operations, acting as the single point of accountability for service performance.
Define, negotiate, monitor, and enforce SLAs, SLOs, and KPIs across all service towers.
Establish and govern ITIL-based processes, including Incident, Problem, Change, and Release Management.
Conduct regular service reviews, Quarterly Business Reviews (QBRs), and governance cadences with client and internal stakeholders.
Manage escalations proactively, driving resolution and stakeholder communication in real time.
Track and improve CSAT, NPS, and operational satisfaction metrics across all delivery functions.
Maintain and report on operational dashboards covering SLA compliance, incident volumes, MTTR, MTTD, and service health.
Cloud Infrastructure Management
Oversee 24x7 cloud operations across Azure (primary), AWS, or GCP environments, ensuring high availability and reliability.
Govern infrastructure health, including uptime, performance, capacity planning, disaster recovery readiness, and RTO/RPO compliance.
Manage vendors, MSPs, and technology partners delivering infrastructure support services, holding them accountable to contractual SLAs.
Ensure security compliance, vulnerability governance, patch management, and adherence to Information Security Management policies.
Drive cloud cost optimisation and FinOps practices by monitoring usage, identifying waste, and enforcing right-sizing disciplines.
Collaborate with engineering and architecture teams on cloud migrations, platform upgrades, and infrastructure modernisation initiatives.
IT Operations Team Leadership
Lead and mentor a cross-functional infrastructure support team covering L1, L2, and L3 support tiers.
Manage shift-based 24x7 operations with well-defined escalation frameworks, on-call schedules, and coverage plans.
Drive operational maturity through creation and maintenance of runbooks, playbooks, and standard operating procedures (SOPs).
Track MTTR and MTTD trends; initiate and own Problem Management reviews for recurring incidents.
Conduct Root Cause Analysis (RCA) for critical and recurring incidents, implementing preventive actions to reduce future outages.
Champion a culture of accountability, continuous improvement, and proactive operations across the team.
Stakeholder and Commercial Management
Manage client relationships at senior and executive levels, ensuring high confidence and transparency in service delivery.
Prepare and present service performance reports, operational dashboards, and executive briefings.
Participate in contract reviews, SOW negotiations, renewals, and change order management.
Identify and drive account growth through expanded service offerings and proactive value demonstration.
Coordinate with procurement, legal, and finance for commercial governance and vendor management activities.
AI-Led Modernisation and Innovation
Define and drive an AI-led roadmap for infrastructure operations modernisation aligned with business goals.
Champion adoption of AIOps platforms such as Dynatrace or ServiceNow AIOps to enable predictive incident detection and automated remediation.
Explore and implement LLM-based assistants and intelligent bots for L1 support automation and knowledge management.
Promote infrastructure-as-code (IaC) and GitOps practices to improve provisioning speed, consistency, and compliance.
Drive shift-left strategies, moving the team from reactive support to proactive and predictive operational models.
Partner with engineering and product teams to embed observability, self-healing, and reliability capabilities into cloud environments.
Build a continuous improvement culture backed by operational data, AI-driven insights, and regular retrospectives.
Job Specification / Skills and Competencies
Must Have
12 years of progressive experience in IT service delivery, operations management, or managed services.
Proven experience managing cloud-hosted infrastructure environments - Azure (strongly preferred), AWS, or GCP.
Strong working knowledge of ITIL v3/v4 - Incident, Problem, Change, and Service Level Management.
Experience managing 24x7 infrastructure or NOC support operations with L1/L2/L3 team structures.
Demonstrated ability to manage SLAs, run incident bridges, conduct war rooms, and own escalations end-to-end.
Excellent stakeholder management and executive communication skills - able to present to C-suite audiences.
Hands-on experience with ITSM platforms such as ServiceNow, Remedy, or equivalent.
Strong analytical mindset - ability to build dashboards, interpret operational data, and drive improvement actions.
Good to Have
Azure Administrator, AWS Solutions Architect, or GCP equivalent cloud certification.
ITIL 4 Foundation or Managing Professional certification.
Exposure to SRE principles - error budgets, SLOs, toil management, and reliability engineering practices.
Experience evaluating or deploying AIOps or observability platforms.
Familiarity with FinOps frameworks and cloud cost management tools.
Knowledge of DevOps/DevSecOps practices, CI/CD pipelines, and IaC tooling.
PMP, PRINCE2, or equivalent project management certification.
Certifications Desired
ITIL 4 Foundation or Managing Professional
PMP / PRINCE2 / SAFe Agilist
Microsoft Azure Administrator (AZ-104) or Azure Solutions Architect (AZ-305)
AWS Certified SysOps Administrator or Solutions Architect
Work Model
Primary work location: Thiruvananthapuram / Kochi. Hybrid model applicable.
Work standard US/UK business hours, including availability for after-hours on-call escalations.
Expected to participate in client governance calls across global time zones (US/UK shifts as required).
Role demands full ownership of service delivery outcomes across the managed infrastructure portfolio.
Adherence to Experion's Information Security Management policies and procedures is mandatory.