IT On-Call Policy Template [Free] — Rotation Schedules, Escalation & Compensation

Vik Chadha · Founder & CEO

On-call burnout is the #1 reason IT operations staff quit. A poorly designed on-call program leads to exhausted engineers, missed incidents, and attrition that costs $50,000-$150,000 per replacement hire. A well-structured IT on-call policy fixes this by setting clear expectations for rotation schedules, response times, escalation paths, and compensation. This guide provides a complete, ready-to-use on-call policy template. For additional IT management resources, visit our IT Policy Templates guide and IT Management Policies.

Quick Start: Download our free IT On-Call Policy Template — covers rotation design, escalation procedures, compensation guidelines, and response time SLAs. Customize for your team in under an hour.

What Is an IT On-Call Policy?

An IT on-call policy is a formal document that defines the rules, expectations, and procedures for staff who are designated to respond to IT incidents outside of normal business hours. It covers who is on call, when, how they're contacted, what they're expected to do, and how they're compensated.

Why You Need a Formal On-Call Policy

| Without a Policy | With a Policy |
|---|---|
| Same people always get called | Fair rotation ensures equitable distribution |
| No clear response time expectations | Defined SLAs by severity level |
| Engineers don't know when they're "off" | Clear on/off boundaries reduce burnout |
| Compensation is inconsistent or nonexistent | Fair pay for after-hours work |
| Escalation is chaotic ("just call whoever") | Structured escalation paths with backup |
| Burnout and attrition | Sustainable, predictable program |

IT On-Call Policy Template

1. Policy Overview

IT ON-CALL POLICY
Version: 1.0
Effective Date: [Date]
Policy Owner: [IT Director / VP of Engineering]
Approved By: [CTO / VP of IT]

PURPOSE:
This policy establishes the framework for IT on-call coverage,
ensuring critical systems are monitored and incidents are resolved
outside of normal business hours. It defines rotation schedules,
response expectations, escalation procedures, and compensation.

SCOPE:
This policy applies to all IT staff who participate in on-call
rotations, including:
- Infrastructure and operations engineers
- Site reliability engineers (SRE)
- Database administrators
- Network engineers
- Security operations staff
- Application support engineers

DEFINITIONS:
- On-call: Designated period where an engineer must be reachable
  and able to respond to incidents within the defined SLA
- Primary on-call: First responder for all incoming alerts
- Secondary on-call: Backup who is engaged if primary doesn't respond
- Escalation: Process of involving additional personnel when an
  incident exceeds the responder's ability or authority

2. On-Call Rotation Schedules

Rotation Models

| Model | How It Works | Best For | Drawbacks |
|---|---|---|---|
| Weekly rotation | One person covers 7 days, then rotates | Small teams (4-5 people) | Full week can be exhausting |
| Follow-the-sun | Shifts align with time zones, no overnight | Distributed teams across 3+ time zones | Requires global headcount |
| Split week | Weekdays (Mon-Fri) and weekend (Sat-Sun) are separate | Teams where weekends are significantly different | More handoffs |
| Day/night split | Day shift (8am-8pm) and night shift (8pm-8am) | High-volume environments | Requires larger team |
| Bi-weekly rotation | Two-week rotation with primary and secondary | Medium teams (5-10 people) | Long rotation period |

Sample Weekly Rotation Schedule

| Week | Primary On-Call | Secondary On-Call | Shift |
|---|---|---|---|
| Feb 10-16 | Engineer A | Engineer B | Mon 9am - Mon 9am |
| Feb 17-23 | Engineer B | Engineer C | Mon 9am - Mon 9am |
| Feb 24-Mar 2 | Engineer C | Engineer D | Mon 9am - Mon 9am |
| Mar 3-9 | Engineer D | Engineer E | Mon 9am - Mon 9am |
| Mar 10-16 | Engineer E | Engineer A | Mon 9am - Mon 9am |
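The sample schedule follows a simple round-robin pattern: each week's secondary becomes the next week's primary. As a rough sketch of how such a schedule could be generated, the Python below reproduces the table. The function name `build_weekly_rotation` is hypothetical, and the year 2025 is an assumption (chosen only because Feb 10 falls on a Monday that year); the source does not state a year.

```python
from datetime import date, timedelta


def build_weekly_rotation(engineers, start_monday, weeks):
    """Return a list of (week_start, primary, secondary) tuples.

    The secondary for a given week is the engineer who becomes primary
    the following week, matching the sample schedule.
    """
    if len(engineers) < 2:
        raise ValueError("need at least two engineers for primary and secondary")
    if start_monday.weekday() != 0:
        raise ValueError("start_monday must be a Monday")
    schedule = []
    for week in range(weeks):
        primary = engineers[week % len(engineers)]
        secondary = engineers[(week + 1) % len(engineers)]
        schedule.append((start_monday + timedelta(weeks=week), primary, secondary))
    return schedule


team = ["Engineer A", "Engineer B", "Engineer C", "Engineer D", "Engineer E"]
# Assumed year: 2025, where Feb 10 is a Monday.
rotation = build_weekly_rotation(team, date(2025, 2, 10), weeks=5)
for week_start, primary, secondary in rotation:
    print(f"{week_start:%b %d}: primary={primary}, secondary={secondary}")
```

With five engineers, each person is primary one week in five and secondary one week in five, so the combined on-call load is higher than the primary frequency alone suggests.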

Rotation Rules

  • Minimum team size: 4 people for weekly rotation (ensures no engineer is primary more than 1 week in 4)
  • Maximum on-call frequency: No more than 1 week in 4 as primary (25% of time)
  • Holiday coverage: Rotated separately and equitably tracked over the year
  • Swap policy: Engineers may swap shifts with manager approval and 48-hour notice
  • New hire exemption: New team members shadow on-call for 2 rotations before going primary
  • Consecutive limit: No engineer should be primary on-call for more than 7 consecutive days

3. Response Time SLAs

| Severity | Definition | Acknowledgment | Response | Resolution Target |
|---|---|---|---|---|
| P1 (Critical) | Complete service outage, data breach, revenue-impacting | 5 minutes | 15 minutes to begin working | 1 hour (mitigate), 4 hours (resolve) |
| P2 (High) | Major feature degraded, significant user impact | 15 minutes | 30 minutes to begin working | 4 hours |
| P3 (Medium) | Minor feature impacted, workaround available | 30 minutes | 1 hour to begin working | Next business day |
| P4 (Low) | Informational alert, no user impact | 1 hour | Next business day | Scheduled maintenance window |
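To make the acknowledgment targets concrete, here is a minimal sketch of an SLA breach check. The `ACK_SLA` mapping copies the acknowledgment column from the table; the function name `ack_breached` and its parameters are illustrative assumptions, not part of any paging tool's API.

```python
from datetime import datetime, timedelta

# Acknowledgment targets copied from the SLA table.
ACK_SLA = {
    "P1": timedelta(minutes=5),
    "P2": timedelta(minutes=15),
    "P3": timedelta(minutes=30),
    "P4": timedelta(hours=1),
}


def ack_breached(severity, paged_at, acked_at, now):
    """Return True if the page missed its acknowledgment SLA.

    acked_at is None while the page is still unacknowledged, in which
    case the current time is compared against the deadline instead.
    """
    deadline = paged_at + ACK_SLA[severity]
    if acked_at is not None:
        return acked_at > deadline
    return now > deadline


paged = datetime(2025, 1, 1, 2, 0)
# Still unacknowledged six minutes after a P1 page: SLA missed.
print(ack_breached("P1", paged, None, paged + timedelta(minutes=6)))
```

In practice the paging tool usually performs this check itself; a sketch like this is mainly useful for auditing exported page logs against the policy.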

Response expectations during on-call:

  • Must be reachable by phone and paging system at all times during shift
  • Must be able to access a laptop and VPN within 15 minutes of page
  • Must stay within 15 minutes of a reliable internet connection, consistent with the 15-minute laptop and VPN requirement
  • Alcohol consumption must not impair ability to respond and troubleshoot
  • If temporarily unreachable (medical appointment, flight), secondary must be notified

4. Escalation Procedures

Escalation Matrix

| Trigger | Action | Who to Escalate To | Timeline |
|---|---|---|---|
| Primary doesn't acknowledge within SLA | Auto-page secondary on-call | Secondary engineer | Automatic after SLA breach |
| Secondary doesn't acknowledge | Page on-call manager | Engineering manager | 10 minutes after secondary SLA |
| P1 incident lasting over 30 minutes | Notify management | Engineering manager + IT Director | 30 minutes into incident |
| P1 incident lasting over 1 hour | Executive notification | VP of Engineering / CTO | 1 hour into incident |
| Incident requires cross-team support | Page relevant team's on-call | Other team's primary on-call | As needed |
| Incident has customer impact | Notify customer success | CS on-call + Communications | Immediately upon confirmed impact |

Escalation Procedure

ESCALATION FLOW

1. Alert fires → Primary on-call paged
   ├── Acknowledged within SLA → Primary investigates
   │   ├── Resolved → Document and close
   │   └── Cannot resolve alone → Escalate to secondary/specialist
   └── NOT acknowledged within SLA → Secondary auto-paged
       ├── Acknowledged → Secondary investigates
       └── NOT acknowledged → Manager paged (phone call)

2. For P1 incidents:
   - 15 min: Primary begins investigation
   - 30 min: If unresolved, notify engineering manager and IT Director
   - 60 min: If unresolved, notify VP of Engineering / CTO
   - 90 min: If unresolved, incident commander assigned, war room opened
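The escalation logic can be sketched as two small functions: one that walks the acknowledgment chain (primary, then secondary, then manager) and one that lists who should have been notified for an unresolved P1. The notification thresholds below follow the escalation matrix; the function names and the boolean inputs are assumptions made for illustration.

```python
# Minutes into an unresolved P1 at which each group is notified,
# taken from the escalation matrix.
P1_NOTIFY_AFTER_MIN = [
    (30, "Engineering manager + IT Director"),
    (60, "VP of Engineering / CTO"),
]


def current_responder(primary_missed_sla, secondary_missed_sla):
    """Walk the acknowledgment chain: primary, secondary, then manager."""
    if not primary_missed_sla:
        return "primary"
    if not secondary_missed_sla:
        return "secondary"
    return "engineering manager"


def p1_notifications_due(minutes_unresolved):
    """Return every group that should have been notified by this point."""
    return [who for threshold, who in P1_NOTIFY_AFTER_MIN
            if minutes_unresolved >= threshold]


print(current_responder(primary_missed_sla=True, secondary_missed_sla=False))
print(p1_notifications_due(45))
```

Keeping the thresholds in a single list makes it easier to keep the written policy and the alerting tool's configuration in sync when either changes.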

5. On-Call Compensation

On-call compensation varies by company and region. Here are the most common models:

Compensation Models

| Model | How It Works | Typical Range | Best For |
|---|---|---|---|
| Flat on-call stipend | Fixed amount per on-call shift regardless of pages | $200-$500/week | Predictable cost, low-volume paging |
| Hourly rate for active response | Base stipend + hourly pay when actively working incidents | $50-$150/hour (active) + $100-$200/week (standby) | High-volume environments |
| Comp time (time off in lieu) | No extra pay, but earn time off for on-call work | 1:1 or 1.5:1 ratio | Startups, budget-constrained |
| Percentage of salary | On-call pay as % of base salary | 5-15% of base salary while on call | Simplicity, salary-proportional |
| Hybrid | Flat stipend + additional pay per page/incident | $200/week + $50/incident | Balanced incentive |

Sample Compensation Schedule

| On-Call Period | Standby Pay | Active Incident Pay | Holiday Multiplier |
|---|---|---|---|
| Weeknight (6pm-8am) | $50/night | $75/hour | N/A |
| Weekend day (8am-6pm) | $75/day | $75/hour | N/A |
| Weekend night (6pm-8am) | $75/night | $100/hour | N/A |
| Holiday (full day) | $150/day | $100/hour | 1.5x standby |
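As a worked example of the sample schedule, the sketch below totals one non-holiday week of on-call pay. It assumes 5 weeknights, 2 weekend days, and 2 weekend nights per week, and it leaves holidays out because the schedule does not say whether the $150/day figure already includes the 1.5x multiplier. The function name `weekly_on_call_pay` is hypothetical.

```python
# Rates copied from the sample compensation schedule (USD).
STANDBY = {"weeknight": 50, "weekend_day": 75, "weekend_night": 75}
ACTIVE_HOURLY = {"weeknight": 75, "weekend_day": 75, "weekend_night": 100}

# Assumed number of each period in one non-holiday on-call week.
PERIODS_PER_WEEK = {"weeknight": 5, "weekend_day": 2, "weekend_night": 2}


def weekly_on_call_pay(active_hours):
    """Return standby pay for one week plus pay for active incident hours.

    active_hours maps a period name to the hours spent actively
    working incidents during that kind of period.
    """
    standby = sum(STANDBY[p] * n for p, n in PERIODS_PER_WEEK.items())
    active = sum(ACTIVE_HOURLY[p] * h for p, h in active_hours.items())
    return standby + active


# Quiet week: standby only (5*50 + 2*75 + 2*75).
print(weekly_on_call_pay({}))
# Two hours on a weeknight incident and 1.5 hours on a weekend night.
print(weekly_on_call_pay({"weeknight": 2, "weekend_night": 1.5}))
```

Under these assumptions a week with no incidents pays $550 in standby, which sits slightly above the $200-$500/week range quoted for a flat stipend.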

Legal considerations:

  • Non-exempt (hourly) employees must be paid for all time spent responding to incidents per FLSA
  • Employers are generally not legally required to pay exempt employees extra for on-call work, but best practice is to compensate them
  • Some states have specific on-call pay requirements — check your state labor laws
  • On-call time where the employee cannot use the time freely may count as hours worked under FLSA

6. On-Call Tooling Requirements

| Tool Category | Purpose | Examples |
|---|---|---|
| Paging/alerting | Route alerts to on-call engineer | PagerDuty, Opsgenie, Splunk On-Call (formerly VictorOps) |
| Monitoring | Detect incidents automatically | Datadog, New Relic, Prometheus/Grafana |
| Communication | Coordinate during incidents | Slack (incident channel), Microsoft Teams, phone bridge |
| Runbooks | Step-by-step resolution guides | Confluence, Notion, internal wiki |
| Incident management | Track and document incidents | Jira, ServiceNow, Rootly, FireHydrant |
| VPN/Remote access | Access production systems remotely | Company VPN, SSH bastion host |

Tooling requirements for on-call engineers:

  • Company-provided phone or phone stipend ($50-$100/month)
  • Company laptop with VPN configured and tested
  • Home internet backup plan (mobile hotspot as fallback)
  • Access to all production monitoring dashboards
  • Runbooks accessible from mobile device

7. On-Call Health and Sustainability

Burnout Prevention

| Metric | Healthy Target | Action if Exceeded |
|---|---|---|
| Pages per on-call shift | Fewer than 10/week | Fix noisy alerts, improve automation |
| After-hours incidents requiring response | Fewer than 3/week | Improve system reliability |
| Average incident resolution time | Under 30 minutes | Better runbooks, automation |
| On-call frequency per engineer | No more than 1 week in 4 | Hire additional staff |
| Engineer satisfaction (survey) | Above 3.5/5 | Review rotation, compensation, workload |
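These targets are simple enough to check automatically during a monthly review. The sketch below returns the recommended action for every metric outside its healthy range. The function and parameter names are assumptions, and on-call frequency is approximated by rotation size, which only holds for a weekly rotation with equal shares.

```python
def on_call_health_actions(pages_per_week, after_hours_incidents_per_week,
                           avg_resolution_minutes, rotation_size,
                           satisfaction_score):
    """Return the recommended action for each metric outside its target."""
    actions = []
    if pages_per_week >= 10:  # target: fewer than 10/week
        actions.append("Fix noisy alerts, improve automation")
    if after_hours_incidents_per_week >= 3:  # target: fewer than 3/week
        actions.append("Improve system reliability")
    if avg_resolution_minutes >= 30:  # target: under 30 minutes
        actions.append("Better runbooks, automation")
    if rotation_size < 4:  # fewer than 4 people means more than 1 week in 4
        actions.append("Hire additional staff")
    if satisfaction_score <= 3.5:  # target: above 3.5/5
        actions.append("Review rotation, compensation, workload")
    return actions


# A three-person rotation with noisy alerts trips two of the checks.
print(on_call_health_actions(12, 1, 20, 3, 4.2))
```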

After On-Call Recovery

  • Engineers receive a recovery half-day off after a week of on-call (or full day if more than 5 incidents)
  • If called during sleeping hours (midnight-6am), the engineer may start the next workday late
  • Managers should check in with engineers after particularly heavy on-call weeks
  • Track cumulative on-call burden per engineer per quarter — adjust if inequitable

8. Runbook Requirements

Every service covered by on-call must have a runbook. No engineer should be paged for a service they don't have documentation for.

Minimum runbook contents:

SERVICE RUNBOOK: [Service Name]
Last Updated: [Date]
Owner: [Team/Engineer]

OVERVIEW:
- What this service does
- Who it serves (internal/external)
- Business impact if down

ARCHITECTURE:
- Infrastructure diagram (or link)
- Dependencies (upstream and downstream)
- Data stores

COMMON ALERTS AND RESOLUTION:
Alert: [Alert name]
  Meaning: [What this alert indicates]
  Impact: [User/business impact]
  Steps:
    1. [First thing to check]
    2. [Second thing to check]
    3. [How to resolve]
  Escalate if: [When to escalate and to whom]

CONTACT INFORMATION:
- Service owner: [Name, phone, Slack]
- Database team: [Contact]
- Vendor support: [Contact, account #, SLA]

ROLLBACK PROCEDURES:
- How to roll back the last deployment
- How to failover to backup
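One way to enforce the "every service has a runbook" rule is a lightweight completeness check. The sketch below assumes runbooks are stored as plain text using the section headings from the template; the function name and the "Billing API" sample are hypothetical.

```python
# Section headings taken from the runbook template.
REQUIRED_SECTIONS = [
    "OVERVIEW:",
    "ARCHITECTURE:",
    "COMMON ALERTS AND RESOLUTION:",
    "CONTACT INFORMATION:",
    "ROLLBACK PROCEDURES:",
]


def missing_runbook_sections(runbook_text):
    """Return required headings that do not appear on their own line."""
    present = {line.strip() for line in runbook_text.splitlines()}
    return [section for section in REQUIRED_SECTIONS if section not in present]


# Hypothetical, deliberately incomplete runbook.
sample_runbook = """SERVICE RUNBOOK: Billing API
OVERVIEW:
- Generates customer invoices
ARCHITECTURE:
- Diagram link
CONTACT INFORMATION:
- Service owner
"""
print(missing_runbook_sections(sample_runbook))
```

A check like this only confirms that the sections exist, not that their contents are current, so it complements rather than replaces the post-incident runbook reviews.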

Implementing Your On-Call Program

Phase 1: Design (Week 1)

  • Determine which services require on-call coverage
  • Identify eligible team members (minimum 4 per rotation)
  • Choose rotation model based on team size and distribution
  • Define severity levels and response time SLAs
  • Set compensation structure (get HR and finance approval)

Phase 2: Tooling and Documentation (Week 2)

  • Configure paging/alerting tool with on-call schedules
  • Create or update runbooks for all covered services
  • Set up escalation policies in alerting tool
  • Configure monitoring thresholds and alert routing
  • Test paging flow end-to-end (including escalation)

Phase 3: Launch (Week 3)

  • Distribute on-call policy to all participating engineers
  • Conduct on-call training session (runbook walkthrough, tool training)
  • Shadow rotation: new on-call engineers paired with experienced ones
  • Collect signed acknowledgments
  • Begin first rotation

Phase 4: Optimize (Ongoing)

  • Review on-call metrics monthly (page volume, MTTA, MTTR)
  • Survey on-call engineers quarterly on satisfaction and workload
  • Reduce alert noise (eliminate false positives and duplicate alerts)
  • Update runbooks after every incident that revealed a documentation gap
  • Adjust rotation and compensation annually based on feedback
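For the monthly metrics review, MTTA and MTTR can be computed directly from incident timestamps. This is a minimal sketch that takes MTTA as mean time to acknowledge and MTTR as mean time to resolve, both measured from the moment of the page; the dictionary keys and the sample incidents are assumptions, and real data would come from your paging tool's export.

```python
from datetime import datetime
from statistics import mean


def mtta_mttr_minutes(incidents):
    """Return (MTTA, MTTR) in minutes.

    Each incident is a dict with 'paged', 'acked', and 'resolved'
    datetimes. Both metrics are measured from the page time.
    """
    ack = [(i["acked"] - i["paged"]).total_seconds() / 60 for i in incidents]
    resolve = [(i["resolved"] - i["paged"]).total_seconds() / 60 for i in incidents]
    return mean(ack), mean(resolve)


# Illustrative incidents, not real data.
sample_incidents = [
    {"paged": datetime(2025, 3, 1, 2, 0),
     "acked": datetime(2025, 3, 1, 2, 4),
     "resolved": datetime(2025, 3, 1, 2, 34)},
    {"paged": datetime(2025, 3, 5, 14, 0),
     "acked": datetime(2025, 3, 5, 14, 10),
     "resolved": datetime(2025, 3, 5, 14, 40)},
]
mtta, mttr = mtta_mttr_minutes(sample_incidents)
print(f"MTTA: {mtta:.1f} min, MTTR: {mttr:.1f} min")
```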

Frequently Asked Questions

How many people do I need for an on-call rotation?

Minimum 4 for a weekly rotation (1 in 4 frequency). Ideally 5-6 so engineers aren't on call more than once a month. If you only have 2-3 people, consider a manager-as-backup model or hiring specifically for operations coverage.

Should on-call be mandatory?

For operations and SRE roles, on-call is typically a job requirement and should be stated in the job description. For other engineering roles, participation may be optional or incentivized with additional compensation. Never spring on-call requirements on existing employees without their agreement.

How do I reduce on-call burnout?

Three things matter most: (1) reduce page volume through better monitoring and automation, (2) compensate fairly, and (3) limit frequency to no more than 1 week in 4. Engineers burn out from noisy alerts and feeling uncompensated far more than from occasional real incidents.

What if an on-call engineer doesn't respond?

The escalation policy should automatically page the secondary after the SLA expires. If both primary and secondary fail to respond, the manager is paged. Repeated failures to respond should be addressed through performance management, not by removing the engineer from rotation (which overloads others).

Do I need to pay exempt employees for on-call time?

Under the FLSA, employers are generally not required to pay exempt employees additional on-call pay. However, best practice, and a common approach in the tech industry, is to provide on-call compensation to all participants regardless of exempt status. It's a retention and fairness issue, not just a legal one.
