Execution Strategy
Execute a 'Stabilize and Simplify' Strategy to Address Systemic Rework and SLA Failures
The incident management process is critically unstable, evidenced by a 60.2% SLA breach rate and rework affecting over 57% of incidents. The workflow is plagued by excessive complexity, with 49 process variants for every 150 tickets and ambiguous state transitions causing significant delays. The execution priority must be to radically simplify the incident lifecycle, enforce data governance at closure, and only then introduce targeted automation. This foundational approach will stop value leakage, stabilize performance, and build the necessary platform for future, more advanced improvements.
Critical | Confidence: High
The 60.2% SLA breach rate represents a consistent failure to meet service commitments. The current level of process chaos makes performance unpredictable and prevents effective automation or AI deployment. Deferring action ensures continued operational inefficiency and erosion of user trust.
1. Priority Actions
Priority 1
Simplify and Standardize the Incident Lifecycle
Process | Impact: Very High | Effort: Medium | 30-60 days
Why now
The root cause of the high rework and SLA breaches is an overly complex and ambiguous state model. Rework loops like 'Assigned -> Active' (35.6% of incidents) are driven by redundant states. Simplifying the process is the prerequisite for any other sustainable improvement.
Business outcome
Drastically reduce process variation and eliminate the most common rework loops, leading to shorter resolution times and improved SLA attainment.
Scope
Consolidate the 'Active', 'Assigned', and 'Work in Progress' states into a single 'In Progress' state. Deprecate the confusing 'Pending User' state and standardize on 'Awaiting User Info' for all user-dependency holds. Publish and enforce strict entry/exit criteria for each state.
Owner
Head of Service Management
Dependencies
Cross-functional team agreement on new state model
ServiceNow platform team for configuration changes
Risks
Resistance to changing established ways of working
Inadequate training on the new, simplified process
Success measures
Reduction in overall process rework from >57% to <30%
Reduction in the number of process variants by at least 40%
Decrease in the frequency of the 'Assigned -> Active' transition by 90%
Evidence
TOP TRANSITIONS shows the 'Assigned -> Active' rework loop occurs in 35.60% of all incidents.
TOP VARIANTS data reveals that most of the common paths are rework variants, with the ideal path representing only 14.25% of volume.
METRICS HISTORY shows a consistently high rework rate, averaging around 58%.
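The state consolidation described in the Scope above can be sketched as a simple mapping. This is an illustrative sketch only, not platform configuration: the mapping table, the function names, and the definition of rework used here (a state entered more than once) are assumptions, and the sample path is hypothetical.

```python
# Hypothetical mapping from legacy states to the simplified model
# described in the Scope (assumption: all other states are unchanged).
LEGACY_TO_SIMPLIFIED = {
    "Active": "In Progress",
    "Assigned": "In Progress",
    "Work in Progress": "In Progress",
    "Pending User": "Awaiting User Info",
}


def simplify_path(path):
    """Map legacy states to the simplified model and collapse consecutive repeats."""
    simplified = []
    for state in path:
        mapped = LEGACY_TO_SIMPLIFIED.get(state, state)
        if not simplified or simplified[-1] != mapped:
            simplified.append(mapped)
    return simplified


def has_rework(path):
    """Assumed definition: a path shows rework if any state is entered more than once."""
    return len(set(path)) < len(path)


# Hypothetical legacy path containing the 'Assigned -> Active' loop.
legacy_path = ["New", "Active", "Assigned", "Active",
               "Work in Progress", "Resolved", "Closed"]
new_path = simplify_path(legacy_path)

print(new_path)
print(has_rework(legacy_path), has_rework(new_path))
```

Under these assumptions, the loop between the redundant states disappears once they are merged, which is the mechanism the success measures rely on.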
Priority 2
Enforce Data Quality at Incident Closure
Data | Impact: High | Effort: Low | 60-90 days
Why now
Meaningful analysis and future automation are impossible with current data quality. Over 21% of incidents are closed with ambiguous codes like 'Referred', 'Cancelled', or 'Not Solved', and 6.5% have no code at all ('None'). This prevents learning and effective problem management.
Business outcome
Create a reliable and accurate dataset of incident outcomes, enabling effective root cause analysis and providing clean data for future machine learning applications.
Scope
Make the 'close_code' field mandatory upon resolution. Rationalize the list of available close codes to be outcome-focused. Provide clear definitions for each code and deprecate ambiguous options.
Owner
Process Governance Lead
Dependencies
Completion of Priority Action 1 (Simplified Lifecycle)
Risks
Teams may select the easiest close code rather than the most accurate if not properly trained or audited.
Success measures
Reduce the percentage of incidents with a 'close_code' of 'None' to zero.
Reduce the usage of 'Referred', 'Cancelled', and 'Not Solved' close codes by 50% through better process alternatives.
Evidence
FIELD USAGE for 'close_code' shows that 'Not Solved (Request Denied)', 'Cancelled', 'Referred', and 'None' account for over 28% of all incidents.
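The mandatory close-code rule in the Scope above could be expressed as a validation check along these lines. This is a minimal sketch in plain Python, not ServiceNow configuration: the allowed code names are hypothetical placeholders (the rationalized list has not been defined yet), and the function name is an assumption.

```python
# Hypothetical rationalized list; the real outcome-focused codes are still to be agreed.
ALLOWED_CLOSE_CODES = {
    "Resolved by fix",
    "Resolved by workaround",
    "Duplicate",
    "No fault found",
}

# Ambiguous codes named in the Evidence, treated here as deprecated.
DEPRECATED_CLOSE_CODES = {
    "Referred",
    "Cancelled",
    "Not Solved (Request Denied)",
}


def validate_close_code(close_code):
    """Return a list of validation errors; an empty list means the code is acceptable."""
    code = (close_code or "").strip()
    if not code or code == "None":
        return ["close_code is mandatory at resolution"]
    if code in DEPRECATED_CLOSE_CODES:
        return [f"close_code '{code}' is deprecated"]
    if code not in ALLOWED_CLOSE_CODES:
        return [f"close_code '{code}' is not in the rationalized list"]
    return []


print(validate_close_code(None))
print(validate_close_code("Referred"))
print(validate_close_code("Resolved by fix"))
```

A check like this only enforces that a valid code is present; it cannot tell whether the chosen code is accurate, which is why the audit and training risk above still applies.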
Priority 3
Automate Triage and Assignment for High-Volume Categories
Automation | Impact: Medium | Effort: Medium | 90+ days
Why now
Significant delays occur during the initial manual handling phases ('New -> Active' takes 3.27 hours; 'Active -> Assigned' takes 4.6 hours). Automating this first step provides a quick win on response times for common issues, but only after the process is stable.
Business outcome
Accelerate incident response time, improve assignment consistency, and reduce the likelihood of response SLA breaches.
Scope
Identify the top 3-5 incident categories from service portal and email channels. Implement automated assignment rules based on keywords, reported service, or user location to route these incidents directly to the correct team, bypassing manual triage.
Owner
ServiceNow Platform Owner
Dependencies
Completion of Priority Action 1 (Simplified Lifecycle)
Completion of Priority Action 2 (Data Quality)
Risks
Poorly configured rules could lead to widespread misassignment, increasing rework.
Success measures
Reduce average time in 'New' state by 50% for targeted categories.
Achieve 95% first-time assignment accuracy for automated assignments.
Evidence
TOP TRANSITIONS shows average durations of 3.27 hours for 'New -> Active' and 4.60 hours for 'Active -> Assigned', indicating significant early-stage delays.
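The assignment rules described in the Scope above (keywords, reported service, user location) could be prototyped as follows before being configured on the platform. This is a sketch under stated assumptions: the team names, keywords, and incident field names (`short_description`, `service`, `location`) are hypothetical, and the first matching rule wins.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Rule:
    """One routing rule; every condition that is set must match."""
    team: str
    keywords: Tuple[str, ...] = ()
    service: Optional[str] = None
    location: Optional[str] = None

    def matches(self, incident: dict) -> bool:
        text = incident.get("short_description", "").lower()
        if self.keywords and not any(k in text for k in self.keywords):
            return False
        if self.service and incident.get("service") != self.service:
            return False
        if self.location and incident.get("location") != self.location:
            return False
        return True


# Hypothetical rules for two high-volume categories.
RULES = [
    Rule(team="Identity Support", keywords=("password", "locked out")),
    Rule(team="Network Support", service="VPN"),
]


def route(incident: dict, rules=RULES) -> Optional[str]:
    """Return the team of the first matching rule, or None for manual triage."""
    for rule in rules:
        if rule.matches(incident):
            return rule.team
    return None


print(route({"short_description": "Password reset needed"}))
print(route({"short_description": "Cannot connect", "service": "VPN"}))
print(route({"short_description": "Printer jam"}))
```

Returning `None` when no rule matches keeps unmatched incidents in manual triage, which limits the misassignment risk noted above while the rules are tuned against the 95% accuracy target.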
2. Phased Plan
Phase 1
Foundation: Stabilize & Simplify (Days 0-60)
Eradicate process ambiguity and establish a single, standard workflow to serve as a stable foundation for all future improvements.
Why this phase
All other improvements depend on a predictable and simple core process. Tackling the fundamental state model issues is the highest-leverage first step.
Included priorities
Priority 1
Entry criteria
Leadership sponsorship for process redesign.
Commitment of ServiceNow admin resources.
Exit criteria
New simplified state model is deployed in production.
All support teams trained on the new process.
Initial data shows a reduction in rework variants.
Expected outcomes
Reduced process complexity and fewer handoffs.
Clearer understanding for agents of the incident lifecycle.
Phase 2
Governance: Improve Data Integrity (Days 60-90)
Ensure all incidents flowing through the new, stable process are closed with accurate, meaningful, and mandatory outcome data.
Why this phase
With a stable process, the next priority is to ensure the data it produces is trustworthy. Good data is essential for measurement, analysis, and future AI.
Included priorities
Priority 2
Entry criteria
Phase 1 exit criteria met.
New state model has been active for at least 30 days.
Exit criteria
Mandatory and rationalized 'close_code' field is live.
Data reports show a significant reduction in ambiguous closure codes.
Expected outcomes
Ability to generate accurate reports on incident resolution outcomes.
A clean dataset for root cause analysis.
Phase 3
Acceleration: Introduce Targeted Automation (Days 90+)
Begin automating high-value, low-risk components of the process to accelerate response and reduce manual toil.
Why this phase
With a stable process and reliable data, we can now safely introduce automation to gain efficiency without the risk of automating a flawed workflow.
Included priorities
Priority 3
Entry criteria
Phase 2 exit criteria met.
Exit criteria
Automated triage rules are live for 3-5 incident categories.
Monitoring confirms a reduction in time-to-assignment for targeted incidents.
Expected outcomes
Faster response times for common incident types.
Reduced manual effort for the service desk.
Improved response SLA performance.
3. Sequencing Principles
Process First, Technology Second
The evidence points to a broken process, not a technology gap. We must stabilize and simplify the workflow before applying automation; automating the current process would only accelerate the chaos and rework.
Data Quality Precedes AI
Advanced capabilities like predictive intelligence or resolution recommendations require high-quality, structured data. We will prioritize fixing data capture (e.g., close codes) before attempting to build any AI models on the unreliable existing dataset.
Target Value, Not Volume
Our initial automation efforts will be narrow and deep, focusing on the triage and assignment of a few high-impact incident types to demonstrate value quickly and build momentum, which aligns with the organization's low appetite for automation.
4. Do Not Do Yet
Implement full-scale AI for incident resolution
The process is too varied and the resolution data ('close_code') is too unreliable to train a trustworthy AI model. This must wait until after the process is standardized and data quality is high.
Overhaul the self-service portal user interface
Improving the front-end experience will only increase user frustration if the back-end fulfillment process remains broken. We must fix the core process first to ensure faster resolution, regardless of intake channel.
Introduce more granular incident statuses
The core problem is too many ambiguous states causing rework. The immediate priority is radical simplification and consolidation, not adding further complexity to the workflow.