Process Health — Chaotic-1

Executive Assessment


Jun 28, 2026, 11:55 AM


Severe Rework, Process Chaos, and High SLA Breaches Cripple Incident Resolution

The incident management process is in a critical state, defined by extreme instability and inefficiency. A 60.3% rework rate means most incidents require redundant effort. This is compounded by a chaotic workflow with 222 different resolution paths, indicating a severe lack of standardization. Consequently, the process is failing to meet its service commitments, with a 63% SLA breach rate. Although incidents spend most of their elapsed time in active work rather than waiting (80.2% flow efficiency), the broken process structure negates this advantage, leading to unpredictable outcomes and poor service quality.

The score reflects a critical combination of severe rework (60.3%), extreme process fragmentation (222 variants), and a very high SLA breach rate (63%). While flow efficiency is high, the underlying process is unstable, unpredictable, and consistently fails to meet service commitments. The data quality is good, which provides high confidence in these negative findings.

Headline Signals

Rework Rate

Process: Critical

60.3%

More than half of all incidents require repeated effort, indicating underlying issues with initial diagnosis, information gathering, or resolution steps, significantly increasing effort and delaying closure.

SLA Breach Rate

sla_signals: Critical

63%

In a sample of 100 incidents with SLAs, a majority failed to meet their service commitments, exposing the business to risk and indicating that the current process cannot deliver on its time-based promises.

Process Variants

Process: Critical

222

An extremely high number of process paths for a routine workflow like Incident Management shows that no standard process is being followed, making the process difficult to manage, improve, or automate.

Top Rework Loop: Assigned -> Active

Transitions: Critical

35.6% of items

Over one-third of incidents bounce back from the 'Assigned' state to 'Active', a strong signal of widespread misrouting and triage failure at the start of the process.

Flow Efficiency

Time Intel: Good

80.2%

Once work is underway, incidents spend relatively little time waiting. This is a strength to build on, but it is undermined by the high rework and fragmentation that disrupt the flow.

Assignment Group Fragmentation

Attributes: Warning

46 Groups

Work is spread across a large number of assignment groups, with no single group handling more than 2.7% of the volume. This can lead to inconsistent handling and delays in finding the right resolver.

Time Profile

The process has a very high flow efficiency of 80.2%, meaning that most of each incident's cycle time is spent in active work (18.3 hours of touch time on average, against 4.5 hours of waiting). Wait and queue times are relatively low. However, this efficiency is misleading: because of the high rework rate, much of this touch time is repeated and wasteful, inflating the total effort required for resolution.

Average Cycle Time

20.8 hours

Average Touch Time

18.3 hours

Average Wait Time

4.5 hours

Flow Efficiency

80.2%
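Flow efficiency is conventionally touch time divided by total elapsed time. The sketch below is a minimal Python illustration, assuming that cycle time is the sum of touch and wait time and that 'Pending User' is the only waiting state; both are assumptions, since the tool's exact state classification is not stated. The helper names `split_time` and `flow_efficiency` and the sample segments are hypothetical.

```python
from typing import Iterable, Set, Tuple


def split_time(segments: Iterable[Tuple[str, float]], wait_states: Set[str]) -> Tuple[float, float]:
    """Sum (state, hours) segments into (touch_hours, wait_hours)."""
    touch = wait = 0.0
    for state, hours in segments:
        if state in wait_states:
            wait += hours
        else:
            touch += hours
    return touch, wait


def flow_efficiency(touch_hours: float, wait_hours: float) -> float:
    """Share of elapsed time spent in active work, assuming cycle = touch + wait."""
    cycle = touch_hours + wait_hours
    if cycle <= 0:
        raise ValueError("cycle time must be positive")
    return touch_hours / cycle


# Hypothetical segments for one incident: (state, hours spent in it).
segments = [("Active", 1.5), ("Work in Progress", 6.0), ("Pending User", 2.5)]
touch, wait = split_time(segments, wait_states={"Pending User"})
print(touch, wait, flow_efficiency(touch, wait))  # 7.5 2.5 0.75
```

With the rounded averages in this section, `flow_efficiency(18.3, 4.5)` gives about 0.803, close to the reported 80.2%.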

Major Discoveries (Rules)

Finding

Process is Defined by Rework, Not a Standard Flow

💡 rework

📊 Evidence

60.3% of all incidents involve rework. The most common rework loops are 'Assigned -> Active' (affecting 35.6% of items) and 'Work in Progress -> Pending User' (affecting 43% of items).

🔎 Insight

Rework is the standard mode of operation. Incidents are either mis-assigned and bounce back, or they are paused frequently to gather more information, suggesting poor initial data capture and triage.

💼 Business Impact

Drives up resolution time, increases manual effort, reduces predictability, and frustrates both end-users and support teams.
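If rework is treated as "the incident's state history contains at least one flagged transition", the overall rework rate and the per-loop shares can be computed together. The following is a minimal Python sketch; the `REWORK_TRANSITIONS` set is an assumption built from the two loops named in the evidence, not the tool's actual rule list, and the sample traces are illustrative only. Because an incident counts toward every loop it contains, per-loop shares can add up to more than the overall rate, which is consistent with 35.6% and 43% together exceeding 60.3%.

```python
from collections import Counter
from typing import Dict, Iterable, Sequence, Set, Tuple

Transition = Tuple[str, str]

# Assumed rework rules: the two loops named in the evidence.
REWORK_TRANSITIONS: Set[Transition] = {
    ("Assigned", "Active"),
    ("Work in Progress", "Pending User"),
}


def transitions(trace: Sequence[str]) -> Set[Transition]:
    """Distinct directly-follows pairs in one incident's state history."""
    return set(zip(trace, trace[1:]))


def rework_stats(
    traces: Iterable[Sequence[str]],
    rework: Set[Transition] = REWORK_TRANSITIONS,
) -> Tuple[float, Dict[Transition, float]]:
    """Return (overall rework rate, share of incidents containing each rework transition)."""
    traces = list(traces)
    if not traces:
        return 0.0, {}
    per_transition: Counter = Counter()
    reworked = 0
    for trace in traces:
        hits = transitions(trace) & rework
        per_transition.update(hits)  # each incident counted at most once per transition
        if hits:
            reworked += 1
    n = len(traces)
    return reworked / n, {t: per_transition[t] / n for t in rework}


# Hypothetical traces, not the assessment's data.
traces = [
    ["New", "Active", "Assigned", "Work in Progress", "Closed"],
    ["New", "Active", "Assigned", "Work in Progress", "Pending User", "Closed"],
    ["New", "Active", "Assigned", "Active", "Assigned", "Closed"],
    ["New", "Active", "Assigned", "Active", "Closed"],
]
rate, shares = rework_stats(traces)
print(rate, shares)
```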

Finding

Extreme Fragmentation Prevents Effective Management

💡 standardisation

📊 Evidence

There are 222 unique paths (variants) to resolve an incident. The most common path is followed by only 14.25% of incidents.

🔎 Insight

There is no standard operating procedure. Teams handle incidents ad-hoc, leading to unpredictable outcomes and making the process impossible to train on, manage, or automate.

💼 Business Impact

Causes inconsistent service quality, inflates operational costs, and prevents effective root cause analysis or targeted improvement.

Finding

Systemic SLA Failure Indicates a Broken Service Promise

💡 predictability

📊 Evidence

Based on a sample of 100 incidents, 63% breach their SLAs. This high failure rate is observed across all priority levels.

🔎 Insight

The process is fundamentally unable to meet its service level targets. The combination of rework, fragmentation, and routing delays makes timely resolution unattainable.

💼 Business Impact

Erodes trust with business stakeholders, exposes the organization to service credit risks, and undermines the purpose of prioritization.
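Because the 63% figure comes from a 100-incident sample, a confidence interval helps show how precise it is. The sketch below is a minimal Python implementation of the Wilson score interval. It assumes the sample is random and that breaches are independent, and `wilson_interval` is a hypothetical helper. For 63 breaches in 100 incidents it gives roughly 53% to 72%, so under these assumptions the breach rate is very likely above one half even allowing for sampling error. Per-priority subsets of the same sample are smaller, so their intervals will be wider.

```python
import math
from typing import Tuple


def wilson_interval(breaches: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion; z=1.96 gives ~95% coverage."""
    if n <= 0 or not 0 <= breaches <= n:
        raise ValueError("need n > 0 and 0 <= breaches <= n")
    p = breaches / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half_width, centre + half_width


low, high = wilson_interval(63, 100)
print(f"{low:.1%} to {high:.1%}")  # 53.2% to 71.8%
```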

Finding

Ineffective Triage and Assignment Creates Early Delays

💡 structural_design

📊 Evidence

Problematic variants show repeated looping between 'Active' and 'Assigned' states. The 'Assigned -> Active' rework transition alone affects over a third of all incidents.

🔎 Insight

The initial assignment is frequently incorrect or requires further clarification, causing incidents to bounce back before work can begin. This points to a critical failure at the front of the process.

💼 Business Impact

Delays the start of meaningful resolution work, inflates touch time, and is a primary driver of SLA breaches.

Finding

Fragmented Group Ownership Slows Resolution

💡 workload_segmentation

📊 Evidence

Incident workload is distributed across 46 assignment groups, and the top 10 groups combined handle only 25% of the volume.

🔎 Insight

There is no clear specialization or ownership for incident types. This suggests work is frequently misrouted or that teams lack specific expertise, contributing to reassignment churn and delays.

💼 Business Impact

Slows down time-to-resolution as incidents bounce between teams to find the correct owner. It also complicates performance management and skill development.
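Both fragmentation figures in this report (no group above 2.7%, top 10 groups at 25%) are top-N shares of incident volume. The following minimal Python sketch assumes one assignment group is recorded per incident (for example, the final group). The helper `top_n_share` and the group names are hypothetical.

```python
from collections import Counter
from typing import Iterable


def top_n_share(groups: Iterable[str], n: int) -> float:
    """Share of incidents handled by the n busiest assignment groups."""
    counts = Counter(groups)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(count for _, count in counts.most_common(n)) / total


# Hypothetical final assignment group per incident.
groups = ["Network"] * 5 + ["Service Desk"] * 3 + ["Database"] * 2
print(top_n_share(groups, 1), top_n_share(groups, 2))  # 0.5 0.8
```

Applied to this log, `top_n_share(groups, 1)` would correspond to the 2.7% ceiling and `top_n_share(groups, 10)` to the 25% figure, assuming the tool counts volume the same way.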

Path Insights

The top 12 variants cover only 68.2% of incidents. The remaining 31.8% are spread across 210 other variants, demonstrating extreme process fragmentation and a lack of predictable execution.
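Here a variant is assumed to be the exact ordered sequence of states an incident passes through. Under that assumption, the variant count and top-k coverage can be computed with the minimal Python sketch below. `variant_coverage` is a hypothetical helper, and the sample traces are illustrative. If the tool uses the same definition, `variant_coverage(traces, 12)` on this log would be expected to return 222 variants and about 68.2% coverage.

```python
from collections import Counter
from typing import Iterable, Sequence, Tuple


def variant_coverage(traces: Iterable[Sequence[str]], top_k: int) -> Tuple[int, float]:
    """Return (number of distinct variants, share of incidents in the top_k variants)."""
    counts = Counter(tuple(trace) for trace in traces)
    total = sum(counts.values())
    if total == 0:
        return 0, 0.0
    covered = sum(count for _, count in counts.most_common(top_k))
    return len(counts), covered / total


# Hypothetical traces: three happy-path, two pending-user, one bounce.
happy = ["New", "Active", "Assigned", "Work in Progress", "Closed"]
pending = ["New", "Active", "Assigned", "Work in Progress", "Pending User", "Closed"]
bounce = ["New", "Active", "Assigned", "Active", "Closed"]
traces = [happy] * 3 + [pending] * 2 + [bounce]
print(variant_coverage(traces, 1))  # (3, 0.5)
```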

New > Active > Assigned > Work in Progress > Closed

Dominant Path

This is the most common 'happy path,' representing an ideal, linear flow. However, it only accounts for 14.25% of incidents, highlighting the severe lack of standardization in the process.

Covers 14.25% of incidents
No rework loops
Should be the target model for standardization efforts

New > Active > Assigned > Work in Progress > Pending User > Closed

Dominant Path

This common path involves pausing work to await user information. Its high frequency (10.6% of incidents) suggests an opportunity to improve initial information gathering to avoid this delay.

Covers 10.6% of incidents
Involves a wait state for user info
A key target for automation (e.g., user follow-up)

New > Active > Assigned > Active > Assigned > Closed

Problem Path

This path, affecting 7.8% of incidents, demonstrates significant assignment churn. The incident bounces between 'Active' and 'Assigned' states twice, a clear indicator of failed routing and triage.

Affects 7.8% of incidents
Contains two wasteful rework loops
Highlights critical issues in triage and assignment
Directly contributes to SLA breaches

New > Active > Assigned > Active > Closed

Problem Path

Affecting 7.6% of incidents, this variant involves a single but impactful bounce between 'Active' and 'Assigned', reinforcing the pattern of systemic assignment failure.

Affects 7.6% of incidents
Involves one common rework loop
Shows instability at the assignment stage

Leadership Priorities

🔐

Standardize the Core Incident Process

Foundational

The current process is unmanageable, with 222 variants and a 60% rework rate. This chaos makes improvement impossible and drives up operational costs.

Expected Benefit

Drastically reduce process variations, lower the rework rate, and create a predictable, measurable baseline for performance management and automation.

Likely Owner

Head of Service Management / Incident Process Owner

AI: Use process mining insights to define the optimal 'happy path' and diagnose the root causes of major deviations for elimination.
Automation: Implement ServiceNow Playbooks or Flow Designer to guide agents through the newly standardized process steps, ensuring consistency.
Risk if delayed: Continued service failures, high operational costs, and an inability to scale support operations.

📋

Fix the Front Door: Overhaul Triage and Routing

Strategic

High-volume rework between the 'Active' and 'Assigned' states is likely a primary driver of the 63% SLA breach rate. Incorrect initial assignment wastes significant time.

Expected Benefit

Improve first-assignment accuracy, reduce resolution time, and significantly lower the SLA breach rate.

Likely Owner

Service Desk Leadership / Platform Owner

AI: Implement AI-powered routing to predict the correct assignment group based on incident data (e.g., summary, category, CI).
Automation: Automate the assignment of well-structured incidents from channels like the Service Portal or Virtual Agent directly to specialist teams.
Risk if delayed: Persistent SLA failures and wasted effort from skilled resolver teams.
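One way to measure the first-assignment accuracy this priority targets is the share of assigned incidents that never move to a different assignment group. The following is a minimal Python sketch under that assumed definition. It takes each incident's ordered history of assignment groups, and the helper names and group names are hypothetical.

```python
from typing import Iterable, Sequence


def reassignments(history: Sequence[str]) -> int:
    """Count changes of assignment group after the first assignment."""
    return sum(1 for prev, nxt in zip(history, history[1:]) if prev != nxt)


def first_assignment_accuracy(histories: Iterable[Sequence[str]]) -> float:
    """Share of assigned incidents that were never moved to a different group."""
    assigned = [h for h in histories if len(h) > 0]
    if not assigned:
        return 0.0
    return sum(1 for h in assigned if reassignments(h) == 0) / len(assigned)


# Hypothetical assignment-group histories, one list per incident.
histories = [
    ["Network"],
    ["Service Desk", "Network"],
    ["Database", "Database"],  # re-saved with the same group: not a reassignment
    [],                        # never assigned: excluded
]
print(first_assignment_accuracy(histories))
```

Tracking this ratio before and after routing changes would give a direct measure of whether the 'Assigned -> Active' bounce is shrinking.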

✅

Automate User Follow-Up to Reduce Wait Time

Quick Win

The transition to 'Pending User' is part of a rework loop affecting 43% of all incidents, introducing significant delays while waiting for information.

Expected Benefit

Reduce the manual effort of chasing users for information and shorten the time incidents spend in a waiting state.

Likely Owner

Service Management / Automation CoE

AI: Use AI to analyze incident text and determine whether required information is missing before the incident is assigned to a human.
Automation: Implement automated reminders and escalations for incidents in 'Pending User'. Use Virtual Agent to proactively gather required information at submission.
Risk if delayed: Continued cycle time inflation and poor user experience due to avoidable delays.
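The reminder-and-escalation rule can be sketched as a simple function of how long an incident has been in 'Pending User'. In this minimal Python sketch, the 24-hour and 72-hour thresholds and the action names are hypothetical placeholders. In practice, this logic would more likely live in the platform's workflow tooling than in standalone Python.

```python
from datetime import datetime, timedelta

# Hypothetical thresholds; real values would come from the service's SLA policy.
REMIND_AFTER = timedelta(hours=24)
ESCALATE_AFTER = timedelta(hours=72)


def follow_up_action(entered_pending: datetime, now: datetime) -> str:
    """Pick the follow-up for an incident that has been waiting in 'Pending User'."""
    waited = now - entered_pending
    if waited >= ESCALATE_AFTER:
        return "escalate"
    if waited >= REMIND_AFTER:
        return "remind"
    return "wait"


start = datetime(2026, 6, 1, 9, 0)
for hours in (2, 30, 80):
    print(hours, follow_up_action(start, start + timedelta(hours=hours)))
```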

Executive Decision Support

Key Risks if Delayed

Erosion of Business Trust

The systemic failure to meet SLAs (63% breach rate) undermines the credibility of the IT support function. Business stakeholders will lose confidence in IT's ability to provide reliable and timely service.

Urgency: High

Inability to Scale or Improve

With 222 process variants and no standard workflow, it is impossible to implement meaningful improvements, automation, or performance management. The process will remain inefficient and costly as volume grows.

Urgency: High

High Operational Cost and Agent Burnout

The 60% rework rate creates a significant amount of unnecessary work, driving up operational costs and leading to frustration and burnout among support staff who are constantly re-addressing the same issues.

Urgency: Medium

Readiness & Constraints

AI Readiness

Medium

Automation Readiness

Low

Data Readiness

High

Data readiness is High: core fields such as priority and assignment group are consistently populated, providing a good foundation for AI. Automation readiness, however, is Low because the process itself is too chaotic. AI can be applied to targeted problems such as routing, but broad automation requires significant process standardization first.
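A claim such as "consistently populated" can be checked with a per-field fill rate. The following is a minimal Python sketch. The field names `priority` and `assignment_group` are assumed column names for the export, and treating blank strings as missing is also an assumption.

```python
from typing import Dict, Iterable, Mapping, Sequence


def is_filled(value: object) -> bool:
    """Treat None and blank strings as missing."""
    return value is not None and str(value).strip() != ""


def fill_rates(records: Iterable[Mapping[str, object]], fields: Sequence[str]) -> Dict[str, float]:
    """Share of records with a non-empty value in each field."""
    rows = list(records)
    if not rows:
        return {field: 0.0 for field in fields}
    return {
        field: sum(1 for row in rows if is_filled(row.get(field))) / len(rows)
        for field in fields
    }


# Hypothetical incident records.
records = [
    {"priority": "2", "assignment_group": "Network"},
    {"priority": "3", "assignment_group": "  "},
    {"priority": None},
]
print(fill_rates(records, ["priority", "assignment_group"]))
```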

Consultant Note

This assessment highlights critical instability in the Incident Management process. The consultant should focus subsequent analysis on the root causes of the 60% rework rate and the 63% SLA breach rate. Key areas for investigation are the triage/assignment process, the reasons for the 'Work in Progress -> Pending User' loop, and the drivers behind the 222 process variants.

Evidence Base

metrics, transitions, variants, field usage, sample items, activity model, time intelligence, task_sla metrics

✓ Process Metrics
✓ Transitions
✓ Variants
✓ Field Usage
✓ Status Types
✓ Time Intelligence
✓ Sample Items