🏛️ Deloitte Portfolio & Program Management

Live Dashboard Updated: 2026-04-07 12:54 PDT Source: Linear + #eng-customer-deloitte
Program Status
YELLOW
Apr 7, 2026

Executive Overview

Task worker resolved — heartbeat patch deployed Apr 2, holding with no further incidents. AEF context window fixed — Bedrock flag corrected Apr 6; AEF now at 1M context matching ITS. Sandbox remains critical — Apr 2 patch was ineffective; persistent volume claims still exhausted, still in triage and unassigned. Workflow agent tool_write hangs remain active (ENG-9026).

60% complete — 56 of 94 active items delivered across 6 workstreams. Cyber / Adelina workstream most active with 24 open items. 14 high-priority items across all workstreams.

🚧 Key Strategic Blockers
🔥
Sandbox Stability CRITICAL
Apr 2 patch was ineffective — PVCs still not releasing. Cluster volume limit exhausted; new sandbox sessions cannot be created. Instance type supports max 28 volumes per node. Still in triage, currently unassigned.
⚠️
Workflow Agent tool_write Hangs — AEF ACTIVE
Workflow agents intermittently stuck on tool_write operations, preventing task completion (ENG-9026). Context window issue resolved; this hang behavior remains active and in triage.
🛡️
Cyber / SaaS — Canvas Solution Under Development IN PROGRESS
Canvas remains a top priority with 6 open items. Dashboard URL parameters for drill-down navigation currently in review. Architecture discussions ongoing for production SOC deployment.
🎯 3 Main Priorities
🔧
Resolve Sandbox Stability CRITICAL
Sandbox PVCs exhausted — new sessions cannot be created. Apr 2 patch was ineffective. Instance type supports max 28 volumes per node; may need to change. Currently in triage, unassigned.
Stabilize AEF Instance ACTIVE ISSUES
Context window fixed Apr 6 (now 1M matching ITS). Active issue: workflow agent tool_write hangs causing task failures (ENG-9026). SAP MCP serialization errors in review (APO-53).
🖼️
Feature Pipeline — Canvas, Integrations & More → 24 OPEN
24 open feature requests across 2 workstreams. Canvas is #1 — strategic path under discussion. View all requests and status.

Dashboards

Each dashboard tracks a different class of work across the Deloitte engagement.

Training session
● On Track
Training
Platform training program for Deloitte teams — pilot session, LMS, video content, and live trainer coordination.
12
Items
April 8
Pilot Target
100%
Green
Dashboard analytics
● Critical Items
Bug Fixes & Info Requests
Active bugs, configuration issues, and operational requests from Deloitte teams requiring engineering response.
8
Open Items
1
Urgent
8
Resolved
Product planning
● In Progress
New Feature Requests
Platform enhancements, new integrations, and capability requests surfaced through Deloitte engagement calls.
24
Open Items
14
High Priority
6
Workstreams

Engagement Information Flow

Full mind map showing all request sources, classifications, and resolution paths across the Deloitte engagement.

🗺️ View Full Engagement Mind Map
Request sources → Requests → Classification (Bug / Training / Feature) → Resolution paths. Interactive — click to explore.

🔍 Key Health Questions Reporting Period: Apr 7, 2026

QuestionStatusExplanation
Is the team behind schedule?⚠️ PossibleCyber / Adelina workstream most active with 24 open items. 14 high-priority items across all workstreams. 10+ items completed recently.
Problems preventing cycle goal?⚠️ ActiveSandbox PVC exhaustion (ENG-9025) blocking new sandbox sessions. Workflow agent hangs (ENG-9026) impacting AEF workflows. New write-operation issues on SailPoint (ENG-9029) and Okta (ENG-9030) integrations.
Tasks added or deleted this cycle?⚠️ YesRecent changes: ThreatConnect and Jira DC integrations delivered. SMK white-label logo CORS fix and disclaimer config docs completed. Dashboard URL parameters in review. Helm E2E consolidation and TTM off embedded Postgres in progress.
Foresee issues for next period?⚠️ PossibleTraining pilot April 8 requires preparation. 24 high-priority items need attention across workstreams.
Unscheduled tasks this cycle?⚠️ SomeSMK 1.0.3.1 release prep unplanned. Jira DC + ThreatConnect debugging (now resolved) consumed cycles.
Have any estimates changed?⚠️ YesBaseline estimates produced for surviving Promise + Stretch items: 160h Promise (4 sprints), 59h Stretch.
Technical problems encountered?⚠️ ActiveJira DC and ThreatConnect resolved. AEF context window and task worker heartbeat fixed. Active: sandbox PVCs exhausted (ENG-9025), workflow agent tool_write hangs (ENG-9026), SailPoint write ops failing (ENG-9029), Okta write permissions not enabled (ENG-9030).
Resource problems?⚠️ PossibleMeta Global Ops has 10 open items. Resource allocation being monitored across workstreams.

🧭 Strategic Priorities for Portfolio Stakeholders

Decision Required1. Canvas: Tactical Fixes vs. Strategic Rebuild

Canvas remains the #1 priority with 6 active items. URL parameter drill-down currently in review. Three strategic paths under discussion: tactical fixes, Kindo API, or generative UI. Critical for SOC production deployment.

Alignment Needed2. Platform Stability: Sandbox & AEF Instance

Task worker resolved (Apr 2). AEF context window fixed (Apr 6 — now 1M). Sandbox remains critical — Apr 2 patch ineffective, PVCs still exhausted, unassigned in triage. Active: workflow agent tool_write hangs on AEF (ENG-9026). Sandbox stability is top priority.

Input Requested3. Integration Priority Ranking

SAP MCP serialization errors now in review (APO-53). SailPoint write operations failing (ENG-9029) and Okta write permissions not enabled (ENG-9030) — both new issues in triage. Oracle integration in progress. Workday not started. Stakeholder input needed to prioritize across 6+ integration requests.

Not Started4. Meta Global Ops: Resource Allocation

10 open items, zero squad allocated. Decision needed: allocate resources vs. defer? No progress possible without dedicated assignment.

✅ Accomplishments This Period

AccomplishmentOwnerStatus
SMK system upgrade to v1.0.3.0 — deployed to Deloitte hosted + self-managed instancesEngineering✅ Complete
Dashboard/Canvas agent cleanup — auto-created agents now hidden from main list, new "Dashboard Agents" filter tab liveAashman✅ Complete
DLP data scrubbing fix — customer PII was being incorrectly scrubbed; resolvedEngineering✅ Complete
Feature flag management decoupled — SMK feature flags separated from deployment-specific configurationEngineering✅ Complete
Command Center now live — visible in hosted instance and v1.0.3.0 SMKEngineering✅ Complete
Jira Data Center auth fix — basic auth vs API token mismatch resolved for self-hosted Jira DCEngineering✅ Complete
ThreatConnect MCP parameter fix — validation errors causing retry loops now resolvedEngineering✅ Complete
Task worker heartbeat fix deployed and stable — Hatchet reconnection patch deployed Apr 2; no task worker instability since (ENG-8920)Yash Kothari✅ Complete
AEF context window corrected — Bedrock flag set Apr 6; AEF instance now running 1M context window matching ITS instance (ENG-9028)Marcos Pagnucco✅ Complete
Agent workflow restart preserves webhook context — webhook context now properly maintained across agent workflow restarts, eliminating restart-related failures (ENG-8859)Yash Kothari✅ Complete
SMK white-label logo CORS fix — resolved CORS issues with custom logo assets on whitelabelled deployments (ENG-8921)Marcos Pagnucco✅ Complete
Disclaimer config documentation — documented configurable disclaimer message below chat input for SMK deployments (ENG-8922)Marcos Pagnucco✅ Complete
Jira DC auth fix — resolved authentication issues with Jira Data Center self-hosted instances (ENG-8858)Yash Kothari✅ Complete
Okta disconnected state fix — resolved issue where Okta integration showed disconnected state incorrectly (TEK-60)Engineering✅ Complete
SMK integration validation — end-to-end integration validation for SMK deployments completed (ENG-8594)Brandon C✅ Complete
SAP/Oracle integration in progress — SAP JCo/RFC connectivity work underway; Oracle integration questions from Friday being addressedEngineering⚠️ In Progress
Preflight script near completion — Marcos and Brandon finalizing; Wednesday call with Adelina to reviewMarcos / Brandon⚠️ In Progress

🔺 Active Risks

IDImpactTrendDescriptionMitigation
R1a Resolved 📉 Task Worker Stability. Heartbeat/Hatchet reconnection patch deployed and holding. No stability issues since the fix. Resolved. Monitoring continues.
R1b High 📈 Sandbox Stability. Apr 2 patch was ineffective — PVCs still not releasing. Cluster volume limit exhausted, new sessions cannot be created. Instance type max 28 volumes per node. Still in triage, unassigned. May require instance type change. Currently in triage with no active owner.
R1c High 🆕 AEF Workflow Agent Hangs. Workflow agents intermittently stuck on tool_write operations (ENG-9026). Context window fixed Apr 6 (one fewer AEF issue). SAP MCP serialization errors in review (APO-53). ENG-9026 in triage. SAP APO-53 in review. Context window resolved.
R2 Med ➡️ Canvas architecture decision pending. 6 active Canvas items. URL parameter drill-down in review. Strategic path for SOC production not finalized. Tactical fixes proceeding. Strategic decision between Kindo API and generative UI under discussion.
R3 Med 📈 SMK deployment scalability. Each deployment requires live engineering. Preflight script in progress. 7 planned deployments. Preflight script near completion. Manual checklist + Terraform automation in development. Deployment #3 will validate.
R4 Med ➡️ Meta Global Ops unresourced. 10 open items, zero squad allocated. Resource allocation under review. Prioritization aligned with workstream needs.
R5 Med ➡️ Integration gaps. SAP MCP serialization errors in review (APO-53). SailPoint write operations failing (ENG-9029). Okta write permissions not enabled for this environment (ENG-9030). Oracle in progress. Workday not started. SAP APO-53 in review. SailPoint and Okta new issues in triage. Integration sessions being scheduled.

Most Recent Meeting Office Hours — Deployment Q&A (Apr 6)

2 Critical 6 Action Items
📞 Office Hours — Deployment Q&A — Apr 6, 2026
✅ AEF Context Window Fixed — Bedrock flag corrected; AEF instance now running 1M context window matching ITS. Deployed by Marcos Pagnucco.
✅ Workflow Restart Webhook Fix — Agent workflow restarts now correctly preserve webhook context (ENG-8859). Deployed by Yash Kothari.
🔥 Sandbox PVCs Still Unresolved — Apr 2 patch ineffective. PVC exhaustion persists; new sandbox sessions cannot be created. Still in triage, unassigned.
⚠️ Workflow Agent Hangs Active — Workflow agents intermittently stuck on tool_write operations on AEF (ENG-9026). In triage.
🛡️ Cyber Weekly — Apr 2, 2026
🔥 Sandbox Still Broken — Yesterday's patch did NOT fix the sandbox issue. Same errors persist. All persistent volume mounts exhausted — new sandbox sessions cannot be created. Instance type supports max 28 volumes per node; may need different instance type. Troubleshooting call being scheduled.
✅ Task Worker Stable — Heartbeat/Hatchet reconnection patch from earlier this week is holding. No task worker stability issues reported since the fix.
🔗 SAP Integration — Partial — Working on ITS instance but connectivity issues on AEF instance. RFC errors about request format preventing testing. AEF also showing Excel output failures ("Amex error") that work fine on ITS — environmental difference, not code.
🔧 AEF Context Window Mismatch — AEF instance shows 200K context window for Claude Sonnet 4.6 while ITS shows 1M. Bedrock flag needs to be set — likely missed during AEF setup. Quick fix; call being scheduled.
📅 Meeting Frequency — Discussed whether twice-weekly meetings still needed. Decision: keep current cadence until sandbox stability is resolved.
📞 Office Hours — Deployment Q&A — Apr 2, 2026
🔥 LLM Bad Request Errors — "Bad request" errors hitting the LLM: "format of the value at message 45 tool use input is invalid." Happening across agents actively in use.
⚠️ Agent Intermittent Failures — Agents not responding intermittently since morning. Workflow agent getting stuck on "tool write in progress" when generating Excel output from large document analysis (~160 pages).
🔗 SAP/MCP Issues — AEF side still troubleshooting connectivity to SAP server. ITS instance seeing MCP failure issues with SAP integration. Questions raised about calling same integration multiple times within a workflow.
📋 Issue Consolidation — Joana committed to consolidating all outstanding issues and sending to Brandon for resolution before end of day.
💬 UI Customization Request — Team asking if agent output UI can be customized to not look like a chat interface. S3/knowledge store access questions also raised.

📦 SMK Installs — Deployment Progress

2 Complete 1 Planned 4 Blockers

Key deployment status and improvement initiatives from Office Hours (Apr 6), Cyber Weekly (Apr 2), and prior sessions.

Deployment Status

DeploymentStatusKey Issues
Deployment #1✅ CompleteSecurity group/connectivity issues discovered during install
Deployment #2✅ CompleteCalico CNI vs VPC CNI caused ingress automation failure
Deployment #3 (Digital Identity)🔵 PlannedBastion host access being requested, same environment challenges expected

Key Improvements In Progress

InitiativeStatusDetails
Preflight Script (Helm chart)⚠️ In ProgressInitial version exists, catches connectivity issues pre-install. Being expanded. Runs from within K8s cluster, CI/CD compatible.
Manual Deployment Checklist⚠️ In ProgressFor enterprise teams with multiple departments involved in provisioning and access.
Infrastructure Automation (Terraform)⚠️ In ProgressTurnkey AWS provisioning, handed to Deloitte infra team for testing.
Script Migration to Helm Charts⚠️ In ProgressMoving bastion host scripts into cluster, reducing external dependencies.

Current Blockers

BlockerSeverityMitigation
Task worker / Hatchet instability✅ ResolvedHeartbeat/Hatchet reconnection patch deployed and stable. No restarts needed since fix.
Sandbox PVC exhaustion🔴 HighPatch did not fix sandbox. All PVCs exhausted, max 28 volumes per node. May need instance type change. Troubleshooting call being scheduled.
Workflow agent tool_write hangs — AEF🔴 HighWorkflow agents intermittently stuck on tool_write operations (ENG-9026). Context window fixed; this hang behavior remains active in triage.
No observability configured⚠️ MediumDeployed instances lack OpenTelemetry/Grafana monitoring.
Enterprise AWS guardrails⚠️ MediumIAM roles, network subnets will be a challenge for every customer deployment.