🏛️ Deloitte Portfolio & Program Management

Live Dashboard Updated: 2026-04-02 10:47 PDT Source: Linear + #eng-customer-deloitte
Program Status
YELLOW
Apr 2, 2026

Executive Overview

Task worker stabilized — heartbeat/Hatchet patch deployed and holding. Sandbox remains critical — yesterday's patch did NOT resolve the issue; persistent volume claims still exhausted, may require instance type change. New AEF instance issues emerging: LLM bad request errors and context window misconfiguration.

60% complete — 56 of 94 active items delivered across 6 workstreams. Cyber / Adelina workstream most active with 24 open items. 14 high-priority items across all workstreams.

🚧 Key Strategic Blockers
Task Worker Stability RESOLVED
Heartbeat/Hatchet reconnection patch deployed earlier this week is holding. No task worker stability issues reported since the fix. Daily manual restarts no longer required.
🔥
Sandbox Stability CRITICAL
Yesterday's patch did NOT fix the sandbox issue — same errors persist. All persistent volume claims exhausted; new sandbox sessions cannot be created. Current instance type supports max 28 volumes per node — may need different instance type. Troubleshooting call being scheduled.
⚠️
LLM Bad Request Errors — AEF ACTIVE
"Bad request" errors hitting the LLM across agents actively in use — "format of the value at message 45 tool use input is invalid." Agents intermittently not responding; workflow agents stuck on tool write operations. Joana consolidating all issues for Brandon by end of day.
🔧
AEF Context Window Misconfiguration QUICK FIX
AEF instance showing 200K context window for Claude Sonnet 4.6 vs 1M on ITS. Bedrock flag missed during AEF setup. Call being scheduled to resolve — expected quick fix.
🛡️
Cyber / SaaS — Canvas Solution Under Development IN PROGRESS
Canvas remains a top priority with 6 open items. Dashboard URL parameters for drill-down navigation currently in review. Architecture discussions ongoing for production SOC deployment.
🎯 3 Main Priorities
🔧
Resolve Sandbox Stability CRITICAL
Sandbox PVCs exhausted — new sessions cannot be created. Instance type may need to change (max 28 volumes per node). Troubleshooting call being scheduled with Marcos to evaluate options.
Stabilize AEF Instance ACTIVE ISSUES
Multiple issues on AEF: LLM bad request errors across agents, context window misconfiguration (200K vs 1M), Excel output failures, and SAP connectivity gaps. Joana consolidating for Brandon today.
🖼️
Feature Pipeline — Canvas, Integrations & More → 24 OPEN
24 open feature requests across 3 workstreams. Canvas is #1 — strategic path under discussion. View all requests and status.

Dashboards

Each dashboard tracks a different class of work across the Deloitte engagement.

Training session
● On Track
Training
Platform training program for Deloitte teams — pilot session, LMS, video content, and live trainer coordination.
13
Items
April 8
Pilot Target
100%
Green
Dashboard analytics
● Critical Items
Bug Fixes & Info Requests
Active bugs, configuration issues, and operational requests from Deloitte teams requiring engineering response.
5
Open Items
1
Urgent
5
Resolved
Product planning
● In Progress
New Feature Requests
Platform enhancements, new integrations, and capability requests surfaced through Deloitte engagement calls.
24
Open Items
14
High Priority
6
Workstreams

Engagement Information Flow

Full mind map showing all request sources, classifications, and resolution paths across the Deloitte engagement.

🗺️ View Full Engagement Mind Map
Request sources → Requests → Classification (Bug / Training / Feature) → Resolution paths. Interactive — click to explore.

🔍 Key Health Questions Reporting Period: Apr 2, 2026

QuestionStatusExplanation
Is the team behind schedule?⚠️ PossibleCyber / Adelina workstream most active with 24 open items. 14 high-priority items across all workstreams. 10+ items completed recently.
Problems preventing cycle goal?🟢 MitigatedJira DC auth and ThreatConnect validation issues are now resolved and operational.
Tasks added or deleted this cycle?⚠️ YesRecent changes: ThreatConnect and Jira DC integrations delivered. SMK white-label logo CORS fix and disclaimer config docs completed. Dashboard URL parameters in review. Helm E2E consolidation and TTM off embedded Postgres in progress.
Foresee issues for next period?⚠️ PossibleTraining pilot April 8 requires preparation. 24 high-priority items need attention across workstreams.
Unscheduled tasks this cycle?⚠️ SomeSMK 1.0.3.1 release prep unplanned. Jira DC + ThreatConnect debugging (now resolved) consumed cycles.
Have any estimates changed?⚠️ YesBaseline estimates produced for surviving Promise + Stretch items: 160h Promise (4 sprints), 59h Stretch.
Technical problems encountered?🟢 ResolvedJira DC auth flow and ThreatConnect MCP parameter validation — both fixed and deployed.
Resource problems?⚠️ PossibleMeta Global Ops has 10 open items. Resource allocation being monitored across workstreams.

🧭 Strategic Priorities for Portfolio Stakeholders

Decision Required1. Canvas: Tactical Fixes vs. Strategic Rebuild

Canvas remains the #1 priority with 6 active items. URL parameter drill-down currently in review. Three strategic paths under discussion: tactical fixes, Kindo API, or generative UI. Critical for SOC production deployment.

Alignment Needed2. Platform Stability: Sandbox & AEF Instance

Task worker stabilized (Hatchet patch holding). Sandbox remains critical — patch failed, PVCs exhausted, may need instance type change. AEF instance has multiple emerging issues: LLM errors, context window misconfiguration, Excel output failures. Operational stability remains top priority.

Input Requested3. Integration Priority Ranking

SAP/Oracle integration in progress. SailPoint and Workday need custom MCP development. Integration deep-dive sessions being scheduled. Stakeholder input needed to prioritize across 6+ integration requests.

Not Started4. Meta Global Ops: Resource Allocation

10 open items, zero squad allocated. Decision needed: allocate resources vs. defer? No progress possible without dedicated assignment.

✅ Accomplishments This Period

AccomplishmentOwnerStatus
SMK system upgrade to v1.0.3.0 — deployed to Deloitte hosted + self-managed instancesEngineering✅ Complete
Dashboard/Canvas agent cleanup — auto-created agents now hidden from main list, new "Dashboard Agents" filter tab liveAashman✅ Complete
DLP data scrubbing fix — customer PII was being incorrectly scrubbed; resolvedEngineering✅ Complete
Feature flag management decoupled — SMK feature flags separated from deployment-specific configurationEngineering✅ Complete
Command Center now live — visible in hosted instance and v1.0.3.0 SMKEngineering✅ Complete
Jira Data Center auth fix — basic auth vs API token mismatch resolved for self-hosted Jira DCEngineering✅ Complete
ThreatConnect MCP parameter fix — validation errors causing retry loops now resolvedEngineering✅ Complete
Heartbeat bug diagnosis complete — root cause identified between chat and task worker; Hatchet reconnection fix developed and being deployedMarcos / Brandon✅ Complete
SMK white-label logo CORS fix — resolved CORS issues with custom logo assets on whitelabelled deployments (ENG-8921)Marcos Pagnucco✅ Complete
Disclaimer config documentation — documented configurable disclaimer message below chat input for SMK deployments (ENG-8922)Marcos Pagnucco✅ Complete
Jira DC auth fix — resolved authentication issues with Jira Data Center self-hosted instances (ENG-8858)Yash Kothari✅ Complete
Okta disconnected state fix — resolved issue where Okta integration showed disconnected state incorrectly (TEK-60)Engineering✅ Complete
SMK integration validation — end-to-end integration validation for SMK deployments completed (ENG-8594)Brandon C✅ Complete
SAP/Oracle integration in progress — SAP JCo/RFC connectivity work underway; Oracle integration questions from Friday being addressedEngineering⚠️ In Progress
Preflight script near completion — Marcos and Brandon finalizing; Wednesday call with Adelina to reviewMarcos / Brandon⚠️ In Progress

🔺 Active Risks

IDImpactTrendDescriptionMitigation
R1a Resolved 📉 Task Worker Stability. Heartbeat/Hatchet reconnection patch deployed and holding. No stability issues since the fix. Resolved. Monitoring continues.
R1b High 📈 Sandbox Stability. Yesterday's patch did NOT fix it. PVCs exhausted — new sessions cannot be created. Instance type supports max 28 volumes per node. May need different instance type. Troubleshooting call being scheduled with Marcos.
R1c High 🆕 AEF Instance Issues. LLM bad request errors across agents. Context window misconfigured (200K vs 1M). Excel output failures. SAP connectivity gaps. Joana consolidating all issues for Brandon by end of day Apr 2. Context window fix expected to be quick (Bedrock flag).
R2 Med ➡️ Canvas architecture decision pending. 6 active Canvas items. URL parameter drill-down in review. Strategic path for SOC production not finalized. Tactical fixes proceeding. Strategic decision between Kindo API and generative UI under discussion.
R3 Med 📈 SMK deployment scalability. Each deployment requires live engineering. Preflight script in progress. 7 planned deployments. Preflight script near completion. Manual checklist + Terraform automation in development. Deployment #3 will validate.
R4 Med ➡️ Meta Global Ops unresourced. 10 open items, zero squad allocated. Resource allocation under review. Prioritization aligned with workstream needs.
R5 Med ➡️ Integration gaps. SAP/Oracle in progress. SailPoint, Workday not started. Custom MCP development needed. SAP work underway. Integration deep-dive sessions being scheduled.

Most Recent Meetings Cyber Weekly (Apr 2) + Office Hours (Apr 2)

2 Critical 6 Action Items
🛡️ Cyber Weekly — Apr 2, 2026
🔥 Sandbox Still Broken — Yesterday's patch did NOT fix the sandbox issue. Same errors persist. All persistent volume mounts exhausted — new sandbox sessions cannot be created. Instance type supports max 28 volumes per node; may need different instance type. Troubleshooting call being scheduled.
✅ Task Worker Stable — Heartbeat/Hatchet reconnection patch from earlier this week is holding. No task worker stability issues reported since the fix.
🔗 SAP Integration — Partial — Working on ITS instance but connectivity issues on AEF instance. RFC errors about request format preventing testing. AEF also showing Excel output failures ("Amex error") that work fine on ITS — environmental difference, not code.
🔧 AEF Context Window Mismatch — AEF instance shows 200K context window for Claude Sonnet 4.6 while ITS shows 1M. Bedrock flag needs to be set — likely missed during AEF setup. Quick fix; call being scheduled.
📅 Meeting Frequency — Discussed whether twice-weekly meetings still needed. Decision: keep current cadence until sandbox stability is resolved.
📞 Office Hours — Deployment Q&A — Apr 2, 2026
🔥 LLM Bad Request Errors — "Bad request" errors hitting the LLM: "format of the value at message 45 tool use input is invalid." Happening across agents actively in use.
⚠️ Agent Intermittent Failures — Agents not responding intermittently since morning. Workflow agent getting stuck on "tool write in progress" when generating Excel output from large document analysis (~160 pages).
🔗 SAP/MCP Issues — AEF side still troubleshooting connectivity to SAP server. ITS instance seeing MCP failure issues with SAP integration. Questions raised about calling same integration multiple times within a workflow.
📋 Issue Consolidation — Joana committed to consolidating all outstanding issues and sending to Brandon for resolution before end of day.
💬 UI Customization Request — Team asking if agent output UI can be customized to not look like a chat interface. S3/knowledge store access questions also raised.

📦 SMK Installs — Deployment Progress

2 Complete 1 Planned 4 Blockers

Key deployment status and improvement initiatives from Cyber Weekly (Apr 2), Office Hours (Apr 2), and prior sessions.

Deployment Status

DeploymentStatusKey Issues
Deployment #1✅ CompleteSecurity group/connectivity issues discovered during install
Deployment #2✅ CompleteCalico CNI vs VPC CNI caused ingress automation failure
Deployment #3 (Digital Identity)🔵 PlannedBastion host access being requested, same environment challenges expected

Key Improvements In Progress

InitiativeStatusDetails
Preflight Script (Helm chart)⚠️ In ProgressInitial version exists, catches connectivity issues pre-install. Being expanded. Runs from within K8s cluster, CI/CD compatible.
Manual Deployment Checklist⚠️ In ProgressFor enterprise teams with multiple departments involved in provisioning and access.
Infrastructure Automation (Terraform)⚠️ In ProgressTurnkey AWS provisioning, handed to Deloitte infra team for testing.
Script Migration to Helm Charts⚠️ In ProgressMoving bastion host scripts into cluster, reducing external dependencies.

Current Blockers

BlockerSeverityMitigation
Task worker / Hatchet instability✅ ResolvedHeartbeat/Hatchet reconnection patch deployed and stable. No restarts needed since fix.
Sandbox PVC exhaustion🔴 HighPatch did not fix sandbox. All PVCs exhausted, max 28 volumes per node. May need instance type change. Troubleshooting call being scheduled.
AEF LLM bad request errors🔴 High"Bad request" errors across agents on AEF. Joana consolidating all issues for Brandon by end of day.
No observability configured⚠️ MediumDeployed instances lack OpenTelemetry/Grafana monitoring.
Enterprise AWS guardrails⚠️ MediumIAM roles, network subnets will be a challenge for every customer deployment.