AI Governance · Engineering · Production Safety

When the AI Deletes Production: The Real Lesson from the AWS/Kiro Incident

Amazon's AI coding tool Kiro reportedly deleted a production environment last month, taking AWS services down for 13 hours. Amazon's response was that it was "user error" — an "extremely limited event" that shouldn't be generalized.

They're right that it was user error. They're wrong about which user made the error.

The Wrong Framing

The instinct is to blame the engineer who ran the tool. That's where most post-mortems will land — someone gave the AI agent too much authority, or failed to review what it was about to do, or trusted an agentic system in a context that didn't warrant it.

That framing is comfortable because it localizes the failure to a decision a human made. Fix the human behavior and you've fixed the problem.

But the deeper error was made much earlier, by whoever designed a system that could reach that state at all. Kiro was given "permissions similar to a human engineer" — which means write access to production infrastructure, with no mechanism in the system itself to scope, audit, or gate what it could touch. When it concluded that deleting and recreating the environment was the right fix, nothing in the architecture stopped it.

That's not a bug in Kiro. That's a missing governance layer.

The Governance Layer Is the Product

When AI agents operate with unconstrained authority over production systems, incidents aren't edge cases — they're the logical extension of the design. An agent with production access and a broad objective will eventually reach a decision that a human would have rejected, but that the agent's objective function supports. The only question is whether your system has guardrails before that decision executes.
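A minimal sketch of what "guardrails before that decision executes" can mean in practice: a policy gate that every agent action passes through before it runs. The names here (`Action`, `gate`, the verb list) are illustrative assumptions, not Kiro's actual API.

```python
# Hypothetical policy gate: runs before any agent action executes.
# All names here are illustrative, not a real Kiro interface.
from dataclasses import dataclass

@dataclass(frozen=True)
class Action:
    verb: str    # e.g. "read", "create", "delete"
    target: str  # resource path, e.g. "env/production"

# Destructive verbs that must never execute against production
# without explicit, out-of-band human approval.
DESTRUCTIVE = {"delete", "recreate", "terminate"}

def gate(action: Action) -> bool:
    """Return True only if the action is allowed to execute."""
    if action.target.startswith("env/production") and action.verb in DESTRUCTIVE:
        return False  # blocked: the agent cannot reach this state on its own
    return True

# The agent concludes "delete and recreate the environment" is the fix;
# the gate rejects it before anything runs.
assert gate(Action("read", "env/production"))
assert not gate(Action("delete", "env/production"))
```

The point is not the twenty lines of Python; it is that the check exists in the architecture, outside the agent's objective function, so the agent's conclusion never executes unexamined.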

This is the pattern we built against from the beginning at Novaprospect.

Every AI-generated change traces to a Jira issue. The AI operates on a branch named after that issue — never on main, never on production infrastructure directly. A prompt file stored in the repository specifies exactly what the AI is authorized to modify, and explicitly names what it cannot touch. Automated quality gates run before anything merges. A human approves the PR. sessains captures a session log — a timestamped, structured record of what the AI did, in which session, against which ticket.

The AI never has "human engineer" permissions. It has task-scoped authority. The scope is defined before the session starts, versioned in the repository, and auditable after the fact.
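Task-scoped authority can be as simple as a versioned scope file plus a check that runs before any write. This is a hedged sketch of the idea, not Novaprospect's actual format; the file schema, ticket number, and pattern names are all assumptions.

```python
# Illustrative sketch of task-scoped authority. A scope definition,
# versioned in the repository, names what the AI may modify and what
# it may not touch. Schema and names are assumptions for illustration.
import fnmatch

SCOPE = {
    "ticket": "PROJ-123",                      # every change traces to an issue
    "branch": "PROJ-123-fix-retry-logic",      # branch named after the issue
    "allow": ["src/retry/*.py", "tests/test_retry.py"],
    "deny":  ["infra/*", "main", ".github/*"], # explicitly off-limits
}

def authorized(path: str) -> bool:
    """Writable only if the path matches an allow pattern and no deny pattern."""
    if any(fnmatch.fnmatch(path, pattern) for pattern in SCOPE["deny"]):
        return False
    return any(fnmatch.fnmatch(path, pattern) for pattern in SCOPE["allow"])

assert authorized("src/retry/backoff.py")      # inside the ticket's scope
assert not authorized("infra/prod.tf")         # denied before the session starts
```

Because the scope file lives in the repository, the authorization decision is itself version-controlled: an auditor can see exactly what the AI was permitted to touch for any given ticket, at any point in history.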

Why This Matters for Regulated Environments

For teams building in unregulated environments, the AWS incident is a cautionary tale about production hygiene. For teams in regulated environments — FedRAMP, HIPAA, DoD IL — it's more than that.

An auditor will ask: who authorized that change? What did it touch? When? What was the documented justification?

"The AI decided" is not an acceptable answer. Neither is "we have logs somewhere in Kiro's console." The audit trail needs to be traceable, complete, and produced as a byproduct of normal development workflow — not reconstructed after the fact when an assessor asks for it.
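An audit record produced as a byproduct of the workflow might look like the following. The field names are illustrative assumptions, not sessains's actual schema; the point is that each field answers one of the auditor's questions directly.

```python
# Hedged sketch: a session log entry structured around the auditor's
# questions. Field names are illustrative, not a real tool's schema.
import json
from datetime import datetime, timezone

def log_entry(ticket: str, approver: str, files: list[str], justification: str) -> dict:
    return {
        "ticket": ticket,              # who authorized: traceable to the issue
        "approved_by": approver,       # the human who approved the PR
        "files_touched": files,        # what it touched
        "timestamp": datetime.now(timezone.utc).isoformat(),  # when
        "justification": justification,  # the documented rationale
    }

entry = log_entry(
    "PROJ-123",
    "jdoe",
    ["src/retry/backoff.py"],
    "Fix exponential backoff per PROJ-123",
)
print(json.dumps(entry, indent=2))
```

Emitted at the end of every session rather than reconstructed on request, a record like this makes "who authorized that change?" a lookup, not an investigation.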

The governance infrastructure isn't overhead on top of using AI. It's what makes AI use in production defensible.

The Correct Takeaway

The lesson from the Kiro incident isn't "don't use AI in engineering." It's "the governance model has to come before the agent authority." That sequencing matters. Organizations that get it backwards — deploying agents first and adding governance when something breaks — will keep having incidents like this one.

The ones that build the audit trail into the development framework from the start won't avoid every mistake. But they'll be able to explain every mistake, fix it precisely, and demonstrate to auditors that they were operating with appropriate controls.

That's the standard. And it was achievable before Kiro existed.