Novaprospect

Two years after the industry collectively agreed that prompt injection was a serious problem, it is still the default vulnerability in most agentic deployments. You can reproduce a working exploit in a few minutes against a meaningful fraction of production systems that advertise AI features. That has not changed.

What has changed is the sophistication of the delivery. The early exploits were direct: paste a string into a chat box, convince the model to ignore its system prompt, exfiltrate the instructions. The current generation is indirect — payloads hidden in documents the agent will summarize, web pages it will browse, tickets it will triage, emails it will draft replies to. The attack surface is every source of text the agent consumes, which in an agentic system is effectively the entire internet.

Why the fix has not landed

The reason this vulnerability persists is not that the research community lacks ideas. There are credible mitigations: dual-model architectures, privilege separation between the planning and execution layers, content boundaries that distinguish trusted from untrusted input, structured tool interfaces that reject free-form strings. The mitigations exist.

The reason they have not landed is that most teams building agentic features are still optimizing for capability demonstrations. "Look what the agent can do" beats "look how carefully the agent is scoped." Security-by-design work is invisible in a demo. Capability is not.

That incentive gradient is the real problem. Until the cost of an incident exceeds the cost of slowing down a launch, the default posture will continue to be "wire it up and see what happens."

What this means if you are deploying

If you are putting agents in production against untrusted input — and almost all production agents are, whether their teams recognize it or not — assume prompt injection will be attempted against you. Then ask a harder question: what is the blast radius when it succeeds?

An agent with read-only access to a scoped dataset and no ability to act on external systems is a contained problem. An agent with write access to your production infrastructure, your customer records, or your outbound communication channels is an unbounded problem. The severity of a successful injection is determined entirely by the authority you have given the agent before the attack begins.

This is why we design our internal agent frameworks around task-scoped authority rather than broad capability. Every agent session operates against an explicit scope defined in a prompt file, versioned in the repository, and enforced by the orchestration layer. A compromised session can do damage within that scope. It cannot escalate out of it.

That is not a complete defense. But it is the difference between an incident and a catastrophe, and for most organizations that is the difference that actually matters.

The uncomfortable conclusion

Prompt injection is not going to be solved at the model layer in any timeframe a security team should be planning against. It will be contained at the architecture layer or it will not be contained. Teams that have internalized that will build differently from teams that are still waiting for the foundation model providers to fix the problem for them.

The model providers are not going to fix this problem for you. Plan accordingly.

Prompt Injection in 2026: Still the Default Vulnerability

Why the fix has not landed

What this means if you are deploying

The uncomfortable conclusion