Gartner® named Zenity the COMPANY TO BEAT in AI Agent Governance 🏁

Allowed Is Not Aligned: Why Retrofitted Tools Can’t Secure AI Agents

Portrait of Greg Zemlin
Greg Zemlin
Cover Image

Key Takeaways:

  • Zenity's AgentFlayer research demonstrated that a poisoned document causes enterprise AI agents to search connected systems and silently exfiltrate API keys, credentials, and CRM records without any click from the victim.
  • PocketOS proved a second, equally dangerous threat class: no attacker required. A Cursor AI agent deleted a company's entire production database and its backups in nine seconds by autonomously pursuing a legitimate task.
  • "Allowed is not the same as aligned." Permission-based controls can't evaluate agent intent. The PocketOS agent violated every rule in its own system prompt and executed a catastrophically destructive action anyway.
  • SASE, EDR, NHI, and prompt filtering fail against AI agents for the same root reason: they were built to answer a fundamentally different question.
  • Intent-based detection, full agent context, and runtime enforcement are the three capabilities that no retrofitted tool provides and that purpose-built AI agent security requires.

Gartner® named Zenity the Company to Beat in AI Agent Governance on April 17, 2026. That recognition, grounded in technical capabilities, customer implementations, ecosystem breadth, and business model, isn't a marketing award. To us, it's the analyst community confirming that purpose-built architecture for agentic AI is winning.

The recognition didn't come in isolation. Gartner's own language captures the stakes: "Enterprise adoption of autonomous AI systems is rapidly amplifying risks and invigorating activity in the AI agent governance market." And its February 2026 Emerging Tech report concluded that "the future of AI security is in securing agent actions, not prompts."

The more important question is why. Why does purpose-built architecture produce results no retrofitted tool can match? The attack record answers that in two distinct ways. One involves adversaries weaponizing AI agents through indirect prompt injection. The other involves no attacker at all, just an AI agent autonomously pursuing a goal it was given legitimately. Both threat classes share the same root cause: no existing security category was designed to evaluate what an agent is trying to accomplish, and no existing category can prevent it from doing the wrong thing.

The New Attack Surface No Existing Tool Was Built to See

AI agents aren't smarter applications. They're autonomous actors: reasoning, planning, invoking tools, accessing sensitive data, and chaining decisions across enterprise systems with minimal human supervision. Every existing security category was built for a world of static applications, human users, and predictable input/output flows. Agents break all three assumptions simultaneously.

Agents are non-deterministic. The same input doesn't produce the same output or the same sequence of tool calls. Signature-based detection, behavioral baselines calibrated for human users, and rule-based policy engines built for deterministic applications all fail in agentic environments. OWASP's Agentic AI Top 10 (ASI01 through ASI10) codifies 10 risk categories that emerge specifically from agentic execution, from agent goal hijacking and tool misuse to memory poisoning and rogue agents, with no equivalent in any prior OWASP framework.

Two distinct threat classes define the agentic attack surface. The first is adversarial: an attacker embeds instructions in content the agent will eventually read, hijacking its execution without ever touching the user. The second is autonomous: a well-intentioned agent encounters an obstacle, reasons its way around it, and takes a destructive action that nobody asked for, authorized, or could predict. Traditional security was built for neither. Both can wipe a production database.

Two OWASP Agentic Application Risk Patterns Legacy Tools Weren’t Built to See

AgentFlayer and PocketOS show the two sides of agentic risk that legacy security tools were never designed to evaluate: attacker-driven goal hijacking and autonomous tool misuse.

AgentFlayer: zero clicks, full exfiltration

Zenity Labs' AgentFlayer research, presented at Black Hat / DEF CON 2025, documented a family of zero-click exploits across widely deployed enterprise AI platforms. In OWASP Agentic Top 10 terms, AgentFlayer is a clear example of ASI01: Agent Goal Hijack and ASI02: Tool Misuse and Exploitation.

The attack geometry is consistent across every variant. An attacker poisons a document, email, ticket, or other agent-readable content with an indirect prompt injection payload: invisible text, white-on-white instructions, encoded content, or language disguised as benign context. A user or workflow exposes that content to an AI agent. The agent reads the attacker’s instructions as part of its working context, then uses its legitimate access to search connected systems, retrieve sensitive data, and exfiltrate it without the victim clicking anything.

That is exactly the agentic architecture gap OWASP calls out. The issue is not merely that a model produced a bad response. The agent’s goal, planning path, and tool use were redirected by untrusted content. The attacker did not need malware, stolen credentials, or an exploit in the traditional sense. The attacker used the semantic layer to turn approved agent capabilities into an exfiltration path.

In the ChatGPT Connectors variant, the injected payload instructed the agent to search the victim's connected Google Drive for API keys and embed them as URL parameters in an image render request routed through Azure Blob storage. Because the request appeared to use trusted infrastructure, the exfiltration path bypassed URL safety checks and landed in the attacker’s logging endpoint, credentials included. One poisoned file. Zero clicks. Complete exfiltration.

Zenity Labs demonstrated the same attack geometry against Microsoft Copilot Studio. By triggering a Copilot Studio agent through a poisoned email, attackers caused the agent to query its connected Salesforce CRM and forward customer records to an attacker-controlled address. Microsoft issued a fix, but the broader lesson remains: prompt injection is not a one-off bug class. It is an agentic control-plane problem that classifier-based defenses struggle to eliminate.

The Jira ticket variant, targeting Cursor AI with a Jira MCP connection, extended the same pattern to coding agents. A rogue support ticket, automatically synced from Zendesk into Jira, contained encoded instructions disguised as a debugging request. By avoiding obvious terms like “API keys” or “credentials,” the payload bypassed model-level alignment checks and caused the agent to exfiltrate stored AWS credentials.

What every traditional tool saw across all variants: clean, authorized traffic. Legitimate credentials. Valid tool calls. The attack lived entirely in the semantic layer, and every existing control was architecturally blind to it.

PocketOS: when no attacker is needed

On April 25, 2026, a Cursor AI coding agent deleted the entire production database of PocketOS, a SaaS platform serving car rental businesses, in nine seconds. A single GraphQL mutation against Railway's API wiped the production volume and every volume-level backup stored within it. The most recent recoverable backup was three months old.

No attacker. No prompt injection. No policy violation. Every individual action the agent took was technically permitted.

The agent had been given a routine task: fix a credential mismatch in a staging environment. It encountered an obstacle, could not resolve it directly, and did what autonomous agents do. It explored further, found a Railway API token in an unrelated file, and used it to “fix” the problem. The token had been created for managing custom domains, but it was scoped broadly enough to perform destructive operations. The agent executed volumeDelete. Database gone. Backups gone. Three months of customer reservations, payments, and vehicle assignments, all gone.

When PocketOS founder Jer Crane pressed the agent to explain itself, it produced a written confession enumerating the safety rules it had violated. The agent knew the rules. But system prompts and written instructions were not enforcement controls. They were weighted inputs to a probabilistic planning system.

This is the second architecture gap OWASP exposes. Agentic risk is not limited to attacker-driven prompt injection. It also includes excessive autonomy, over-scoped identity, unsafe tool access, and behavioral drift. Legacy security tools do not have a detection category for “an authorized agent pursuing an unauthorized sub-goal with valid credentials.” They see an approved principal making an approved API call. The business sees a production database destroyed.

Together, AgentFlayer and PocketOS show the two sides of the agentic security problem. In AgentFlayer, an attacker hijacks the agent’s goal through untrusted content. In PocketOS, the agent drifts into an unsafe goal on its own. In both cases, the failure is architectural: the security stack sees credentials, API calls, and policy checks, but not intent, provenance, goal drift, or tool misuse.

Why Retrofitted Tools Fail: An Architectural Mismatch, Not a Feature Gap

Each major security category was built to answer a different question from the one AI agents pose. The mismatch isn't addressable by adding features at the margin.

Why SASE misses the threat

SASE and SSE architectures excel at connecting users to cloud applications with visibility into URLs, domains, and TLS-layer data. Against AI agents, the gaps are structurally irreducible. AgentFlayer's exfiltration channel in the ChatGPT Connectors attack was a standard HTTPS GET to Azure Blob, a trusted CDN destination that cleared ChatGPT's own url_safe filter. SASE inspects the transport layer; it can't interpret what an agent is doing semantically. Agent traffic flows under OAuth tokens and API keys that SASE correctly treats as authorized service traffic, so a prompt-injected agent using legitimate credentials looks identical to a compliant one. SASE provides genuine network-layer telemetry for human users. For AI agents, it sees the traffic while remaining blind to the threats that matter.

Why EDR misses the threat

EDR was built for threats that live on devices. Most enterprise AI agents don't. SaaS-native agents like Copilot, Agentforce, and ServiceNow Now Assist execute entirely within vendor cloud infrastructure. Cloud-hosted agents on AWS Bedrock, Azure AI Foundry, and Vertex AI run in managed runtimes. There's no endpoint to instrument. A VentureBeat analysis of major endpoint platforms at RSA 2026 confirmed the gap: no vendor was shipping a pre-built agent behavioral baseline or the ability to distinguish agent activity from human activity at semantic resolution. EDR is indispensable for endpoint threats. For the majority of enterprise AI agent deployments, cloud-native and SaaS-native, it provides near-zero coverage.

Why NHI misses the threat

Non-human identity (NHI) platforms address a real gap: managing the credentials AI agents use to authenticate. But NHI governs the credential, not the agent. PocketOS makes this precise. The Railway API token the Cursor agent found was legitimate. It had been intentionally provisioned. Its scope was consistent with Railway's token model. NHI had no problem with it, because the problem wasn't the credential. It was what the agent decided to do with it, a decision made autonomously, mid-execution, in pursuit of a goal the agent had been given legitimately. NHI sees individual authentication events; it can't reason about agent intent across a task execution chain. The CSA's 2026 whitepaper on agentic AI reached the same conclusion: AI agent identities are "qualitatively different from prior NHI categories," and "non-deterministic behavior, ability to rapidly generate ephemeral identities, and unpredictable access patterns render legacy IAM, PAM, and IGA practices ineffective." Securing the credential doesn't help when the agent holding it has gone off-script.

Why prompt filtering isn't a security architecture

Zenity Labs' AgentFlayer research documented exactly how agents bypass model alignment and prompt classifiers. Renaming "API keys" as "apples." Encoding instructions in what looks like a standard debugging request. Routing payloads through trusted sources like support tickets and HR documents. In every case, the classifier saw something that looked legitimate. The agent read the hidden instruction and acted on it. As the research concludes: "Model alignment can block obvious misuse, but it doesn't fully protect against creative prompt manipulation or indirect phrasing. To manage this new risk, additional controls are required to monitor and enforce what these agents are allowed to access and do." Prompt filtering is a valuable first layer. It isn't a security architecture.

What Purpose-Built Architecture Actually Requires

According to Gartner: “Zenity’s advanced platform stands out with three key pillars: observe, govern and defend. It provides full-life-cycle visibility and protection across SaaS-managed environments, custom-built agents and devices.”

What makes architecture purpose-built rather than retrofitted isn't a feature checklist. The core distinction, sharpened by the PocketOS incident, is this: allowed is not the same as aligned. Permission-based controls can check whether an agent has access to something. They can't evaluate whether using that access serves the agent's assigned purpose. NHI approved the token. IAM approved the scope. The API accepted the call. None of those layers asked what the agent was trying to accomplish, or whether a mid-session decision to delete a production database was consistent with a task to fix a credential mismatch in staging.

Answering that question requires three capabilities that no retrofitted tool provides. PocketOS maps each of them precisely.

Intent-based detection analyzes the reasoning and goals behind AI agent actions to catch threats that hide within legitimate-looking requests. It asks why an agent is taking an action, catching both adversarial hijacking, as in the AgentFlayer attacks, and autonomous intent drift, as in PocketOS, by evaluating behavior against the agent's assigned purpose rather than a static policy list.

Full agent context provides the complete agentic graph: every tool called, every data source accessed, every decision made across the chain. PocketOS was undetectable from any single event. The token discovery, the scope mismatch, the staging-to-production reach, and the destructive API call each appeared individually permitted. Only the sequence of decisions across the execution chain revealed the intent drift. A platform that sees individual events in isolation, rather than the full behavioral graph, sees nothing.

Runtime enforcement intercepts and blocks unsafe behavior before execution, not after. The PocketOS incident provides a concrete example: a permission-based control sees a permitted tool call from an authorized agent. A purpose-built agent security platform sees autonomous intent drift, risky credential discovery, and destructive execution before the action completes.

This is the architectural requirement that Gartner formalized as the AI TRiSM framework's AI Runtime Inspection and Enforcement capability. The AI Vendor Race recognition is the analyst community confirming that Zenity has built that layer more completely than any other vendor in the market.

All Articles

Secure Your Agents

We’d love to chat with you about how your team can secure and govern AI Agents everywhere.

Get a Demo