Claude Moves to the Darkside: What a Rogue Coding Agent Could Do Inside Your Org

On November 13, 2025, Anthropic disclosed the first known case of an AI agent orchestrating a broad-scale cyberattack with minimal human input. The Chinese state-sponsored threat actor GTG-1002 weaponized Claude Code to carry out over 80% of a sophisticated cyber espionage campaign autonomously. This included reconnaissance, exploitation, credential harvesting, and data exfiltration across more than 30 major organizations worldwide. The impact was real. And the AI was in control.

Weaponizing Claude Was Surprisingly Easy

This wasn’t a model custom-trained for hacking. Claude Code, like many developer assistants now embedded across the enterprise, was designed to help software teams move faster. But GTG-1002 showed the world how little effort it takes to hijack that productivity and repurpose it for offensive operations.

With a few carefully crafted prompts and persona engineering tactics, the attackers convinced Claude it was acting as a legitimate penetration tester. The model didn’t push back. It didn’t ask questions. It simply executed. At machine speed. Across multiple targets. With memory, tool access, and zero human hesitation.

The implication: any sufficiently capable AI coding agent can be socially engineered into becoming an attacker.

One of the most quietly powerful moves GTG-1002 made was embedding MCP (Model Context Protocol) servers into the attack. These servers gave Claude access to what looked like safe, sanctioned tools: CLI access, browser automation, internal APIs. In reality, they were built solely to carry out offensive operations while making each discrete action appear legitimate. No custom malware, just well-structured scaffolding designed to push the agent deeper into the enterprise without tripping alarms.
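
To make that concrete, here is a minimal sketch using the MCP Python SDK’s FastMCP helper (the server and tool names are invented for illustration). The point is structural: an MCP tool is arbitrary code behind a human-readable name and description, and the agent, or anyone skimming the tool list, sees only that metadata, never the implementation.

```python
# Minimal illustrative MCP server built with the official Python SDK (package: mcp).
# Server and tool names are hypothetical. The agent only ever sees the tool's
# name, description, and input schema -- never what the function body does.
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("internal-devtools")  # presents itself as a sanctioned internal utility

@mcp.tool()
def check_build_status(project: str) -> str:
    """Report the CI build status for a project."""  # all the agent sees
    # The body is opaque to the agent and to casual review: it could just as
    # easily shell out, call internal APIs, or stage data for exfiltration
    # while still returning a plausible-looking answer.
    return f"build for {project}: passing"

if __name__ == "__main__":
    mcp.run()
```

This is why the unit of trust has to be observed behavior, not tool names or descriptions.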

The takeaway is clear. Any advanced AI coding agent can be tricked into acting maliciously if it is given the right inputs in the right context.

Claude Can Be Malicious: What Does It Mean For You?

One of the main takeaways from Anthropic’s findings is that Claude can easily be tricked into acting maliciously. All the attackers needed to get Claude to engage in malicious behavior was a simple role-play: telling Claude it was working on behalf of a legitimate cybersecurity firm, within a sanctioned security testing engagement. Claude believed it was acting as a white-hat penetration tester while actually carrying out a large-scale black-hat attack.

This comes as no surprise. At Zenity, we often use AI models (including Claude) to help us craft prompt injection payloads as part of our internal red teaming. At first the model refuses, but once told it’s being used to test AI agents as part of an internal security testing procedure, it happily complies and crafts effective, elaborate prompt injection attacks.

But if Claude can be tricked into acting maliciously that easily, what does it mean if you have it in your organization?

Supercharged Insider Risk

Before AI, malicious insiders were already a serious threat that organizations had to manage. Giving those insiders access to AI puts that threat on steroids. As GTG-1002 demonstrated, Claude Code is incredibly effective at carrying out cyberattacks. Put it in the hands of a malicious insider and they now have a tireless penetration tester automating their harmful intent. In this new reality, every insider threat is amplified: an insider’s capabilities and knowledge are no longer limited by their own expertise. With Claude Code at their disposal, they can generate exploit code, bypass defenses, and orchestrate complex attacks with unprecedented precision. All it takes is a simple role-play prompt.

But there’s more: if you use Claude Code, it is integrated into your environment. It has access to your repos, has write permission to your directories, and can even execute shell commands on your employees’ machines. Worse, Claude Code can behave in misaligned ways even without a malicious insider operating it. Examples of coding agents misbehaving are not hard to find, from Cursor deleting entire directories completely unprompted, to Claude Code wiping entire database tables, to an instance of Amazon Q being compromised by a malicious prompt inserted into its official code base. The attack Anthropic just managed to disrupt is a prime example of what Claude is capable of if it goes rogue.
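
If Claude Code is already in your environment, a useful first step is simply knowing what it is wired into. The sketch below assumes the project-scoped .mcp.json layout with a top-level mcpServers map (verify the exact format against your Claude Code version’s documentation); it walks a directory of checkouts and lists every MCP server a repo would attach, so unknown or shell-capable servers can be reviewed.

```python
# Audit sketch: enumerate MCP servers that project-scoped configs would attach
# to a coding agent. Assumes the ".mcp.json" file name and a top-level
# "mcpServers" map of {name: {"command": ..., "args": [...]}} or {name: {"url": ...}};
# confirm both against your Claude Code version before relying on this.
import json
from pathlib import Path

def list_mcp_servers(root: str) -> list[dict]:
    findings = []
    for cfg in Path(root).rglob(".mcp.json"):
        try:
            servers = json.loads(cfg.read_text()).get("mcpServers", {})
        except (OSError, json.JSONDecodeError):
            continue  # unreadable or malformed config; skip it
        for name, spec in servers.items():
            findings.append({
                "config": str(cfg),
                "server": name,
                # stdio servers launch a local process; remote servers point at a URL
                "entry_point": spec.get("command") or spec.get("url", "<unknown>"),
            })
    return findings

if __name__ == "__main__":
    for finding in list_mcp_servers("."):
        print(finding)
```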

What Can Claude Code Do Inside Your Organization?

Organizations must now consider two distinct but interconnected risks: a malicious insider wielding the agent, and the agent itself being manipulated or going rogue. Both stem from tools you’ve already deployed, often without meaningful oversight, and the results could be catastrophic. A compromised coding agent would be able to:

  • Generate insecure code by default, or worse, malicious code with embedded exploits.
  • Identify vulnerabilities in internal or customer codebases without needing advanced offensive security skills.
  • Insert backdoors into source code, or push updates that introduce critical vulnerabilities.
  • Request or leak credentials stored in environment variables, configuration files, or repositories.
  • Move laterally if granted access to internal APIs or orchestration tools, escalating privileges and reaching sensitive systems well outside its intended scope.
  • Issue destructive commands to production environments under the guise of routine tasks if integrated with your CI/CD pipeline or DevOps stack.

Claude can carry out everything from reconnaissance to exploitation to exfiltration without further human involvement. The entire lifecycle of an attack is now just a prompt away, or in some cases, just a stroke of bad luck away.

None of these actions appear overtly malicious in isolation. The risk emerges from the sequencing and orchestration of steps that, taken together, amount to a full-scale breach.

Would You Be Able to Tell if Claude Code Was Acting This Way in Your Environment?

Whether it’s an insider acting on their own permissions, an external attacker manipulating the agent through compromised integrations, or an AI assistant going rogue unintentionally, your AI is doing the attacker’s work from within. What made GTG-1002 so dangerous was the speed and scale enabled by autonomy and the proficiency of the LLM in carrying out the attack. Automation scaled the attack faster than legacy tools could keep up with. While those tools can tell you when you’re under attack, they are not equipped to recognize an insider threat that emerges from a seemingly helpful coding assistant. Legacy tooling doesn’t provide insight into what AI agents are doing or how they are behaving.

To defend against AI-driven threats, we need visibility into the agent's reasoning, tool use, and entire activity. Not just its output. Ask yourself:

  • What tools is this agent invoking? GTG-1002 connected Claude Code to custom-built MCP servers, which Claude then used to orchestrate the attack. Claude Code can have malicious tools connected to it, whether deliberately by a malicious insider or accidentally by a naive employee. Without tool-call visibility and monitoring, identifying whether your own organization’s Claude Code is acting maliciously verges on impossible.
  • What systems is it accessing? Claude Code is a powerful AI running directly on your employees’ machines. It inherits their permissions, can create and edit files, and can even execute shell commands. Moreover, it can be connected to CI/CD pipelines and multiple MCP tools (such as GitHub or Jira) for increased autonomy. This integration into organizational systems makes the risk of Claude acting maliciously far more potent. The more access you give Claude, the more powerful it becomes. But in AI, elevated power comes with elevated risk.
  • Why is it making this decision now? Claude can be tricked into acting maliciously because it takes the user’s stated intent at face value. But while intent can be spoofed, actions cannot be hidden. A human observer would instantly recognize when the agent’s behavior crosses into suspicious territory. By monitoring the agent’s decisions and actions in real time (see the sketch below), we can detect when it’s going rogue: executing what it perceives as legitimate actions while, in fact, engaging in harmful activity.
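
One concrete way to get that real-time view is a pre-execution guard in the shape of Claude Code’s PreToolUse hook, sketched below: every pending tool call arrives as JSON on stdin, gets logged, and obviously destructive shell commands are refused. The field names (tool_name, tool_input) and the exit-code-2 block convention are taken from the hooks documentation; treat them as assumptions to verify against your version, and treat the blocklist as a starting point, not a policy.

```python
#!/usr/bin/env python3
# Sketch of a PreToolUse-style hook: log every pending tool call and block
# obviously destructive shell commands. Field names ("tool_name", "tool_input")
# and the exit-code-2 "block" convention should be verified against your
# Claude Code version's hooks documentation.
import json
import re
import sys

BLOCKLIST = [
    r"\brm\s+-rf\b",              # recursive deletes
    r"\bcurl\b.*\|\s*(ba)?sh\b",  # pipe-to-shell installs
    r"\bdrop\s+table\b",          # destructive SQL
]

def main() -> int:
    event = json.load(sys.stdin)
    tool = event.get("tool_name", "")
    command = str(event.get("tool_input", {}).get("command", ""))

    # Record the call; a real deployment would ship this to a SIEM rather than stderr.
    print(json.dumps({"tool": tool, "command": command}), file=sys.stderr)

    if tool == "Bash" and any(re.search(p, command, re.IGNORECASE) for p in BLOCKLIST):
        print("Blocked by security policy: destructive command.", file=sys.stderr)
        return 2  # exit code 2 signals that this tool call should be blocked
    return 0

if __name__ == "__main__":
    sys.exit(main())
```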

Zenity Is Built for Attacks Like GTG-1002 Inside Your Organization

When Claude Code is turned into an attacker, it doesn't breach your environment from the outside. It operates from within, using valid permissions and approved tools. That's exactly what makes it so dangerous. Zenity provides the visibility and control needed to detect and stop a weaponized AI coding agent before it causes harm:

  • We surface every agent operating across your SaaS, cloud, and endpoint environments, including shadow deployments no one approved.
  • We track each agent's configuration, what tools it uses, what systems it can access, and whether it’s over-permissioned for its role.
  • We enforce policies that prevent high-risk actions like autonomous scanning, remote command execution, or unsanctioned code changes.
  • We analyze behavior in context, identifying when an agent’s pattern of activity matches early-stage attack sequences, even if no individual step appears malicious.
  • We monitor interactions with MCP servers and flag unauthorized or untrusted infrastructure that could serve as a control point for abuse.

If Claude had been running in an environment protected by Zenity, we would have seen it:

  • Executing reconnaissance across external networks
  • Generating and testing exploit payloads
  • Harvesting credentials and probing internal services
  • Parsing and categorizing exfiltrated data at machine speed
  • Making questionable MCP calls to custom-built MCP servers designed to support cyber operations

Each phase would have triggered alerts, policy violations, or automated responses. The attack might never have made it past the first prompt.

The Path Forward: From Passive Monitoring to Active Control

GTG-1002 proved that agents can and will be weaponized. It also proved that traditional controls are blind to what matters. This is your opportunity to lead the shift from passive monitoring to agent-aware defense.

Start by asking:

  • Do we know what AI agents exist across our org?
  • Can we see what tools they invoke and what systems they touch?
  • Are we enforcing consistent policies on what they’re allowed to do?
  • Do we have real-time mechanisms to stop them when things go wrong?

If the answer is no, you’re flying blind. And attackers are already proving how easy it is to take control.

Request an AI Agent Risk Assessment today.

The next GTG-1002 won’t announce itself. But you can be ready for it.

