Homegrown AI Security: Securing Custom AI Agent Pipelines

Key Takeaways:

Homegrown AI systems are custom agentic pipelines built by internal engineering teams using AI SDKs, orchestration frameworks, and cloud AI APIs. They carry the deepest integration with enterprise data and infrastructure, the highest capabilities, and the highest blast radius if compromised.
The introduction of MCP and A2A protocols has dramatically expanded the attack surface. Supply chain compromise, high blast radius, inter-agent communication risks, prompt injection, and data exfiltration are all live threat vectors in homegrown AI environments.
A clean model or well-configured pipeline doesn't make a secure agent. Homegrown AI systems need to be governed, monitored, and controlled at the point where they act, not just at the point where they're built.
Security requires controls at every phase of the AI development lifecycle, from pre-deployment red-teaming and supply chain validation through runtime behavioral monitoring and incident response with forensic replay.
The organizations building the most capable AI systems also carry the most complex security requirements, and the conventional wisdom around application security doesn't transfer cleanly to agentic AI pipelines that reason, plan, and act autonomously.

Homegrown AI security addresses the hardest category of AI risk in the enterprise. When internal engineering teams build custom agentic pipelines using LangChain, CrewAI, LangGraph, Strands Agents, Claude Agent ADK, AWS Bedrock, or similar frameworks, they create systems that are more deeply integrated with enterprise data and infrastructure than any other AI archetype. These pipelines query internal databases, invoke APIs, chain together specialized agents, process sensitive documents, and produce outputs that trigger real downstream actions inside the organization.

The security challenge is commensurate with the capability. Homegrown AI systems have the highest blast radius of any AI archetype if compromised. A misconfigured or manipulated agentic pipeline with access to financial data, customer records, and operational systems can cause damage that propagates across the organization before any human reviewer notices something is wrong. And the introduction of model context protocol (MCP) and agent-to-agent (A2A) protocols has expanded the attack surface further still, adding supply chain risk and inter-agent communication vulnerabilities to an already complex threat model.

This article explains what makes homegrown AI uniquely challenging to secure, what the specific threat vectors are, and what a governance program that matches the risk actually looks like.

What Homegrown AI Systems Are and Why They're Different

Homegrown AI systems are custom agents and agentic pipelines built by internal engineering teams using AI SDKs and orchestration frameworks. Unlike embedded SaaS AI or device-based coding agents, these systems are purpose-built for specific enterprise workflows and deeply integrated with the data and infrastructure that powers them.

A homegrown AI system might be a multi-agent research pipeline that queries internal knowledge bases, synthesizes findings, and produces reports delivered to senior stakeholders. It might be an automated customer onboarding agent that orchestrates identity verification, account provisioning, and communication workflows across multiple internal systems. It might be a financial analysis pipeline that ingests transaction data, applies models, and surfaces findings to risk teams. In each case, the system has been built to do something specific and important, with integrations that are unique to the organization.

The platforms in this category include LangChain, CrewAI, LangGraph, Strands Agents, OpenAI Agent SDK, Google ADK, Claude Agent ADK, n8n, Azure Foundry, AWS Bedrock, UiPath, AWS AgentCore, Google Vertex, and the Anthropic API Platform. What they have in common is that they give engineering teams the building blocks to construct agentic behavior, but they don't provide the security controls that govern what that behavior can and can't do at runtime.

The Threat Vectors That Define This Category

Homegrown AI carries a threat model that combines risks from other archetypes with risks that are specific to custom-built agentic systems.

Supply chain compromise through AI SDKs and dependencies

Every homegrown AI pipeline is built on a stack of dependencies: orchestration frameworks, model SDKs, tool libraries, vector database clients, and MCP/A2A components. Each dependency is a potential supply chain attack vector. A malicious package introduced into the dependency chain can compromise the entire pipeline, injecting backdoors into generated outputs, redirecting data to attacker-controlled destinations, or introducing vulnerabilities that persist across every deployment of the system.

According to NIST's AI Risk Management Framework, supply chain risk management for AI systems requires organizations to assess every third-party component in the AI development lifecycle, including pre-trained models, libraries, and tooling. For homegrown AI, this assessment needs to be continuous, not one-time, because dependencies change with every update.

High blast radius from deep enterprise integration

Homegrown AI systems are often built with broad access to enterprise data and systems precisely because that access is what makes them useful. An agentic pipeline with read access to customer data, write access to operational systems, and the ability to trigger external communications can cause significant damage if its behavior is manipulated or drifts from its intended function.

The blast radius of a compromised homegrown AI system scales with its integration depth. A pipeline that touches a single internal database has a limited blast radius. A pipeline that orchestrates multiple specialized agents, each with their own data connections and action capabilities, has a blast radius that can span the organization. Understanding that blast radius by mapping every data source, every tool, and every downstream system the pipeline can reach is a prerequisite for governing it effectively.

Inter-agent communication risks from MCP and A2A protocols

The introduction of Model Context Protocol (MCP) and Agent-to-Agent (A2A) communication protocols has created a new category of risk specific to homegrown AI: inter-agent communication vulnerabilities. When agentic pipelines use MCP to connect to external tool servers or A2A protocols to coordinate between specialized agents, each communication channel becomes a potential attack surface.

A compromised MCP server can inject malicious instructions into an agent's context. An A2A communication channel can be intercepted or spoofed, causing one agent to send instructions to another that weren't authorized by the orchestrating system. Anthropic's guidance on building effective agents notes that multi-agent architectures introduce coordination challenges that go beyond what single-agent systems face, and those coordination challenges include security vulnerabilities that don't exist in simpler architectures.

Prompt injection through retrieval and external inputs

Homegrown AI pipelines frequently use retrieval-augmented generation (RAG) to ground agent responses in enterprise data. Every document, record, or data source that a RAG pipeline retrieves is a potential prompt injection vector. A malicious instruction embedded in a document that enters the retrieval corpus can cause the agent to behave in ways that weren't intended, accessing data outside its normal scope, producing outputs that include sensitive content, or taking actions that the developer never anticipated.

The OWASP Top 10 for LLM Applications identifies prompt injection as the highest-impact risk in LLM applications, and RAG-based homegrown AI systems represent one of the most direct pathways for indirect prompt injection to occur at enterprise scale. Establishing RAG data lineage, understanding what data retrieval pipelines are accessing and surfacing, and validating the integrity of retrieval sources are all foundational controls.

Why Conventional Application Security Doesn't Transfer

Engineering teams building homegrown AI systems often approach security through the lens of conventional application security: secure the code, manage dependencies, control access, validate inputs. These are necessary but not sufficient for agentic systems.

Agents reason and plan. Applications execute instructions. A conventional application does what its code says. An agentic pipeline reasons about its context and decides what to do. That reasoning process can be influenced by injected content, manipulated through carefully constructed inputs, or drift over time as the context the agent operates in changes. Securing the code that builds the agent doesn't secure the decisions the agent makes at runtime.

Pipeline behavior is emergent, not fully specified. A multi-agent pipeline that coordinates specialized agents across a complex workflow can produce behavior that wasn't explicitly programmed and wasn't anticipated during design. The combination of two agents' capabilities, mediated by a context that includes retrieval results, user inputs, and prior agent outputs, can produce outcomes that no single developer fully modeled. Runtime behavioral monitoring is the only control that catches emergent behavior before it becomes an incident.

Development lifecycle gates don't substitute for runtime controls. Pre-deployment red-teaming, automated dynamic application security testing (DAST) against agent pipelines, and pre-commit security validation are all essential, but they validate behavior under test conditions. Production inputs, real user queries, live retrieval results, and actual external system responses introduce conditions that test environments don't fully replicate. A pipeline that passes every pre-deployment security gate can still behave unexpectedly when it encounters inputs that fall outside the test distribution.

Consider this scenario: an engineering team builds a financial analysis pipeline using LangGraph and AWS Bedrock. The pipeline retrieves transaction records, applies a model to identify anomalies, and surfaces findings to a risk team dashboard. A bad actor who gains write access to a data source in the retrieval corpus embeds instructions that cause the pipeline to include a secondary output: a formatted summary of flagged transactions sent to an external webhook. The pipeline passes all pre-deployment security checks because the malicious content wasn't in the test data. The attack succeeds at runtime, after every gate has cleared.

The Controls Homegrown AI Actually Requires

Governing homegrown AI requires controls that span the full development and deployment lifecycle, with particular emphasis on supply chain validation, runtime behavioral enforcement, and continuous posture management.

AI bill of materials and supply chain validation

Every homegrown AI pipeline needs a complete inventory of its components: models, frameworks, libraries, SDKs, MCP servers, and A2A dependencies. This AI bill of materials (AI BOM) is the foundation of supply chain security. AI Security Posture Management capabilities that track all models, tools, SDKs, and dependencies provide the visibility needed to manage known vulnerabilities under defined SLAs and validate every third-party component before integration.

Supply chain validation should include automated scanning of AI SDK supply chains and MCP/A2A components for known vulnerabilities and malicious packages, validation of model and MCP/A2A supply chains before any component is promoted to production, and continuous monitoring for newly discovered vulnerabilities in components that are already in use.

RAG data lineage and retrieval security

For pipelines that use RAG, establishing data lineage means understanding the origin, transformation, and privacy status of every data source that the retrieval pipeline can access. This includes mapping what data retrieval pipelines are accessing and surfacing, validating the integrity of retrieval sources to prevent injection through poisoned content, and securing embeddings to prevent data leakage through retrieval.

Retrieval sources that include external content, user-submitted documents, or data from third-party integrations represent the highest prompt injection risk and require the most rigorous validation. Internal-only retrieval sources with well-controlled write access carry lower risk but still need to be monitored for anomalous content that could indicate a supply chain compromise.

Pre-deployment security testing

Before any homegrown AI pipeline reaches production, it needs to be tested against adversarial inputs. Automated red-teaming (DAST) against agent pipelines, pre-commit security validation and deployment gates in the AI development lifecycle, and supply chain component assessment before integration are all required controls.

Automated DAST for agentic pipelines is meaningfully different from DAST for conventional applications. It needs to test how the pipeline responds to injected instructions, unexpected retrieval results, malformed tool responses, and inputs designed to cause the agent to exceed its intended scope. The MITRE ATLAS framework documents adversarial techniques specific to AI systems and provides a useful taxonomy for structuring red-teaming efforts against homegrown pipelines.

Runtime behavioral monitoring and inline prevention

The most critical control for homegrown AI is runtime behavioral monitoring that catches deviations from expected behavior while the pipeline is running. AI Detection and Response capabilities for homegrown AI need to map agent decision tracing and behavioral profiles to establish runtime baselines, detect data exfiltration through agent outputs, API responses, or inter-agent communication, and identify rogue agent behavior and malicious MCP tool calls in real time.

Inline prevention at the runtime layer means the ability to block individual tool calls, interrupt outputs that violate policy, and terminate pipeline executions when behavior crosses defined boundaries. For high-stakes pipelines with large blast radii, automated response without manual intervention is a requirement: the speed at which agentic pipelines execute means that a human review cycle is too slow to prevent damage once anomalous behavior begins.

Incident response with forensic replay

When something goes wrong in a homegrown AI pipeline, the investigation needs to reconstruct exactly what happened: what inputs the agent received, what context it retrieved, what decisions it made, what actions it took, and what outputs it produced. This requires decision tracing capabilities that preserve the full audit trail of agent execution, not just the final output. Forensic replay, the ability to replay an agent's decision sequence against recorded inputs, is what makes root cause analysis possible and what prevents the same class of incident from recurring.

Building Homegrown AI Security Into the Development Lifecycle

Security for homegrown AI is most effective when it's built into the development process rather than bolted on at deployment. This requires close collaboration between security teams and the engineering teams building these systems.

Security requirements at design time. Before the first line of pipeline code is written, define the security requirements that the system needs to meet: what data sources it can access, what actions it can take, what its blast radius limits are, and what behavioral policies it needs to operate within. These requirements should inform every subsequent design decision.

Continuous posture assessment through the development lifecycle. Every time a new component is added, a dependency is updated, or a new data source is integrated, the pipeline's security posture needs to be reassessed. Static one-time assessments don't account for the rate of change in homegrown AI development.

Runtime monitoring as a first-class engineering requirement. Instrumenting behavioral monitoring into a homegrown AI pipeline is a development task, not an afterthought. Engineering teams that treat runtime visibility as a feature of the pipeline, rather than an external add-on, produce systems that are far easier to secure and govern over time.

The Most Capable Systems Require the Most Complete Governance

Homegrown AI systems represent the leading edge of what enterprise AI can do. They're also the leading edge of what enterprise AI risk looks like. The depth of their integration with enterprise data and infrastructure, the complexity of their multi-agent architectures, and the speed at which they execute make them the archetype where governance gaps carry the most significant consequences.

The enterprises getting this right are building AI security posture management and runtime controls into their homegrown AI programs from the beginning, treating agent security as a core engineering requirement rather than a security team add-on. They know their blast radius. They monitor their pipelines in real time. They can respond at the speed agents act.

The full framework for securing homegrown AI systems and every other AI archetype in your enterprise is documented in The Definitive Guide to AI Security. Download it to understand the complete lifecycle of controls your AI security program needs to build.

FAQs About Homegrown AI Security

What are homegrown AI systems?

Homegrown AI systems are custom agents and agentic pipelines built by internal engineering teams using AI SDKs, orchestration frameworks, and cloud AI APIs. Examples include pipelines built on LangChain, CrewAI, LangGraph, AWS Bedrock, Azure Foundry, Claude Agent ADK, and the Anthropic API Platform. They differ from SaaS-managed AI in that they're purpose-built for specific enterprise workflows and deeply integrated with internal data and infrastructure.

Why do homegrown AI systems carry higher risk than other AI archetypes?

Homegrown AI systems have the deepest integration with enterprise data and infrastructure, the highest capabilities, and the highest blast radius if compromised. Unlike embedded SaaS AI or device-based tools, these systems are designed to take complex autonomous actions across multiple internal systems simultaneously. A compromised or manipulated homegrown pipeline can cause damage that propagates across the organization before any human reviewer detects it.

What is an AI bill of materials (AI BOM) and why does it matter?

An AI bill of materials is a comprehensive inventory of every component in a homegrown AI system: models, frameworks, libraries, SDKs, MCP servers, A2A dependencies, and retrieval data sources. It matters because every component is a potential supply chain attack vector. Without a complete AI BOM, organizations can't assess the full attack surface of their pipelines, manage known vulnerabilities, or validate that every component meets security requirements before deployment.

How does RAG introduce prompt injection risk in homegrown AI?

Retrieval-augmented generation (RAG) pipelines retrieve content from data sources to ground agent responses. Any content in those data sources that includes malicious instructions can be retrieved and processed as a directive by the agent, causing it to behave in ways that weren't intended. This is indirect prompt injection at the retrieval layer. Sources that include external content, user-submitted documents, or data from third-party integrations carry the highest risk. Establishing data lineage and monitoring retrieval source integrity are the primary controls.

What is the difference between pre-deployment testing and runtime monitoring for homegrown AI?

Pre-deployment testing, including automated red-teaming and DAST, validates pipeline behavior under test conditions before production deployment. Runtime monitoring observes what the pipeline actually does during live execution. These are complementary controls, not substitutes for each other. Test environments don't fully replicate production inputs, and pipelines that pass every pre-deployment gate can still behave unexpectedly when they encounter real user inputs, live retrieval results, and actual external system responses.

How does MCP create security risk in homegrown AI pipelines?

Model Context Protocol (MCP) servers extend an agentic pipeline's capabilities by providing tools, data, and services at runtime. Each MCP server connection is a supply chain dependency and a potential attack vector. A compromised or malicious MCP server can inject instructions into the agent's context, redirect behavior toward attacker-controlled outcomes, or exfiltrate data through what appears to be a legitimate tool call. Every MCP server in a homegrown pipeline needs to be inventoried, validated, and continuously monitored.

What does incident response look like for a homegrown AI pipeline?

Effective incident response for homegrown AI requires forensic capture of the full agent execution trace: what inputs the agent received, what content it retrieved, what decisions it made, what actions it took, and what outputs it produced. Decision replay capabilities allow security teams to reconstruct exactly what happened and why. Response actions should include automated isolation of compromised pipeline components, blocking of attacker IP addresses, and initiation of root cause analysis workflows that prevent the same class of incident from recurring.

How should security teams work with engineering teams on homegrown AI?

The most effective approach is to integrate security requirements into the design phase, before development begins. Security teams should define the blast radius limits, acceptable data access scope, and behavioral policy boundaries that the pipeline needs to operate within. Engineering teams should instrument runtime behavioral monitoring as a feature of the pipeline, not an external add-on. Continuous posture assessment should be part of every development cycle, not a one-time gate at deployment.

All Academy Posts

Capabilities

Environment

By Business Need

By Platform

By Risk Type

By Business Type

Homegrown AI: Why the Highest-Capability Systems Carry the Highest Risk