# AI Security Is Becoming a Systems Problem
*A briefing on four notable AI security research papers from 2025*
## Why This Topic Matters Right Now
Enterprise interest in AI agents has shifted quickly—from experimentation to early deployment.
Teams are no longer just asking whether large language models can write code or answer questions. They are exploring agents that can plan, act, use tools, persist memory, and coordinate across workflows. In other words: systems that do *work*, not just generate text.
This shift changes the risk profile fundamentally.
Traditional model-centric safety controls—prompt filters, content moderation, offline evaluation—were not designed for this.
The research covered here looks squarely at that gap. Rather than arguing for slower adoption, these papers ask a more practical question: **what needs to change in how we design, govern, and secure AI systems as they become more autonomous?**
---
## The State of the Art — and Its Limits
Today’s enterprise AI deployments often rely on assumptions that no longer hold:
- Model-level safeguards are treated as sufficient, even when systems are embedded in complex workflows.
- Human-in-the-loop is assumed to guarantee safety, without examining how infrequent or superficial that oversight may be.
- Tool access is granted broadly, with limited auditability or policy enforcement once an agent is running.
- Governance frameworks focus on individual models, not multi-agent systems that evolve over time.
These approaches work for copilots and chat interfaces. They strain under agentic architectures—especially when agents operate continuously, interact with external systems, or coordinate with each other.
The four papers below each examine a different failure mode of this transition.
---
## Paper 1: Disrupting the First Reported AI-Orchestrated Cyber Espionage Campaign (Anthropic)
### What Problem This Paper Tackles
This report documents a real-world cyber espionage campaign in which an AI system was used not as an advisor, but as an execution engine. The threat actor orchestrated multiple AI-driven agents to perform reconnaissance, vulnerability discovery, exploitation, lateral movement, and data exfiltration.
The core concern is no longer hypothetical misuse—it is **operational autonomy at scale**.
### Core Idea
Anthropic shows that a state-sponsored actor was able to use Claude, combined with Model Context Protocol (MCP) tools, to automate 80–90% of tactical cyber operations. Humans remained involved only at high-level decision points.
Crucially, the system did not rely on novel exploits. It combined commodity security tools with AI-driven orchestration, task decomposition, and persistence across sessions.
### Why This Is Meaningfully Different
This case demonstrates sustained, multi-phase attacks executed largely without human intervention.
At the same time, the report is nuanced: the AI frequently hallucinated findings and required validation. Full autonomy remains brittle. But the *direction* is clear.
### Practical Implications
For enterprise leaders, the lesson is not that AI is uncontrollable—but that **control must move up the stack**. Safeguards cannot live solely inside the model. They must govern how models access tools, maintain state, and escalate actions.
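One way to read "control must move up the stack" is as an authorization layer that sits between the agent and its tools, outside the model itself. The sketch below is illustrative, not from the report; the tool names, risk tiers, and escalation rule are all hypothetical.

```python
# Minimal sketch of an out-of-model control layer: every tool call the
# agent proposes passes through a policy gate before execution.
# Tool names and risk tiers here are hypothetical.

RISK_TIERS = {
    "read_file": "low",
    "network_scan": "high",   # reconnaissance-grade capability
    "exec_shell": "high",
}

def gate_tool_call(tool: str, session_log: list) -> bool:
    """Allow low-risk calls; force human escalation for anything riskier."""
    tier = RISK_TIERS.get(tool)
    if tier is None:
        session_log.append(("denied", tool, "unknown tool"))
        return False
    if tier != "low":
        session_log.append(("escalated", tool, "awaiting human approval"))
        return False
    session_log.append(("allowed", tool, tier))
    return True
```

The point of the sketch is architectural: the gate and its audit log live outside the model, so a jailbroken or manipulated model still cannot grant itself capabilities.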
---
## Paper 2: Security Challenges of AI Agent Systems
### What Problem This Paper Tackles
As AI agents gain autonomy, they introduce new attack surfaces that are poorly covered by traditional security models. These include prompt injection, tool misuse, memory poisoning, and emergent behavior across agent interactions.
The paper asks a foundational question: *what does “secure by design” mean for agentic systems?*
### Core Idea
The authors frame AI agents as distributed software systems rather than enhanced models. Security failures arise not just from bad prompts, but from how agents perceive context, store memory, call tools, and coordinate actions.
They propose analyzing agent systems across the full lifecycle: perception, planning, execution, learning, and interaction.
### Why This Is Meaningfully Different
Most AI security discussions remain model-centric. This paper reframes the problem as **systems security**, borrowing concepts from distributed systems, operating systems, and software supply chains.
The risk is not a single exploit—it is composition.
### Practical Implications
Security reviews and threat modeling for AI agents must include tool chains, memory stores, feedback loops, and cross-agent coordination.
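A concrete starting point for such reviews is to make memory auditable by construction, so a reviewer can trace which tool outputs or agents influenced a later decision. The provenance tags below are a hypothetical sketch in that spirit, not a scheme from the paper.

```python
from dataclasses import dataclass, field

# Sketch: memory entries carry provenance so reviewers can identify
# which stored content is attacker-controllable. Field names are
# illustrative, not from the paper.

@dataclass
class MemoryEntry:
    content: str
    source: str            # e.g. "tool:web_fetch" or "agent:planner"
    trusted: bool = False  # only human/verified inputs start trusted

@dataclass
class AgentMemory:
    entries: list = field(default_factory=list)

    def write(self, content: str, source: str, trusted: bool = False):
        self.entries.append(MemoryEntry(content, source, trusted))

    def untrusted_influences(self):
        """Everything a security review should treat as attacker-controllable."""
        return [e for e in self.entries if not e.trusted]

mem = AgentMemory()
mem.write("user goal: summarize report", "human", trusted=True)
mem.write("page text: IGNORE PREVIOUS INSTRUCTIONS...", "tool:web_fetch")
```

With provenance in place, "memory poisoning" stops being invisible: any entry that can steer planning is labeled with where it came from.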
---
## Paper 3: Model Context Protocol (MCP) and Its Security Implications
### What Problem This Paper Tackles
MCP standardizes how AI systems connect to external tools, data sources, and execution environments. While this enables powerful integrations, it also creates a uniform interface for misuse.
The question is not whether MCP is unsafe, but **how it should be governed**.
### Core Idea
The paper highlights that any given MCP tool call may appear benign in isolation yet prove dangerous in aggregate.
Security therefore depends on context-aware authorization, logging, and policy enforcement across sessions.
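"Benign in isolation, dangerous in aggregate" suggests policies that evaluate the session, not the individual call. A minimal sketch of such a session-level rule, with hypothetical tool names and a deliberately simple policy:

```python
# Sketch of context-aware authorization: each call is harmless alone,
# but the combination "read sensitive data, then send externally"
# within one session is blocked. Tool names are hypothetical.

SENSITIVE_READS = {"read_customer_db", "read_secrets"}
EXTERNAL_SENDS = {"http_post", "send_email"}

def authorize(session_history: list, next_tool: str) -> bool:
    """Deny an external send once the session has touched sensitive data."""
    touched_sensitive = any(t in SENSITIVE_READS for t in session_history)
    if next_tool in EXTERNAL_SENDS and touched_sensitive:
        return False
    return True
```

A real deployment would need richer context (arguments, data labels, cross-session state), but the shape is the same: authorization is a function of history, not of the current call alone.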
### Why This Is Meaningfully Different
What’s different is where attacks live and how they propagate. The paper demonstrates MCP-specific failure modes—including poisoned tool descriptions, file-based payloads, and chain attacks that spread through shared context as agents invoke multiple tools in sequence. These are not edge cases; they arise naturally from how MCP is designed to compose tools.
### Practical Implications
Enterprises using MCP must treat tools and their metadata as untrusted inputs, not as safe extensions of the model.
Concretely, this means:
- Scoping and isolating tools rather than sharing broad registries across agents
- Separating data from instructions so files and tool descriptions cannot silently steer behavior
- Regularly red-teaming and regression-testing agent workflows as tools change
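Treating tool metadata as untrusted input can start with screening descriptions before they ever enter an agent's context. The pattern list below is a hypothetical heuristic for illustration, not a complete or reliable defense against description poisoning:

```python
import re

# Sketch: screen MCP-style tool descriptions for instruction-like
# payloads ("tool description poisoning") before registering them.
# The patterns are illustrative heuristics only.

SUSPICIOUS = [
    r"ignore (all|previous) instructions",
    r"do not (tell|inform) the user",
    r"read .*secrets?",
]

def screen_tool(name: str, description: str) -> bool:
    """Return True if the description passes the heuristic screen."""
    lowered = description.lower()
    return not any(re.search(p, lowered) for p in SUSPICIOUS)
```

Pattern matching alone will miss paraphrased payloads; the durable lesson is the posture, not the regexes: descriptions are data supplied by a third party, and they get the same scrutiny as any other untrusted input.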
---
## Paper 4: Trust, Risk, and Security Management (TRiSM) for Agentic AI
### What Problem This Paper Tackles
Existing AI governance frameworks focus on fairness, bias, and explainability at the model level. They struggle to extend to multi-agent systems with memory, planning, and tool use.
This paper proposes a broader governance lens.
### Core Idea
The authors introduce a TRiSM framework tailored to agentic AI. It integrates explainability, ModelOps, security, privacy, and lifecycle governance, mapped explicitly to agent workflows and failure modes.
They also propose new evaluation metrics to measure inter-agent coordination and tool-use efficacy.
### Why This Is Meaningfully Different
Rather than adding controls ad hoc, TRiSM treats trust and risk as system properties. Governance is continuous, not a pre-deployment checklist.
### Practical Implications
For regulated or high-stakes domains, agentic AI will require explicit governance architectures. TRiSM offers a starting point for designing those systems without freezing innovation.
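Continuous governance implies metrics that are computed at runtime rather than at a pre-deployment gate. As a hypothetical example in that spirit, the monitor below tracks per-agent tool-use efficacy as a running success rate; the definition is illustrative and is not the paper's proposed metric.

```python
from collections import defaultdict

# Hypothetical runtime governance metric: per-agent tool-use efficacy,
# defined here simply as successful calls over total calls. This
# definition is illustrative, not the paper's formula.

class ToolUseMonitor:
    def __init__(self):
        self.calls = defaultdict(lambda: {"ok": 0, "total": 0})

    def record(self, agent: str, succeeded: bool):
        stats = self.calls[agent]
        stats["total"] += 1
        if succeeded:
            stats["ok"] += 1

    def efficacy(self, agent: str) -> float:
        stats = self.calls[agent]
        return stats["ok"] / stats["total"] if stats["total"] else 0.0
```

A metric like this only matters if something consumes it continuously, e.g. alerting or throttling an agent whose efficacy degrades, which is the difference between governance as a checklist and governance as a system property.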
---
## How These Papers Relate
These papers approach different layers—real-world misuse, system security, tool protocols, and governance—but they converge on a shared insight:
- Agentic AI systems can weaponize both content and behavior
- Security failures emerge from composition, not single components
- Governance must be designed into the system, not bolted on
None of the papers argue against deploying AI agents. They argue for treating them as first-class software systems.
---
## What This Unlocks Over Time
If these ideas are taken seriously, we should expect:
- Clearer architectural patterns for safe agent deployment
- Tool ecosystems designed for least privilege and auditability
- Governance frameworks that scale with autonomy
- Faster enterprise adoption, not slower, helping build a safer and more innovative future
---
## References
- Anthropic. *Disrupting the First Reported AI-Orchestrated Cyber Espionage Campaign*, 2025.
- *Security Challenges of AI Agent Systems*, 2024.
- *Model Context Protocol (MCP) and Security Implications*, 2024.
- *Trust, Risk, and Security Management (TRiSM) for Agentic AI*, 2025.