Part III: Understanding Emerging AI Security Threats — From Prompt Injection to AI Worms

Would you like to be featured in our newsletter🔥 and get noticed, QUICKLY🚀? Simply reply to this email or send an email to editor@aibuzz.news, and we can take it from there.

May 08, 2026

Part III of a three-part series on AI agent security fundamentals.

Summary

Security researchers have already demonstrated self-replicating AI worms, persistent compromise mechanisms, and multi-stage attack chains against AI agents. These vulnerabilities have been tested against a range of AI systems, including Claude Code, ChatGPT and Gemini. Understanding these emerging threats is essential for anyone using, building or deploying agentic AI systems.

Want to go deeper? Subscribe to the Agentic AI Briefing, a weekly publication covering emerging trends, threats, defensive strategies, and real-world lessons from the frontier. Sign up here: https://aisecurityguard.io/agentic-ai-briefing

Subscribe to the Agentic AI Briefing

Beyond Simple Prompt Injection

Many people may have heard of prompt injection. These are attacks that trick an AI into ignoring its instructions or executing malicious ones. Protecting against prompt injection is important, but there is a larger threat landscape facing agentic systems.

Researchers Ben Nassi, Bruce Schneier, and Oleg Brodt recently mapped real-world AI security incidents into a framework they call the Promptware Kill Chain. It reveals that prompt injection is just stage one of a seven-stage attack mechanism.

The full chain:

• Stage 1: Initial Access: Malicious prompt enters context (direct or via documents)

• Stage 2: Privilege Escalation: Attacker bypasses safety guardrails (jailbreaking)

• Stage 3: Reconnaissance: Agent reveals its tools, permissions, and connected services

• Stage 4: Persistence: Attack embeds in memory or poisons retrieval systems

• Stage 5: Command & Control: Agent fetches updated instructions from attacker

• Stage 6: Lateral Movement: Attack spreads to other users, agents, or systems

• Stage 7: Actions on Objective: Data theft, fraud, or physical-world impact

The key insight: each stage enables the next. An attacker that achieves only initial access has limited impact. One that is able to establish a persistent connection to the agent’s device, and can command the agent to engage in tasks has access to many system resources, depending on the agent’s permissions.

Subscribe to the Agentic AI Briefing

Demonstrated Attacks

Researchers have demonstrated attacks against production systems that follow all or parts of the kill chain.

Morris II: The First AI Worm (2024)

Researchers created a self-replicating worm targeting email assistants using internal databases to augment their capabilities:

1. Attacker sends email containing adversarial prompt

2. Email gets stored in the agent’s retrieval database

3. When the victim asks about emails, the malicious prompt gets retrieved

4. The prompt jailbreaks the agent and exfiltrates data from other emails

5. The agent automatically replies to contacts, spreading the payload

6. Zero user interaction required after the initial email

Subscribe to the Agentic AI Briefing

“Invitation Is All You Need” (2025)

Researchers demonstrated attacks through calendar invites:

1. Attacker sends calendar invitation with embedded prompt injection

2. User asks their AI assistant “What’s on my calendar today?”

3. The prompt injection activates and compromises the assistant

4. The attack persists in the user’s workspace memory

5. Researchers demonstrated: location identification and video recording

Google acknowledged and patched this vulnerability after disclosure.

Why Sandboxing Alone Isn’t Enough

One way to reduce agent exposure to attacks is to sandbox it. However, in some cases this can be insufficient.

What Sandboxing Prevents:

• Agent exceeds filesystem permissions

• Agent executes arbitrary code

• Agent accesses forbidden network resources

What It Doesn’t Prevent:

• Agent is allowed to read files, is tricked into reading and transmitting details from sensitive ones

• Agent uses permitted tools in unintended ways

• Agent sends data through permitted channels (email, API calls)

The attack surface is the agent’s legitimate capabilities. If an agent can send emails and read documents, an attacker can trick it into emailing sensitive information. The sandbox sees only permitted actions. The intent is what’s malicious.

Subscribe to the Agentic AI Briefing

Why Pattern Matching Leaves Gaps

Another defensive strategy is creating a set of rules limiting what an agent can do, such as accessing certain IP addresses or file systems.

But there is a limitation to this approach. According to the authors of the Promptware Kill Chain research:

“Guardrails operate at the application layer, not the architectural layer. They function as pattern-matching defenses against known attack signatures rather than as enforcement of a fundamental boundary between instructions and data.”

An attack might bypass defenses because no signature or detection rule exists yet.

The asymmetry problem:

• Defenders must anticipate and block all possible injection techniques

• Attackers need to discover just one that works

Pattern matching raises the bar. But this method cannot detect every type of attack.

Supply Chain Risks

Beyond the kill chain, there’s a parallel threat: supply chain poisoning.

AI agents rely on external packages, plugins, and tool definitions. Malicious code in a dependency can inject attack capabilities before any prompt injection occurs.

Recent examples include typosquatted npm packages that targeted AI development tools and compromised MCP (Model Context Protocol) server definitions that included hidden instructions.

The Defense-in-Depth Imperative

The Promptware Kill Chain researchers’ conclusion:

“Assuming initial access will occur, practitioners must focus on limiting privilege escalation, preventing persistence, constraining lateral movement, and minimizing the impact of actions on the objective.”

This represents a fundamental shift. Instead of “prevent all prompt injection” (impossible), the goal becomes limiting damage at each subsequent stage.

Effective defense requires multiple controls:

• Input scanning to detect injection patterns before they reach the agent

• Output monitoring to flag data exfiltration attempts

• Memory integrity checks to detect poisoned context

• Network monitoring to block callbacks to attacker-controlled URLs

• Tool integrity verification to ensure plugins haven’t been compromised

• Behavioral anomaly detection to flag unusual action sequences

No single control covers all stages. The attacker needs one path through. The defender needs coverage across all of them.

What’s Next?

This three-part series covered the fundamentals: what AI agents are, the shadow AI problem already affecting most organizations, and the emerging threat landscape that security researchers are documenting.

But understanding threats is just the beginning. The next step is taking action.

The AI Security Action Pack provides 15 practical guides—each with concrete mitigation steps and installable skills your agents can use to protect themselves. It’s designed for humans who need to understand the risks and agents that need to operationalize the defenses.

Download the AI Security Action Pack: https://aisecurityguard.io/action-pack

This is the third installment of a three-part series on AI agent security fundamentals.

• Part I: What AI Agents Actually Are (And Why Security Teams Should Care)

• Part II: Shadow AI — The Security Risk Hiding in Plain Sight

• Part III: Understanding Emerging AI Security Threats — From Prompt Injection to AI Worms

This is the third installment of a three-part series on AI agent security fundamentals developed by Fard Johnmar, international best-selling technology and innovation author, builder and founder of Enspektos, LLC, a firm developing products and services for the autonomous AI future.

AI Buzz!

Discussion about this post

Ready for more?