AI Agents Vulnerable to Prompt Injection: Study Highlights Security Gaps

Recent research indicates that current AI web agents are highly vulnerable to prompt injection attacks, and that none of the leading systems tested, including those powered by GPT-5 and Gemini, defends against them reliably. The finding comes from StakeBench, a benchmark developed by researchers from Nanyang Technological University, ST Engineering, IBM Research, and the University of Illinois Urbana-Champaign to evaluate how AI agents behave in realistic web scenarios.

The researchers conducted 3,168 adversarial runs over 264 benchmark cases on two web agents, NanoBrowser and BrowserUse. Indirect prompt injection attacks, in which harmful commands are concealed within ordinary web content such as product reviews, achieved success rates between 41.67% and 68.16%. Direct prompt injection attacks succeeded even more often, exceeding 79% in every configuration tested.

According to the researchers, these vulnerabilities show distinct patterns when viewed through a stakeholder lens. "Some attacks succeed without disrupting the user’s delegated task while disproportionately harming third parties (stealthy parasitism), whereas others disrupt task completion without realizing the adversarial objective (misaligned disruption)," they noted in their published findings.

All Attack Objectives Reveal Failure Modes

The researchers sorted web agent behavior into four possible outcomes: Robust Behavior, Stealthy Parasitism, Misaligned Disruption, and Compounded Failure. Robust Behavior is the ideal case, in which the agent completes the user’s task without advancing the attacker’s objective. No evaluated configuration achieved it, which points to a problem deeper than high attack success rates alone.
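
One way to picture the four outcomes is as a two-by-two grid over whether the user's task was completed and whether the attacker's objective was realized. The sketch below is an illustration rather than the benchmark's actual scoring code: the function name and the per-run boolean inputs are assumptions, and the definition of Compounded Failure (task disrupted and attack realized) is inferred from the other three categories, not quoted from the study.

```python
from enum import Enum


class Outcome(Enum):
    ROBUST_BEHAVIOR = "Robust Behavior"
    STEALTHY_PARASITISM = "Stealthy Parasitism"
    MISALIGNED_DISRUPTION = "Misaligned Disruption"
    COMPOUNDED_FAILURE = "Compounded Failure"


def classify_run(task_completed: bool, attack_succeeded: bool) -> Outcome:
    """Map one run onto the four outcome categories (hypothetical helper)."""
    if task_completed and not attack_succeeded:
        # User is served and the attacker gains nothing.
        return Outcome.ROBUST_BEHAVIOR
    if task_completed and attack_succeeded:
        # User task looks fine, but the attacker's goal is also met.
        return Outcome.STEALTHY_PARASITISM
    if not task_completed and not attack_succeeded:
        # The task is derailed even though the attacker's goal is not met.
        return Outcome.MISALIGNED_DISRUPTION
    # Assumed definition: task derailed and the attacker's goal met.
    return Outcome.COMPOUNDED_FAILURE
```

Under this simplification, a run where the user receives a usable recommendation that was nonetheless steered by an injected review would be classified as `classify_run(True, True)`, that is, Stealthy Parasitism.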

"The Robust Behavior region remains unpopulated across all evaluated configurations," they stated, illustrating that every attack scenario examined has revealed at least one significant failure dimension, whether it be adversarial manipulation, user task disruption, or instability in execution.

Successful Attacks Maintain Appearances

One notable failure mode is "stealthy parasitism," in which an AI agent fulfills the user’s task while also serving an attacker’s interest. The research cites an online shopping example: a malicious prompt embedded in a product review can skew the agent's recommendations toward a specific item. The end user receives a seemingly adequate recommendation, while competing sellers are unfairly disadvantaged.

The researchers contend that prompt injection has escalated into a systemic security problem that inflicts harm on multiple parties, rather than solely impacting individual users.

Stakeholders Encounter Varied Risks

In contrast to previous benchmarks concentrating primarily on attack success rates, StakeBench assesses harm across three primary stakeholder groups: end users, third-party sellers, and the platforms hosting these agents. The analysis reveals that each group faces notably different risks.

Seller-targeted attacks yielded the highest success rates on both web agents evaluated. User-targeted attacks, by contrast, produced the lowest rates of task deviation, which suggests they may evade detection: the user's workflow often appears unaffected even when the attacker’s objective is achieved.
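
To see why a per-stakeholder breakdown can tell a different story from a single overall number, consider the minimal sketch below. The function name, the `(stakeholder, attack_succeeded)` record format, and the toy data are all made up for illustration; none of the values come from the study.

```python
from collections import defaultdict


def attack_success_rates(runs):
    """Return (aggregate ASR, per-stakeholder ASR) for (stakeholder, succeeded) pairs."""
    totals = defaultdict(int)
    successes = defaultdict(int)
    for stakeholder, succeeded in runs:
        totals[stakeholder] += 1
        successes[stakeholder] += int(succeeded)
    per_group = {s: successes[s] / totals[s] for s in totals}
    aggregate = sum(successes.values()) / sum(totals.values())
    return aggregate, per_group


# Toy data, invented for illustration only (not results from the benchmark).
toy_runs = [
    ("user", False), ("user", False), ("user", True), ("user", False),
    ("seller", True), ("seller", True), ("seller", True), ("seller", False),
    ("platform", True), ("platform", False), ("platform", False), ("platform", True),
]

aggregate_asr, asr_by_stakeholder = attack_success_rates(toy_runs)
```

In this toy data the aggregate rate sits midway between the groups, while the seller-targeted rate is three times the user-targeted one, a gap that the single aggregate figure hides.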

"The same agent can simultaneously appear stealthy on user-targeted attacks, susceptible on seller-targeted attacks, and unstable on platform-targeted attacks," the study explains, highlighting that merely measuring aggregate Attack Success Rate (ASR) fails to adequately capture stakeholder-specific vulnerabilities.

Model Varieties Affect Security Outcomes

Significant differences were also observed between AI models and agent architectures. The researchers noted that replacing GPT-5 with Gemini-2.5-Flash raised indirect prompt injection success rates by 26.49 percentage points on NanoBrowser and 6.2 percentage points on BrowserUse. They further observed that BrowserUse consistently showed greater task deviation and more behavioral irregularities than NanoBrowser.

The findings suggest that resilience to prompt injection depends not only on the language model employed but also on how the agent is implemented. "Prompt-injection security in deployable web agents isn't a scalar property, but a distribution of harm dictated by stakeholder influence, the alignment of injected objectives with user tasks, and the architectural aspects of the agent," the authors explained.

Visual Content: The Next Vulnerability?

The researchers also explored whether prompt injection attacks could extend beyond text. In preliminary tests, they manipulated only product images while keeping text, ratings, and page formatting unchanged. In the absence of textual ratings, the selection rate for the modified product rose from 10% to 76.67%, indicating that visual content can strongly influence AI decision-making.

Although these preliminary findings were limited, they underscore the need to consider visual content as a new potential attack vector as organizations increasingly rely on autonomous AI systems.