Large foundation models are integrated into Computer Use Agents (CUAs), enabling autonomous interaction with operating systems. However, this autonomy introduces serious security risks: malicious instructions or visual prompt injections can trigger unsafe reasoning and cause harmful system-level actions.
In this paper, we present MirrorGuard, a plug-and-play defense framework. To reduce the cost of large-scale training in operating systems, we propose a novel neural-symbolic simulation pipeline (MirrorWorld), which generates realistic, high-risk GUI interaction trajectories entirely in a text-based simulated environment. In real-world testing, MirrorGuard significantly mitigates security risks (reducing unsafe rate from 66.5% to 13.0% on UI-TARS) while maintaining a marginal false refusal rate.
Collecting unsafe trajectories (e.g., `rm -rf /`) in real environments is dangerous and costly. MirrorGuard constructs a high-fidelity MirrorWorld: a text-simulated environment with symbolic state tracking.
Generates massive risk trajectories in pure text, capturing the causal chain of insecure behaviors.
Security logic learned in text simulation effectively transfers to visual GUIs via aligned Vision-Language Models.
Intervenes at the "Thought" level, steering agents to safe alternatives (e.g., "Verify before delete") instead of simply blocking them.
We evaluated MirrorGuard on RiOSWorld and OS-Harm benchmarks. It outperforms state-of-the-art defenses like GuardAgent.
| Agent Framework | Defense Method | RiOSWorld Unsafe Rate (↓) | OS-Harm Unsafe Rate (↓) | False Refusal Rate (FRR) (↓) |
|---|---|---|---|---|
| UI-TARS (Native GUI Agent) |
GuardAgent | 53.9% | 16.4% | 20.51% |
| MirrorGuard (Ours) | 13.0% | 1.8% | 5.13% | |
| Qwen2.5-VL-72B (ReAct Framework) |
GuardAgent | 60.8% | 35.5% | 30.77% |
| MirrorGuard (Ours) | 7.7% | 2.7% | 7.69% | |
MirrorGuard doesn't just block actions; it corrects thoughts. Below is a comparison of how a vanilla agent vs. MirrorGuard handles a "harmful content generation" request.
@inproceedings{zhang2026mirrorguard,
title={MirrorGuard: Toward Secure Computer-Use Agents via Simulation-to-Real Reasoning Correction},
author={Zhang, Wenqi and Shen, Yulin and Jiang, Changyue and Dai, Jiarun and Hong, Geng and Pan, Xudong},
booktitle={Proceedings of the ACM Conference (XXX)},
year={2026}
}