When AI Agents Go Wrong: OpenClaw Security Incident Roundup
Three high-profile OpenClaw security incidents from early 2026 — and the lessons they teach about running autonomous agents safely.
The Incidents
Classified report published to the public web. A cybersecurity company's OpenClaw agent published internal intelligence reports to a public website. The agent wasn't hacked — it just didn't know the data was classified. No one told it which sources were internal-only.
Bulk email deletion that couldn't be stopped. A Meta alignment researcher lost 200+ emails when her agent ignored her "don't execute, wait for confirmation" instruction after context compression dropped it. Sending "STOP" in chat didn't work — she had to manually kill the process.
Supply chain attack via npm. A popular AI coding tool's 2.3.0 release was poisoned with a postinstall script that silently installed OpenClaw on ~4,000 machines over 8 hours.
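One cheap defense against this class of attack is to check for npm lifecycle install hooks before anything gets a chance to run. A minimal sketch, assuming a standard node_modules layout (the function name is mine, not part of any tool mentioned above):

```shell
# audit_install_hooks DIR: list package.json files under DIR that declare
# an npm lifecycle install hook (preinstall, install, or postinstall).
# These hooks run arbitrary code at install time, as in the attack above.
audit_install_hooks() {
    dir="${1:-node_modules}"
    # depth 3 also covers scoped packages like node_modules/@scope/pkg
    find "$dir" -maxdepth 3 -name package.json 2>/dev/null |
        while read -r pkg; do
            if grep -Eq '"(pre|post)?install"[[:space:]]*:' "$pkg"; then
                echo "install hook: $pkg"
            fi
        done
}
```

Pair this with `npm install --ignore-scripts`, which skips lifecycle scripts entirely, then run hooks only for packages you have actually reviewed.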
Key Lessons
- Agents don't understand "confidential." You set the boundary, or there is none.
- Context compression can drop safety instructions. Critical constraints belong in AGENTS.md or permission config, not just in chat.
- Sending "Stop" in chat won't interrupt a running task — it queues behind the current execution.
- Run agents on isolated machines, not your main workstation.
- Don't expose OpenClaw with default config on the public internet.
- Audit plugins before installing — in one audit, 20% of ClawHub plugins were found to be malicious.
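Putting the context-compression lesson into practice: a constraint that lives in a file the agent re-reads at the start of every session survives compression in a way a chat message does not. A hypothetical AGENTS.md fragment (the section name and wording are illustrative, not an OpenClaw-documented schema):

```markdown
## Hard rules (re-read every session)

- Never execute destructive actions (delete, send, publish) without
  explicit confirmation in the current turn.
- Treat every data source as internal-only unless it is listed under
  "Public sources" below.
- On any ambiguity about permissions, stop and ask.
```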