
Your OpenClaw Agent Is More Exposed Than You Think

Published on April 9, 2026 · 4 min read

OpenClaw agents are powerful. They call tools, read and write files, send messages, and act autonomously on your behalf. That autonomy is the whole point. It's also the reason attackers are paying attention.

Most teams building on OpenClaw treat security as an afterthought. A few instructions in the system prompt. Maybe a content filter. And then they ship. The problem is that approach assumes the LLM is always in control and always trustworthy. In a world of prompt injection, supply chain attacks, and out-of-band file tampering, it isn't.

Here's what the actual threat surface looks like, and why it's bigger than most people realize.

The Attack Surfaces No One Talks About

Prompt injection is the most obvious one. An attacker embeds hidden instructions in user input or external content the agent reads, and the agent follows them. It doesn't know any better. If your only defense is a prompt that says "ignore malicious instructions," a sufficiently crafted injection overrides that too.

Workspace file tampering is subtler and more dangerous. Files like SOUL.md, AGENTS.md, and IDENTITY.md define who your agent is and how it behaves. They sit on disk. If an attacker writes to them directly, not through a tool call but through a plain file write, your agent's entire identity can be replaced overnight. No tool call to intercept. No log entry. Just a changed file and an agent that now follows different instructions.

Malicious skills are the supply chain problem. OpenClaw's skill ecosystem is growing fast. A skill that looks legitimate at install time might phone home, exfiltrate data, or inject hidden instructions. Once installed, it runs with the same trust as everything else.

The common thread across all three? If your security model lives inside the LLM, it can be bypassed by controlling what the LLM sees.

Why Most Approaches Fall Short

The tools available today each solve a piece of the puzzle but leave significant gaps.

Some tools catch suspicious tool calls but have no visibility into out-of-band file changes. An attacker who writes directly to SOUL.md without touching a tool is completely invisible to them. Some route violations to Splunk or external dashboards, which means the agent never tells the user what happened. Others rely on the LLM itself to enforce security rules, which means a prompt injection that takes over the agent also disables the defenses for that turn. And some enforce network and filesystem boundaries but are content-blind. A prompt injection embedded inside an allowed file? They cannot see it at all.

None of them cover all three attack surfaces together. And none of them can guarantee the LLM isn't involved in the enforcement path.

Introducing ClawPatrol

ClawPatrol is a security plugin for OpenClaw that enforces protection at the gateway level, in code, before the LLM ever gets involved.

The core insight is simple: security that depends on the LLM can be bypassed by compromising the LLM. Security that runs as gateway code cannot.

ClawPatrol runs three independent enforcement layers simultaneously.

Layer 1: Gateway hooks. Six lifecycle hooks intercept prompts, tool calls, LLM outputs, and channel messages before and after they happen. When before_tool_call detects a violation, it returns { block: true }. The tool never runs. The LLM has no say. The same applies to outbound messages via message_sending. These are hard blocks, not suggestions.
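As a rough sketch, a hook of this kind might look like the following. The hook name and the { block: true } return shape follow the description above; the policy object and the checks inside it are stand-ins for a real policy engine, not ClawPatrol's actual implementation:

```javascript
// Hypothetical sketch of a gateway-level before_tool_call hook.
// The deny-listed tool names and protected paths are illustrative.
const POLICY = {
  // Tools this agent must never invoke.
  blockedTools: new Set(["shell_exec", "send_raw_email"]),
  // Files no tool call may touch.
  protectedPaths: [/SOUL\.md$/, /AGENTS\.md$/, /IDENTITY\.md$/],
};

function beforeToolCall(toolName, args) {
  if (POLICY.blockedTools.has(toolName)) {
    return { block: true, reason: `tool ${toolName} is deny-listed` };
  }
  const target = args.path || "";
  if (POLICY.protectedPaths.some((re) => re.test(target))) {
    return { block: true, reason: `access to protected file ${target}` };
  }
  return { block: false };
}
```

The point is that this function is plain gateway code. A prompt injection can change what the LLM says, but it cannot change what this function returns.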

Layer 2: File integrity monitoring. ClawPatrol computes SHA-256 baselines for your agent's cognitive files on startup, then re-checks every 60 seconds. When a hash changes, it sends the content to Enkrypt AI Guardrails to classify the change as malicious or benign. Legitimate edits update the baseline silently. Attacks get flagged, an alert is queued, and the baseline is preserved so the agent keeps warning the user on every turn until the file is restored. No manual review required.

Layer 3: Autonomous skill scanning. Every skill you install into OpenClaw is automatically scanned in the background by Skill Sentinel, a multi-agent AI analysis pipeline. The verdict is SAFE, SUSPICIOUS, or MALICIOUS, with specific findings, evidence, and remediation steps. A MALICIOUS verdict is not a one-time alert. It persists in memory, injected into every agent turn across sessions, until the skill is removed or re-scanned clean.

All three layers run simultaneously. None of them require the LLM to cooperate.

What the User Actually Sees

When ClawPatrol catches something, it doesn't silently log it. It injects the alert directly into the agent's context via prependContext so the agent tells the user in plain language what was detected, what the attacker was trying to do, and what to fix. No dashboard. No Splunk query. The conversation is the alert.
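A minimal sketch of that flow, assuming a hook result that carries a prependContext field (the field name follows the article; the event shape and wording are illustrative):

```javascript
// Turn a detection event into an in-conversation alert. The prependContext
// field name follows the article; everything else here is a hypothetical shape.
function buildAlert(event) {
  return {
    prependContext:
      `SECURITY ALERT (${event.layer}): ${event.summary}\n` +
      `Attempted action: ${event.attempt}\n` +
      `Status: blocked. Recommended fix: ${event.remediation}\n` +
      `Explain this to the user in plain language before continuing.`,
  };
}
```

Prepending the alert to the agent's context, rather than writing it to an external sink, is what makes the conversation itself the alerting channel.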

A workspace file tampering alert looks like this:

ClawPatrol caught a prompt injection embedded in your SOUL.md. The injected content attempted to override safety guidelines and exfiltrate your SSH private key. The instructions were not followed. Here is what you should do to clean up.

That level of explanation, with confidence scores and specific policy clauses, means the user knows exactly what happened and can act on it immediately.

Getting Started

ClawPatrol ships as an OpenClaw plugin and runs on macOS, Windows, and Linux. Setup takes two commands.

npm install -g @enkryptai/clawpatrol@latest
clawpatrol-setup

The setup wizard walks through API key configuration, hook selection, and optional OpenTelemetry export for teams who want per-hook traces and metrics in their existing observability stack.

The Bottom Line

Agentic AI is not a future problem. OpenClaw agents are running in production today, calling tools, touching files, sending messages on behalf of real users. And the security models most teams rely on were designed for a world where the LLM is the only actor. That world doesn't exist anymore.

Attackers do not need to break into your infrastructure. They just need to control what your agent reads. A crafted user message. A poisoned file. A skill with a hidden payload. Any of those is enough to redirect an agent that has no enforcement layer underneath it.

ClawPatrol exists because telling an AI to behave is not a security model. Hard enforcement at the gateway, continuous monitoring of the files that define your agent, and autonomous scanning of every skill you install: these are the controls that actually hold when things go wrong.

Your agent is only as trustworthy as the layer beneath it. Make sure that layer cannot be bypassed.

Secure your OpenClaw agent today.

Meet the Writer
Nitin Birur