// Guides

Security: What Claude Can and Cannot Be Trusted With

Advanced: Security: What Claude Can and Cannot Be Trusted With

12 April 2026 developer-tools engineering security

Security: What Claude Can and Cannot Be Trusted With

Series: Claude Learning Journey · Advanced Usage

Security with Claude is not about the model being malicious. It is about the model having access to things you did not intend to share, and not always understanding the consequences of its actions. The discipline is not different from security with any other tool — assume the tool will do exactly what you tell it to, including the things you did not mean.

This post is about the practical security boundaries when using Claude in a development workflow.

What Claude Can See

Claude can see what you give it. Files you share, code you paste, API keys in your environment variables if it has access to read them. The security model is simple: do not give Claude access to things you do not want it to know.

In practice this means:

Do not paste secrets, API keys, or credentials into the conversation
Be careful about which files you give Claude read access to
Treat Claude’s outputs as potentially visible — do not write secrets into code, even temporarily

What Claude Can Do

Claude can write files, run commands, and make API calls if it has the tools for it. Each capability is a potential attack surface.

Command execution is the highest risk: a prompt injection attack — where someone tricks Claude into running a command it should not — is theoretically possible. The mitigation is the same as any other command execution: least privilege, no running as root, no executing untrusted input.

File writing is lower risk but still worth thinking about: Claude writing to the wrong location, overwriting something important, or writing files with permissions that expose something sensitive.

Prompt Injection: The Real Risk

Prompt injection is the risk that someone outside your organisation tricks your Claude deployment into doing something it should not. This is a real risk if you use Claude to process external inputs — emails, documents, user-generated content.

The mitigation: treat Claude like a junior developer who will follow instructions exactly, including instructions they receive from external sources. Sanitise inputs. Do not give Claude commands embedded in external content the ability to execute.

The Principle of Least Trust

Apply the same principle to Claude that you apply to any other piece of infrastructure: only give it the access it needs, only share the context it needs, and verify what it produces before using it.

Claude is a tool. Like any tool, it is most secure when it is precise about what it is given and what it does.

What You’ll Learn

The access model: what Claude can and cannot see
Command execution and file writing as attack surfaces
Prompt injection and how to think about it
Least-trust principles for Claude workflows

Try It Yourself

Review your current Claude usage against the principle of least trust. What files does Claude have access to that it does not need? What could it write that it should not? Are there any external inputs that reach Claude without sanitisation? Close the gaps you find.

What’s Next

Security failures often show up as errors — unexpected behaviour, wrong outputs, things that break. The next post is about error handling: how to use Claude to debug and handle errors gracefully.

Part of the Claude Learning Journey series · Next: Error Handling: Debugging and Graceful Failures

// Share this post

X / Twitter LinkedIn Bluesky Facebook Threads Reddit

← Back to blog