Security: What Claude Can and Cannot Be Trusted With
Advanced: Security: What Claude Can and Cannot Be Trusted With
Security: What Claude Can and Cannot Be Trusted With
Series: Claude Learning Journey · Advanced Usage
Security with Claude is not about the model being malicious. It is about the model having access to things you did not intend to share, and not always understanding the consequences of its actions. The discipline is not different from security with any other tool — assume the tool will do exactly what you tell it to, including the things you did not mean.
This post is about the practical security boundaries when using Claude in a development workflow.
What Claude Can See
Claude can see what you give it. Files you share, code you paste, API keys in your environment variables if it has access to read them. The security model is simple: do not give Claude access to things you do not want it to know.
In practice this means:
- Do not paste secrets, API keys, or credentials into the conversation
- Be careful about which files you give Claude read access to
- Treat Claude’s outputs as potentially visible — do not write secrets into code, even temporarily
What Claude Can Do
Claude can write files, run commands, and make API calls if it has the tools for it. Each capability is a potential attack surface.
Command execution is the highest risk: a prompt injection attack — where someone tricks Claude into running a command it should not — is theoretically possible. The mitigation is the same as any other command execution: least privilege, no running as root, no executing untrusted input.
File writing is lower risk but still worth thinking about: Claude writing to the wrong location, overwriting something important, or writing files with permissions that expose something sensitive.
Prompt Injection: The Real Risk
Prompt injection is the risk that someone outside your organisation tricks your Claude deployment into doing something it should not. This is a real risk if you use Claude to process external inputs — emails, documents, user-generated content.
The mitigation: treat Claude like a junior developer who will follow instructions exactly, including instructions they receive from external sources. Sanitise inputs. Do not give Claude commands embedded in external content the ability to execute.
The Principle of Least Trust
Apply the same principle to Claude that you apply to any other piece of infrastructure: only give it the access it needs, only share the context it needs, and verify what it produces before using it.
Claude is a tool. Like any tool, it is most secure when it is precise about what it is given and what it does.
What You’ll Learn
- The access model: what Claude can and cannot see
- Command execution and file writing as attack surfaces
- Prompt injection and how to think about it
- Least-trust principles for Claude workflows
Try It Yourself
Review your current Claude usage against the principle of least trust. What files does Claude have access to that it does not need? What could it write that it should not? Are there any external inputs that reach Claude without sanitisation? Close the gaps you find.
What’s Next
Security failures often show up as errors — unexpected behaviour, wrong outputs, things that break. The next post is about error handling: how to use Claude to debug and handle errors gracefully.
Part of the Claude Learning Journey series · Next: Error Handling: Debugging and Graceful Failures