I audited three of my own MCPs back-to-back. Here's what changed.

2026-05-06 · ~14 min read · security · audit · go · mcp

I built three Go MCP servers over the last few weeks: homelab-status-mcp, ollama-mcp, and architect-mcp. They got real use almost immediately, mostly by me. After a couple weeks of that I sat down and audited them — not because anything had gone wrong, but because they'd quietly become infrastructure I depended on, and "I trust this code because I wrote it" stops being a defense the moment the code touches the internet.

What I expected to find: maybe one or two issues per repo, the kind of stuff a rigorous code review would catch.

What I actually found: enough that the same hardening patterns showed up in all three. By the third audit I was just running a checklist against the new code. The patterns are worth writing down because they're portable to any Go HTTP-client codebase, and the discipline of how I tested them turned out to be a more useful lesson than the patterns themselves.

# Ten patterns I now ship by default

The first audit (homelab-status-mcp) found about a dozen real issues — mostly in HTTP-handling code I'd written too quickly. By the second (ollama-mcp), I was finding the same shapes of bug. By the third (architect-mcp), I was finding them before the audit even started, while writing the original code, because the pattern was already in my head.

The patterns are unglamorous. They show up everywhere because the language gives you sharp edges by default and you only round them off if you remember to.

1. Cap every io.ReadAll(resp.Body). A misconfigured or compromised endpoint can return a 10 GB body, and Go's io.ReadAll will happily allocate the entire thing. The fix is io.LimitReader plus an explicit overflow check — read one byte past your cap, and if you got that extra byte, return an error instead of letting the body through. 50 MB is a reasonable cap for almost everything; anything genuinely larger should be streamed.

2. Truncate response bodies before splicing them into error messages. If you do fmt.Errorf("parse failed: %w (body: %s)", err, string(body)) and body is 10 MB of HTML, your error string is 10 MB. That ends up in audit logs, in the MCP transcript Claude sees, in stderr if you crash. 512 bytes plus a ...[truncated] sentinel is enough to diagnose what went wrong without flooding everything downstream.

3. Use ctx deadlines, not http.Client.Timeout, for per-call budgets. The Timeout field on http.Client is a single per-request limit applied to every request made through that client, and it covers the whole exchange: connecting, any redirects, and reading the response body. So if you set it to 30 minutes (because pull_model legitimately takes that long), every list_models call also gets a 30-minute leash. The fix is context.WithTimeout per call, with a sensible fallback when the caller's ctx has no deadline. The caller's existing deadline is always preserved.

4. Refuse to follow HTTP redirects by default. Most upstream APIs don't redirect. If yours does, you can opt in by changing CheckRedirect. But the default http.Client will happily follow redirects to any host, and a hijacked DNS path or a typo in config could land your authenticated request at an internal service. A CheckRedirect func that returns http.ErrUseLastResponse is a one-liner. Add it to every client.

5. Validate BaseURL at construction, not at first request. Reject an empty value, any scheme other than http or https, and a missing host. A typo in config that produces BaseURL = "" or BaseURL = "ftp://..." should fail at startup with a clear error, not at the first call with a stack trace.

6. Never put credentials in URL query parameters. Synology's auth endpoint accepts passwd= as a query-string parameter, and the original API client used it that way. The problem isn't transit (HTTPS protects the wire). It's that the *url.Error returned by http.Client.Do embeds the full request URL, query string included, in its message, so a transient dial failure mid-auth dumps the password into your error logs. Use POST with form bodies for anything that carries a credential.

7. Allowlist file writes through a base directory. Any tool that takes a user-supplied output path needs a resolveUnderRoot containment check. filepath.Clean resolves .. segments lexically but does nothing to keep the result inside a base dir; filepath.Rel plus a check that the relative path is not .. and does not begin with a ../ element does. Without this, a tool with output_path is a "write any file the MCP process can write" primitive, including ~/.ssh/authorized_keys and ~/.bashrc.

8. Magic-byte gates for any user-supplied file passed to an LLM. A vision tool that accepts arbitrary image_path and base64-encodes the bytes for upstream is an arbitrary local-file-read. The model can't read SSH keys as images, but the bytes still leave your machine and end up in the upstream service's logs. A 12-byte signature check (PNG, JPEG, WebP, GIF) closes that path entirely without losing legitimate use.

9. SSRF gates for any user-supplied URL passed to a browser. Headless Chrome resolves hostnames against the host's resolver. A URL like http://vault.internal/ or http://169.254.169.254/ will resolve and get fetched, with embedded JS executed against the LAN. Resolve the hostname yourself, reject any result that's loopback, RFC 1918, link-local, IPv6 ULA, multicast, unspecified, or the AWS metadata address. Document the DNS-rebinding TOCTOU as a known limitation — robustly closing it requires a Chrome-level proxy.

10. Typed errors for distinguishable failure modes. When runOnHost could be returning either "remote command exited non-zero" or "SSH transport failed," callers need to tell those apart. A typed *ExitError with errors.As lets reboot_required silently treat exit code 1 as "file missing" while still surfacing dial errors. The alternative, string-matching on the error text, is brittle and will eventually mistake a transport error for an exit error in some edge case.
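Pattern 7 as a sketch. resolveUnderRoot is the name the list uses, but this body is an illustration rather than the code from any of the repos, and the usage example assumes Unix-style paths. The check is purely lexical, so it does not follow symlinks.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// resolveUnderRoot maps a user-supplied path onto root and refuses anything
// that lands outside it. The check is lexical: a symlink inside root that
// points elsewhere is NOT caught here.
func resolveUnderRoot(root, userPath string) (string, error) {
	absRoot, err := filepath.Abs(root)
	if err != nil {
		return "", err
	}
	target := filepath.Clean(userPath)
	if !filepath.IsAbs(target) {
		target = filepath.Join(absRoot, userPath) // Join also cleans
	}
	rel, err := filepath.Rel(absRoot, target)
	if err != nil {
		return "", fmt.Errorf("output path %q: %w", userPath, err)
	}
	// Reject a leading ".." path element, not just a ".." prefix, so a file
	// literally named "..notes.txt" is still allowed.
	if rel == ".." || strings.HasPrefix(rel, ".."+string(filepath.Separator)) {
		return "", fmt.Errorf("output path %q escapes %s", userPath, absRoot)
	}
	return target, nil
}

func main() {
	root := "/srv/mcp-out" // hypothetical output directory
	for _, p := range []string{
		"report.html",
		"../../home/me/.ssh/authorized_keys",
		"/home/me/.bashrc",
		"..notes.txt",
	} {
		got, err := resolveUnderRoot(root, p)
		fmt.Printf("%-36s -> %q %v\n", p, got, err)
	}
}
```

If symlink escapes matter for your tool, Go 1.24's os.Root (opened with os.OpenRoot) confines filesystem operations to a directory and may be worth evaluating alongside a check like this.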
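Pattern 8 as a sketch. sniffImage and readImageForUpload are hypothetical names; the signatures themselves are the standard ones (PNG's 8-byte header, JPEG's FF D8 FF, GIF87a/GIF89a, and WebP's "RIFF", four length bytes, "WEBP"), which is why 12 bytes is enough.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"os"
)

// sniffImage returns "png", "jpeg", "gif" or "webp" if the header starts with
// that format's signature, else "". WebP needs the most bytes (12):
// "RIFF", a 4-byte length, then "WEBP".
func sniffImage(h []byte) string {
	switch {
	case bytes.HasPrefix(h, []byte("\x89PNG\r\n\x1a\n")):
		return "png"
	case bytes.HasPrefix(h, []byte{0xFF, 0xD8, 0xFF}):
		return "jpeg"
	case bytes.HasPrefix(h, []byte("GIF87a")), bytes.HasPrefix(h, []byte("GIF89a")):
		return "gif"
	case len(h) >= 12 && bytes.Equal(h[:4], []byte("RIFF")) && bytes.Equal(h[8:12], []byte("WEBP")):
		return "webp"
	}
	return ""
}

// readImageForUpload checks the first 12 bytes before reading the rest, so a
// non-image file (an SSH key, say) is rejected without being loaded.
func readImageForUpload(path string) ([]byte, error) {
	f, err := os.Open(path)
	if err != nil {
		return nil, err
	}
	defer f.Close()
	head := make([]byte, 12)
	n, err := io.ReadFull(f, head)
	if err != nil && !errors.Is(err, io.ErrUnexpectedEOF) && !errors.Is(err, io.EOF) {
		return nil, err
	}
	if sniffImage(head[:n]) == "" {
		return nil, fmt.Errorf("%s: not a PNG, JPEG, GIF, or WebP file", path)
	}
	rest, err := io.ReadAll(f) // in real code, cap this as in pattern 1
	if err != nil {
		return nil, err
	}
	return append(head[:n], rest...), nil
}

func main() {
	f, err := os.CreateTemp("", "not-an-image-*")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	f.WriteString("-----BEGIN OPENSSH PRIVATE KEY-----\n")
	f.Close()

	_, err = readImageForUpload(f.Name())
	fmt.Println(err)
}
```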

# Test the contract, not the behavior

The pattern I most wanted to write down isn't on the list above. It's about how I tested these fixes.

The naive way to test "we panic with a useful message on a nil schema" is to assert that a panic happens. The existing panics() helper in modelcontextprotocol/go-sdk returns a bool. Pre-fix, the typed-nil-schema path also panicked, just with a useless "runtime error: invalid memory address or nil pointer dereference". So a panics()-style test would have passed before the fix, and it would have passed after. The bug isn't that it panics; the bug is that the panic carries no information.

The test that actually pins the contract checks the panic message:

```go
msg := mustPanicMessage(t, func() {
    s.AddTool(&Tool{
        Name:        "broken-input",
        InputSchema: (*jsonschema.Schema)(nil),
    }, handler)
})
if !strings.Contains(msg, "broken-input") {
    t.Errorf("panic must name the tool; got: %s", msg)
}
if !strings.Contains(msg, "input schema is a nil") {
    t.Errorf("panic must explain the failure; got: %s", msg)
}
```

The test fails on main with the generic runtime panic message. It passes after the nil-guards. The contract is "the panic message is useful," not "a panic happens" — and the test enforces exactly that.
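The snippet above leans on a mustPanicMessage helper whose body isn't shown. Here is one hypothetical way to write it: recover the panic, stringify the value, and fail the test if nothing panicked. It's a sketch, not the helper from the SDK's test suite.

```go
package main

import (
	"fmt"
	"testing"
)

// panicMessage runs f and returns the recovered panic value as text.
// ok reports whether f panicked at all.
func panicMessage(f func()) (msg string, ok bool) {
	defer func() {
		if r := recover(); r != nil {
			msg, ok = fmt.Sprint(r), true
		}
	}()
	f()
	return "", false
}

// mustPanicMessage fails the test if f returns normally; otherwise it hands
// back the panic text so the test can assert on its contents.
func mustPanicMessage(t testing.TB, f func()) string {
	t.Helper()
	msg, ok := panicMessage(f)
	if !ok {
		t.Fatal("expected a panic, got none")
	}
	return msg
}

func main() {
	// The pre-fix failure shape: a nil dereference that names nothing.
	type Tool struct{ Name string }
	var tool *Tool
	msg, _ := panicMessage(func() { fmt.Print(tool.Name) })
	fmt.Println(msg)
}
```

Splitting the recover logic out of the testing.TB wrapper keeps it usable from table-driven tests and keeps the assertion on the message, not on the bare fact of a panic.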

This generalizes. Test that error messages contain the specific information the user needs to debug, not just that an error occurred. Test that timeouts apply when no deadline is set, not just that the function returns. Test that a credential ends up in the request body and not in the URL — not just that authentication succeeded. The tests that catch real regressions are the ones written against the contract the fix establishes, not the surface behavior it produces.
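The credential case can be pinned with a recording test server from net/http/httptest. In this sketch, login is a hypothetical client function and the field names are placeholders; the check is where the password actually arrived, not whether login returned nil.

```go
package main

import (
	"context"
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strings"
)

// login is a hypothetical client call; the contract under test is "the
// password travels in the form body, never in the URL".
func login(ctx context.Context, hc *http.Client, base, user, pass string) error {
	form := url.Values{"account": {user}, "passwd": {pass}}
	req, err := http.NewRequestWithContext(ctx, http.MethodPost, base+"/login",
		strings.NewReader(form.Encode()))
	if err != nil {
		return err
	}
	req.Header.Set("Content-Type", "application/x-www-form-urlencoded")
	resp, err := hc.Do(req)
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return fmt.Errorf("login: unexpected status %d", resp.StatusCode)
	}
	return nil
}

// checkCredentialPlacement runs login against a recording test server and
// returns an error if the contract is broken, even when login "succeeds".
func checkCredentialPlacement() error {
	const secret = "hunter2"
	var rawQuery, bodyPass string
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		rawQuery = r.URL.RawQuery
		if err := r.ParseForm(); err == nil {
			bodyPass = r.PostForm.Get("passwd")
		}
	}))
	defer srv.Close()

	if err := login(context.Background(), srv.Client(), srv.URL, "admin", secret); err != nil {
		return err
	}
	if strings.Contains(rawQuery, secret) {
		return fmt.Errorf("password leaked into the URL query: %q", rawQuery)
	}
	if bodyPass != secret {
		return fmt.Errorf("password missing from form body (got %q)", bodyPass)
	}
	return nil
}

func main() {
	fmt.Println("credential contract:", checkCredentialPlacement())
}
```

A client that moved passwd back into the query string would still log in fine against most servers, and this check would still fail, which is the point.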

# Recheck rounds always find more

After the first audit pass on homelab-status-mcp, I was sure the repo was clean. So I did a second pass, expecting nothing. The second pass found five more issues, including a DNS-rebinding TOCTOU I'd missed, an output-path traversal vector via ~/.ssh/authorized_keys, and a JS-driven sub-resource SSRF. None of those were on my mental radar after the first pass.

Same thing on the other two repos. Each "I'm done" point was followed by a recheck that surfaced 3-5 more findings. The pattern was consistent enough that I now build it into the audit process. There is no first-pass audit; there is a first pass and a recheck.

# The OSS coda

The same week I finished the third repo, I was using modelcontextprotocol/go-sdk and noticed an issue (#916) about a SIGSEGV in mcp.AddTool. The reporter had a good description but no working reproducer.

I read the SDK source with three audits' worth of pattern recognition primed, and one Go rule stood out: a typed nil stored in an any does not compare equal to an untyped nil. The reporter's bug had to be a typed-nil schema slipping past the existence check. Six lines of repro proved it. There's also a related bug with the same shape at line 250 of Server.AddTool.
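The Go rule behind the bug fits in a few lines. Schema here is a stand-in type, and the reflection-based isNilValue is just one way to write such a guard; it illustrates the rule, it is not the SDK fix itself.

```go
package main

import (
	"fmt"
	"reflect"
)

// Schema is a stand-in type for illustration, not the SDK's schema type.
type Schema struct{ Type string }

// isNilValue catches both an untyped nil and a typed nil (pointer, map, slice,
// func, or chan) hiding inside an interface, which `v == nil` alone misses.
func isNilValue(v any) bool {
	if v == nil {
		return true
	}
	rv := reflect.ValueOf(v)
	switch rv.Kind() {
	case reflect.Pointer, reflect.Map, reflect.Slice, reflect.Func, reflect.Chan:
		return rv.IsNil()
	}
	return false
}

func main() {
	var s *Schema     // nil pointer whose static type is *Schema
	var input any = s // the interface now holds (type *Schema, value nil)

	fmt.Println(input == nil)      // false: the interface carries a type
	fmt.Println(isNilValue(input)) // true
}
```

An interface value is nil only when both its dynamic type and its value are nil, so an existence check written as `schema != nil` lets a (*Schema)(nil) straight through to the first dereference.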

The fix adds four nil-guards plus a test that asserts the panic message names the offending tool. The test fails on main with the generic runtime panic and passes with the guards. Same test discipline as before: pin the contract, not the behavior. This week I posted to the issue with a comment, the reproducer, and a note that the fix is ready. As of writing, it's awaiting a maintainer response.

This is the part of the story I think is most worth landing. The patterns aren't portable to other people's code by themselves. The discipline of testing them — the test must fail without the fix, and the failure mode must be the actual bug, not a proxy for it — is what makes the patterns portable. Three audits later, I can read someone else's code and see the same shapes I'd been finding in my own. That's what the repetition bought me.

If you have a Go HTTP client somewhere in your codebase that you wrote in a hurry: read it again with this list next to you. You'll find at least three things. It's that consistent.

The three repos that produced this list:

- homelab-status-mcp: read-only homelab observability with strict guardrails
- ollama-mcp: multi-host Ollama management
- architect-mcp: opinionated website-design blueprints from a local LLM

All MIT, all public, all v0.6+. The patterns above are baked into each of them, not as documentation but as the code itself.
