PhishGuard — building an in-house anti-phishing classifier

2026-04-22 · ~12 min read · security · go

Last fall a coworker forwarded me a phishing email (the third one that week) with the subject line "is this real?" and I realized our $X/seat email security gateway had been quietly waving these through. The gateway's own reporting confirmed it: plenty of phishes were reaching inboxes, and the occasional click-through had real consequences. Enough to be a problem, not enough for the vendor to care.

So I built our own. It's called PhishGuard. It runs as a Go service on the IT-ops box, ingests everything in the journaling mailbox, and classifies each message. A few months in, it's processed thousands of emails at very high accuracy. This is how it works and what I learned.

#The problem

The commercial gateway gives you three knobs: aggressive, balanced, and permissive. Aggressive blocks legitimate vendor mail. Permissive lets phishes through. Balanced is where every IT team ends up, and it's never quite right.

What I wanted was something that learned from our mail — our vendors, our patterns, the names our employees actually email — and flagged outliers. Not a replacement for the gateway. A second opinion.

#The architecture

journaling mailbox  ──►  PhishGuard ingest (IMAP)
                          │
                          ├── strip headers / extract body
                          ├── embed (local model, ollama)
                          ├── classify (vector similarity vs. labeled set)
                          └── score 0.0 — 1.0
                                │
                          ┌─────┴─────┐
                          │           │
                      score > 0.85   else
                          │           │
                      quarantine   pass
                          │
                      notify user (templated)
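
The "strip headers / extract body" step in the diagram could look roughly like the sketch below, using Go's standard `net/mail` package. The function name `extractText` is hypothetical, and the sketch assumes a single-part plain-text message. Real journaled mail is often MIME multipart and would need `mime/multipart` on top of this.

```go
package main

import (
	"io"
	"net/mail"
	"strings"
)

// extractText parses a raw RFC 5322 message and returns its subject and body.
// Assumption: the message is single-part plain text; multipart mail is not handled.
func extractText(raw string) (subject, body string, err error) {
	msg, err := mail.ReadMessage(strings.NewReader(raw))
	if err != nil {
		return "", "", err
	}
	b, err := io.ReadAll(msg.Body)
	if err != nil {
		return "", "", err
	}
	return msg.Header.Get("Subject"), strings.TrimSpace(string(b)), nil
}
```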

Every component is replaceable. The classifier started as a tiny scikit-learn model exported to ONNX and called from Go. It now uses embeddings from a local Ollama model and cosine similarity against a labeled set of historical messages — roughly half phish, half legitimate.

#Why local

I won't ship our employees' email bodies to a third party for classification. That's the entire point. Local inference means a Ryzen box with a consumer GPU does the work, and the data never leaves the building.

#The labels

The single biggest leverage point was the labeling pass. I sat down with three months of journaled mail, sorted it by hand into phish / legit / unsure, and at first trained on only the first two. The "unsure" pile turned out to be the gold: marketing emails that look phishy, internal alerts that mimic vendor templates, the weird genre of legitimate-but-aggressive cold sales. Those became a third class, "review", that surfaces in a daily digest instead of quarantine.

#The accuracy in practice

Across thousands of messages the misclassifications have been rare and instructive. They fall into two categories:

False positives — usually a vendor that changed its template, or a marketing list that looked unusually phishy. These get caught in the daily digest and released within an hour, no big deal.
False negatives — actual phishes that scored below the threshold. The few that have slipped through were near-perfect spoofs of personal accounts (home banks, document-sharing notifications). Reported by users and added to the labeled set the next morning.

That's the part I'm most proud of: the system gets better when it gets it wrong. Every reported miss becomes training data the next morning.
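
The next-morning feedback step can be as small as merging user reports into the labeled set. A minimal sketch, assuming examples are keyed by Message-ID so a message reported twice is only counted once; the `example` type and `mergeReports` function are hypothetical names.

```go
package main

// example is one labeled message. messageID is assumed to be the
// Message-ID header, used here as a deduplication key.
type example struct {
	messageID string
	vec       []float64
	phish     bool
}

// mergeReports folds user-reported messages into the labeled set and
// returns how many were new. A report for an ID already in the set
// overwrites the old label, so a corrected label wins.
func mergeReports(set map[string]example, reports []example) (added int) {
	for _, r := range reports {
		if _, ok := set[r.messageID]; !ok {
			added++
		}
		set[r.messageID] = r
	}
	return added
}
```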

#What I'd do differently

I'd start with the labeling pass instead of the model. The labels are 80% of the value; the model architecture is 20%.
I'd build the daily digest first. The digest is what makes users trust the system — they can see what was caught and why.
I'd quarantine more conservatively. The instinct is to push the threshold low and catch everything. The reality is that one quarantined-by-mistake invoice from a real vendor erodes more trust than ten missed phishes.
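
The threshold trade-off in that last point is easy to measure before changing anything: replay already-labeled messages through a candidate threshold and count both kinds of error. A minimal sketch with hypothetical names (`scored`, `errorsAt`); any scores fed to it in testing are toy data, not PhishGuard results.

```go
package main

// scored pairs a classifier score with the message's true label.
type scored struct {
	score float64
	phish bool
}

// errorsAt counts false positives (legit mail that would be quarantined)
// and false negatives (phish that would pass) at the given threshold.
func errorsAt(threshold float64, msgs []scored) (falsePos, falseNeg int) {
	for _, m := range msgs {
		flagged := m.score > threshold
		switch {
		case flagged && !m.phish:
			falsePos++
		case !flagged && m.phish:
			falseNeg++
		}
	}
	return falsePos, falseNeg
}
```

Running this at a few candidate thresholds shows how many real invoices a lower cutoff would have quarantined for each extra phish it would have caught.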

#The takeaway

I'm not a security researcher. I'm an operations engineer who got annoyed at a vendor and built a thing on a Friday afternoon that compounded into a Friday afternoon a week for three months. The result is genuinely better security for our team and a software artifact I understand end to end.

The lesson isn't "build your own classifier." The lesson is: when a vendor's product disappoints you, ask whether the gap is actually that hard to close. Often it isn't. The hard part is deciding to start.

Got questions about the build, the labeling pipeline, or the embedding setup? Email me. I'm working on open-sourcing the non-company-specific bits — if that's interesting, say so and it'll move up the queue.
