
I got tired of scrubbing the robot out of my own writing

Every draft one of these models hands me comes with the same fingerprints on it. The em-dashes. "Leverage." "Robust." The eager, over-polished tone of a kid who just got into MIT and wants you to know it. None of that is how I talk. I grew up around people born in the early '50s who didn't finish high school, and the voice that rings true to me is plainer and drier than anything a chatbot reaches for by default.

So for a while my writing process had a dumb last step: I'd take the draft and go hunting for the tells by hand. Delete the dashes. Swap "utilize" back to "use." Cut the closing line that lands like a mic drop. It works, but it's the kind of chore a computer should be doing, and I'm a guy who automates chores for a living.

A while back I tried to fix this the fancy way and it blew up in my face, which turned out to be the useful part.

#The fancy way didn't work

The obvious move, if you've drunk enough of the Kool-Aid, is to train a model on yourself. Take a small open model, feed it a few thousand words of your own writing, and nudge it toward your style. I did exactly that. Fifty seconds of training on my laptop, and I asked it to describe one of my projects.

It came back sounding faintly like me and knowing absolutely nothing. It told me one of my tools was "a little buddy who knows what I'm talking about." That tool is a developer utility that sits next to my code. It is not a buddy. The model had the cadence about half right and every single fact wrong. A vaguely-me-shaped liar.

The lesson there was worth more than a win would have been: teaching a model your style is a different job than teaching it facts, and style is the easy half. You don't need to retrain anything. You just need to hand a model that already writes well a good description of how you sound, and get out of the way. A prompt beats a home-grown fine-tune for this, and it isn't close.

Which is great, except a description of how I sound was just sitting in a file on my laptop. To use it I had to remember it existed, paste it in, and eyeball the result. That's not a tool. That's a sticky note.

#Make it something any session can call

What I actually wanted: any Claude session I've got open, on any project, able to check its own writing against my voice without me babysitting it. And able to fix it when it's off.

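That's what MCP is for. It's the little plug that lets a model reach a tool you wrote. So I built one. Two jobs: score_voice checks a piece of writing against my voice, and a second tool rewrites it when it's off.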

The whole thing runs as a service on my little home Kubernetes cluster, reachable from anywhere on my network. The model doing the actual rewriting is a local one (gemma running on Thor, the box with the beefy GPU), so none of my half-formed drafts leave the house.

#The part I almost got wrong

Here's where it gets interesting, and where I nearly built the worse version.

My first instinct was to make score_voice smart. Have it ask a language model "hey, does this sound like Jason?" and let the model judge. That's the impressive-sounding design. It's also the wrong one, and a guy I trust talked me out of it.

The problem is speed and trust. Asking a local model to render a subjective verdict takes ten, twenty, thirty seconds, and it's wishy-washy. Run the same paragraph past it twice and you can get two different answers. Now bolt that onto a loop that's supposed to rewrite-and-recheck a few times, and the whole thing turns into molasses. Running like a herd of turtles, and you don't even fully believe the turtles.

So the scoring doesn't ask a model anything. It's plain code checking plain rules. Count the em-dashes. Grep for the buzzword list. Flag the sentences over a length cap. Check that "I" actually shows up. All of it runs in well under a millisecond and gives the same answer every single time. No GPU, no waiting, no coin flip.
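A rough sketch of that kind of check, in Python for readability. The rules are the ones named above; the buzzword list, the 35-word cap, and the "lose 0.1 per flag" scoring are illustrative stand-ins, not the real rule set or the real server code.

```python
import re

# Hypothetical rule set: stand-in values, not the real lists or caps.
BUZZWORDS = {"leverage", "robust", "utilize"}
MAX_SENTENCE_WORDS = 35
EM_DASH = "\u2014"


def score_voice(text: str) -> dict:
    """Score text against plain rules. No model calls, same answer every time."""
    flags = []

    # Rule 1: count the em-dashes.
    dashes = text.count(EM_DASH)
    if dashes:
        flags.append(f"em-dash x{dashes}")

    # Rule 2: look for words on the buzzword list, ignoring case.
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    for word in sorted(BUZZWORDS):
        hits = lowered.count(word)
        if hits:
            flags.append(f"buzzword '{word}' x{hits}")

    # Rule 3: flag sentences over the length cap.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    long_count = sum(1 for s in sentences if len(s.split()) > MAX_SENTENCE_WORDS)
    if long_count:
        flags.append(f"long sentence x{long_count}")

    # Rule 4: check that "I" (or "I'm", "I'd", ...) actually shows up.
    if "I" not in words and not any(w.startswith("I'") for w in words):
        flags.append("no first person")

    # Assumed scoring: start at 1.0, lose 0.1 per flagged rule, floor at 0.
    score = max(0.0, round(1.0 - 0.1 * len(flags), 2))
    return {"score": score, "flags": flags}


print(score_voice("I fixed the script. It runs now."))
print(score_voice("We leverage a robust pipeline\u2014always."))
```

Nothing in there is random and nothing touches the network, so the same text gets the same score on every call.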

That's the call I keep coming back to on these projects: the cheap, dumb, deterministic check carries most of the weight, and the expensive smart one isn't worth what it costs in the loop. Roughly four-fifths of "does this sound like a robot wrote it" is mechanical. Em-dashes, buzzwords, length. You don't need a billion-parameter model to catch a dash. You need a for loop.
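The rewrite-and-recheck loop this feeds might look something like the sketch below. The function name, the 0.9 threshold, and the three-round budget are assumptions for illustration, and the toy scorer and rewriter stand in for the real rules and the local model so the example runs without a GPU.

```python
from typing import Callable


def rewrite_until_clean(
    draft: str,
    score: Callable[[str], float],
    rewrite: Callable[[str], str],
    threshold: float = 0.9,   # assumed pass mark
    max_rounds: int = 3,      # assumed round budget
) -> tuple[str, float, int]:
    """Rewrite a draft until the fast check passes or the budget runs out."""
    current = draft
    current_score = score(current)
    rounds = 0
    while current_score < threshold and rounds < max_rounds:
        current = rewrite(current)      # the slow step: one model call
        current_score = score(current)  # the fast step: plain rules
        rounds += 1
    return current, current_score, rounds


# Stand-ins so the sketch runs on its own.
def toy_score(text: str) -> float:
    return 0.5 if "utilize" in text.lower() else 1.0


def toy_rewrite(text: str) -> str:
    return text.replace("utilize", "use")


final, final_score, rounds = rewrite_until_clean(
    "I utilize a script for that.", toy_score, toy_rewrite
)
print(final, final_score, rounds)  # I use a script for that. 1.0 1
```

The only slow call in the loop is the rewrite. If the scorer were a model too, every round would pay for two slow calls instead of one, and the second one could change its mind between rounds.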

I didn't throw the smart judge away entirely. It runs, but off to the side, as a quality check I kick off when I change the rules or swap the model out. It answers a question the fast checks can't: not "did this break a rule" but "does this actually sound like how I really rewrote it last time." That's a fair question for a slow, fuzzy judge. It just has no business sitting in the fast path where I'm iterating.

#It's held together with tape, and that's fine

I want to be honest about what this is. It's a homelab tool, not a product. There's no login on it, because it never leaves my own network. The server recompiles itself from scratch every time the pod restarts, which takes half a minute, because I haven't bothered to package it properly yet. The "does it sound like the real me" judge is still half-built. If I were shipping this to other people, every one of those would be a problem. I'm not, so they're a someday.

But the core of it works, and I use it. The fun part is the recursion: I handed this very post to score_voice to see where I slipped. It came back 0.8, and the thing it dinged me for is perfect. It flagged "leverage" and "robust" as buzzwords. Which they are. They're sitting up at the top of this post because I was quoting them as the enemy. The checker has no idea I'm making fun of those words. It just sees them in the text and throws the flag.

That's the whole point in one bug. A dumb, fast, literal check is going to be dumb, fast, and literal. It can't tell quotation from confession. But it caught every real tell, it ran instantly, and it gave me the same number it'd give anyone. I'll take a checker that's occasionally too literal over a smart one that takes thirty seconds to maybe agree with itself.
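The quotation blind spot is easy to reproduce with a literal word match. This is a minimal sketch with an assumed two-word list and a hypothetical helper name, not the real checker:

```python
import re

BUZZWORDS = {"leverage", "robust"}  # assumed subset of the real list


def find_buzzwords(text: str) -> list[str]:
    """Return the buzzwords present in text, ignoring case and quote marks."""
    words = {w.lower() for w in re.findall(r"[A-Za-z]+", text)}
    return sorted(words & BUZZWORDS)


quoted = 'The tells are always the same: "Leverage." "Robust."'
used = "We leverage a robust platform."

print(find_buzzwords(quoted))  # ['leverage', 'robust']
print(find_buzzwords(used))    # ['leverage', 'robust']
```

Both sentences come back with the same flags. Telling them apart means understanding what the sentence is doing with the word, and that's a judgment call a word match can't make.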

The bigger thing I keep relearning, on this and on a few other tools I've built lately: the move that makes these AI projects actually useful is usually figuring out what not to hand the model. The flashy design asks the model to do everything. The one that works asks it to do the one part that genuinely needs judgment, and hands the rest to a few lines of code that are fast, free, and never lie to you.

The model writes the prose. A for loop keeps it honest.

