Building an MCP server to automate employee offboarding

2026-02-20 · ~11 min read · mcp · go · ops

Nobody asked me to build this. I'd been offboarding employees manually for years — bouncing between our directory, our mail platform, our ERP, our forms platform, our helpdesk, and a shared file system every time someone left — and watching myself repeat the same multi-step playbook was the kind of friction that eventually pushes you to write code. So I did.

It's an MCP server in Go. I hand it a name and a ticket number, and ten minutes later I get a PASS/FAIL verification report across every system.

This is how it works, and what the first real production run cost me.

#A checklist isn't enough

The playbook used to live in a markdown file. Every system, every query, every gotcha I'd learned from doing this dozens of times. It was good documentation. It also did absolutely nothing on its own.

Worse, the order matters. Convert the mailbox to shared before moving the directory account to the Terminated OU, or the directory's cloud sync breaks. Check for open orders in the ERP before soft-deleting the user, or you orphan sales data. Clear the taker password before anyone else claims it.
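
Those ordering rules are precedence constraints, so they can be checked before anything runs. Here is a minimal sketch that assumes a plan is just a list of step names. convert_mailbox and check_erp_dependencies are tool names from this post. move_to_terminated_ou and soft_delete_erp_user are hypothetical labels for the other two steps.

```go
package main

import "fmt"

// precedence maps a step to the steps that must already have run.
// Step names are illustrative; they mirror the ordering rules above.
var precedence = map[string][]string{
	"move_to_terminated_ou": {"convert_mailbox"},
	"soft_delete_erp_user":  {"check_erp_dependencies"},
}

// validateOrder returns an error naming the first step whose prerequisite
// has not run yet, either because it is missing from the plan or because
// it is scheduled later.
func validateOrder(plan []string) error {
	done := make(map[string]bool, len(plan))
	for _, step := range plan {
		for _, before := range precedence[step] {
			if !done[before] {
				return fmt.Errorf("%s must run after %s", step, before)
			}
		}
		done[step] = true
	}
	return nil
}

func main() {
	plan := []string{"move_to_terminated_ou", "convert_mailbox"}
	fmt.Println(validateOrder(plan))
}
```

A missing prerequisite is treated the same as a late one. Under these rules there is no valid offboarding that moves the account without converting the mailbox.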

I'd already been using Claude Code with MCP servers for the individual systems — directory, mail, SQL, helpdesk, forms. I could ask Claude to disable a directory account or check a user's open orders. But offboarding required coordinating across all of them in a specific sequence, and I was still the orchestrator, holding the playbook in my head. I wanted to hand Claude the employee's name and let it run the whole thing.

#Why a server, not a script

The fastest version of this would have been a Bash script. I could have written one in an afternoon. But a script cuts Claude out of the loop: Claude can't inspect a script's internals, choose which steps to skip, retry just the one step that failed, or explain to me what it's doing along the way.

A dedicated MCP server gives Claude the same shape of access I have to the underlying systems, but with the workflow encoded into the tool boundaries. disable_directory_account doesn't just flip the disabled bit — it disables the account, sets a timestamped description, and removes the user from every group, in one call. check_erp_dependencies doesn't run a single query — it runs seven, then returns a clear can_delete: true/false verdict.

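The point is that the tools enforce the workflow. Claude doesn't have to remember to remove groups when disabling. The tool always does it. A script can enforce that too, but only by owning the whole sequence. Tools put the constraint inside each step and still leave Claude in charge of the order.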
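
Here is a minimal sketch of "the tool always does it". It defines a bundled tool over a hypothetical Directory interface and uses an in-memory fake in place of LDAP. This is not the server's actual code. The point is that no code path disables the account without also setting the description and stripping groups.

```go
package main

import (
	"errors"
	"fmt"
)

// Directory is a hypothetical narrow interface over the LDAP backend.
type Directory interface {
	Disable(dn string) error
	SetDescription(dn, desc string) error
	Groups(dn string) ([]string, error)
	RemoveFromGroup(groupDN, memberDN string) error
}

// DisableAccount is the whole tool: callers cannot request the disable
// without also getting the description and the group removal.
func DisableAccount(d Directory, dn, desc string) error {
	if err := d.Disable(dn); err != nil {
		return fmt.Errorf("disable: %w", err)
	}
	if err := d.SetDescription(dn, desc); err != nil {
		return fmt.Errorf("description: %w", err)
	}
	groups, err := d.Groups(dn)
	if err != nil {
		return fmt.Errorf("list groups: %w", err)
	}
	// Keep going past individual failures and report them all at once.
	var errs []error
	for _, g := range groups {
		if err := d.RemoveFromGroup(g, dn); err != nil {
			errs = append(errs, fmt.Errorf("remove from %s: %w", g, err))
		}
	}
	return errors.Join(errs...) // nil when every removal succeeded
}

// fakeDir records state so the bundling can be checked without a server.
type fakeDir struct {
	disabled bool
	desc     string
	groups   map[string]bool
}

func (f *fakeDir) Disable(string) error {
	f.disabled = true
	return nil
}

func (f *fakeDir) SetDescription(_, d string) error {
	f.desc = d
	return nil
}

func (f *fakeDir) Groups(string) ([]string, error) {
	var out []string
	for g := range f.groups {
		out = append(out, g)
	}
	return out, nil
}

func (f *fakeDir) RemoveFromGroup(g, _ string) error {
	delete(f.groups, g)
	return nil
}

func main() {
	f := &fakeDir{groups: map[string]bool{"CN=Sales": true, "CN=VPN": true}}
	err := DisableAccount(f, "CN=Jane Doe", "Disabled 2026-02-20 - Ticket #123")
	fmt.Println(err, f.disabled, len(f.groups))
}
```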

#The shape of the thing

A single Go binary that runs on my Mac. It connects to five backend systems on startup:

                          ┌──────────────────┐
                          │   Claude Code    │
                          │   (stdio MCP)    │
                          └────────┬─────────┘
                                   │
                          ┌────────┴─────────┐
                          │  offboarding-mcp │
                          │   (Go binary)    │
                          └────────┬─────────┘
                                   │
           ┌───────────┬───────────┼───────────┬───────────┐
           │           │           │           │           │
      ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐ ┌────┴────┐
      │  LDAP   │ │  Mail   │ │   SQL   │ │ Helpdesk│ │  Forms  │
      │  (dir)  │ │ (cloud) │ │  (ERP)  │ │  REST   │ │  REST   │
      └─────────┘ └─────────┘ └─────────┘ └─────────┘ └─────────┘

No intermediary MCP servers. No shelling out to PowerShell for directory work either; go-ldap handles it directly. The only PowerShell dependency is the cloud mail platform, because as of 2026 the vendor still hasn't shipped a REST API for mailbox conversion or FullAccess permission grants.

Twelve tools. The interesting four are below.

#The tools that matter

disable_directory_account is the tool I designed first, and it set the pattern for everything else. It does three things in one call: sets the ACCOUNTDISABLE bit in userAccountControl, sets a description in the format Disabled YYYY-MM-DD - Ticket #N - Name, Dept, Location, and removes every group membership. It's one call instead of three because the steps always happen together. There's no offboarding scenario where I'd disable the account but leave the groups attached. Forcing the workflow into the tool surface is the whole job.

check_erp_dependencies is the gatekeeper. It runs seven queries against the ERP (open orders as taker, open quotes, open RMAs, open POs, default-buyer items, default-salesrep customers, open transfers) and returns a structured verdict. If a sales associate has dozens of open orders, the tool refuses to allow deletion and explains why. The taker field on those orders is informational, but sales reporting still references it: orphan it and the reports break six months later. This tool exists because I orphaned that field on my second-ever ERP cleanup, four years ago, and never wanted to do it again.

convert_mailbox is the only tool that calls PowerShell. To convert a mailbox to shared, the cloud mail platform requires a connect-and-authenticate dance. To get non-interactive auth on macOS, you fetch an access token from the server's own OAuth app registration, then pass it to the connect cmdlet along with the tenant domain. There's a footgun here that cost me an hour the first time. In recent versions, the access token has to be a SecureString, not a plain string, and the organization parameter wants the tenant domain, not the tenant GUID. I got it wrong on a Saturday and spent that hour decoding tokens in jwt.io before I figured out the API was politely refusing my plain-string auth.

verify_offboarding is the tool I didn't expect to use most. It runs every check across every system and returns a PASS/FAIL/WARN/SKIP report. Directory account disabled? Groups removed? In Terminated OU? Cloud sign-in blocked? Licenses revoked? ERP soft-deleted? Taker password cleared? Forms platform account flagged?

I built it as the closer of the workflow: call it last to confirm everything happened. What I actually do is run it first, against offboardings I did by hand, to catch any step I missed a year ago. It turns the playbook from a document into a continuous audit. It's the most useful tool in the whole project, and I almost didn't write it.
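
The report can be a list of read-only probes that always run to completion. That property is what makes it safe to point at an offboarding done by hand a year ago. Here is a minimal sketch. The check names come from the list above, and the probe functions are hypothetical stubs.

```go
package main

import (
	"fmt"
	"strings"
)

type Status string

const (
	Pass Status = "PASS"
	Fail Status = "FAIL"
	Warn Status = "WARN"
	Skip Status = "SKIP"
)

// check is one report line; Run is a hypothetical read-only probe.
type check struct {
	Name string
	Run  func() (Status, string)
}

// runReport runs every check without stopping early and returns the
// rendered report plus whether any check failed.
func runReport(checks []check) (string, bool) {
	var b strings.Builder
	failed := false
	for _, c := range checks {
		st, detail := c.Run()
		if st == Fail {
			failed = true
		}
		fmt.Fprintf(&b, "%-4s  %s", st, c.Name)
		if detail != "" {
			fmt.Fprintf(&b, " (%s)", detail)
		}
		b.WriteByte('\n')
	}
	return b.String(), failed
}

func main() {
	report, failed := runReport([]check{
		{"Directory account disabled", func() (Status, string) { return Pass, "" }},
		{"In Terminated OU", func() (Status, string) { return Fail, "still in OU=Sales" }},
		{"Forms platform account flagged", func() (Status, string) { return Skip, "no account" }},
	})
	fmt.Print(report)
	fmt.Println("any failures:", failed)
}
```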

#The first real run, and the 403 that wasn't a 403

The first production offboarding using the server failed at step five. remove_licenses returned 200 for sign-in blocking, then 403 on the first group removal. I assumed I had a permissions problem.

Three hours of decoding access tokens at jwt.io later, the problem turned out to be subtler. I had two OAuth app registrations — one for our internal dashboard, one for automation. The shared directory_config.json pointed to the dashboard app, which didn't have the mail or license-management permissions. The offboarding server silently fell through to the shared config and used the wrong app. Sign-in blocking worked because both apps had User.ReadWrite.All. License revocation didn't, because only the automation app had Organization.ReadWrite.All.

Easy fix once I saw it. The offboarding server's config.yaml now has its own block that explicitly overrides the shared config.

But the deeper bug was elsewhere, and I only found it because of the 403. My license removal tool iterated over every result from the directory API endpoint users/{id}/memberOf and sent a DELETE to groups/{id}/members/{user-id}/$ref for each one. The catch: memberOf returns groups, directory roles, and administrative units. The 403 wasn't a permissions error. It was the API politely declining to remove a user from a directory role through the groups API. The fix was a filter:

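for _, m := range members {
    // memberOf mixes groups, directory roles, and administrative units;
    // only groups can be removed through the groups endpoint.
    if m.OdataType != "#microsoft.graph.group" {
        continue
    }
    // ... remove from group
}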

I never would have found that bug if the first 403 hadn't sent me spelunking. The right error message in the wrong place is sometimes the only debugger you have.

#What a Windows fleet teaches you about SSH keys

backup_user_pc shells out to scp via sshpass. It uses a domain admin credential, stored in a chmod 600 config file, to copy the user's Desktop and Documents to shared storage before their workstation gets wiped.

I know how this reads. SSH keys are a solved problem. Password auth is considered harmful. You should use cert-based auth, or at least public keys deployed via Group Policy. I am the IT guy who tells everyone else to do this.

The reality is that we have a Windows endpoint fleet that rotates constantly. Sales reps swap laptops several times a year. New hires get whatever was sitting on the shelf. Deploying SSH keys to every machine that briefly enters the fleet is a maintenance burden that never finishes — and in our threat model, the marginal security gain over a chmod 600 credential file held by the IT admin who already has domain-admin rights is approximately zero.

This is the kind of operational call that drives security purists crazy and that anyone who has actually managed a churning Windows fleet immediately understands. The right answer in a homelab is not the right answer for a production fleet. Pick the constraint that's actually binding and stop solving for the others.

#What's next

The MCP server handles the IT side. Hardware recovery, manager coordination, forms-platform portal cleanup — those still happen by hand. Some of them could be automated. Our MDM has a device-management API I haven't dug into. The forms-platform admin portal has a reluctant REST API I keep not bothering to explore.

The bigger pattern is reusable. Onboarding is the obvious mirror: create the directory account, assign licenses, set up the mailbox, create the ERP user, provision the workstation. Same systems, opposite direction, different ordering rules.

For now, I hand Claude a name and a ticket number, and ten minutes later I get the report. The checklist still exists in markdown somewhere, as the canonical reference. The MCP server is what actually runs.
