From scripts to infrastructure

May 5, 2026 · Kyle Cronin

By the time the mail rebuild was finished, the way I was deploying FeedFilters had grown into something I wasn’t proud of. It started simply enough — a deploy.sh that pushed a fresh image to the production host, restarted the container, and called it done. Then a bootstrap.sh for getting a brand-new host configured: Docker, a deploy user, the shared Caddy network, the firewall. Then a provision.sh for the things bootstrap.sh shouldn’t do at the same time. Then sysctl tuning landed in /etc/sysctl.d for the load-test work. Then every time the app gained a new environment variable, the production .env had to be hand-edited to match the new shape.

Each piece was small and made sense at the time. Together, they’d become fragile. If bootstrap.sh wasn’t carefully idempotent, re-running it on a partially provisioned host could leave the host worse off than before. Hand-editing .env on a live host is exactly the kind of thing that goes wrong under deadline pressure. And the build was still happening on the production host itself, which I’d never been comfortable with: a botched build meant a sick production host.

What I wanted

What I wanted was for the box’s actual state to match a description of it that lived somewhere other than the box. Predictable provisioning. Image builds happening somewhere besides the production server. A way to add or change configuration without ssh-ing in. Most of all, I wanted to stop worrying that I’d forget a step on the next deploy and find out about it when something broke.

Options considered

The two obvious shapes for this in 2026 are still Kubernetes and a NixOS-style declarative-OS approach. I looked at both.

Kubernetes is the obvious overkill answer. It’s a capable tool with a real learning curve, and most of what makes it pay for itself only starts to matter at multi-node scale or when there are enough apps and enough operators to justify the complexity. There’s also a resource cost: a usable cluster wants its own host (or hosts) just to run the control plane, which would mean a real step up in hosting costs for a tool I wasn’t yet sure I needed. For one box running a handful of personal apps, the tax wasn’t worth it.

NixOS was the option I considered hardest. The configuration-as-code promise is exactly what I was looking for, and the idea of being able to roll a host forward and back through versions of itself is genuinely appealing. But once I dug into what it would actually take to get going, including installing tooling on my development Mac that wasn’t available through Homebrew, I hit the brakes. That was the signal that I was in for more than I’d bargained for, on a project whose goal was just “make the box predictable.”

The shape I settled on

The system I ended up with has three pieces, each handling a different layer of what’s running.

OpenTofu owns the cloud side: the Linode host, the Cloudflare DNS records (A, AAAA, MX, SPF, DKIM, DMARC, the works), the reverse-PTR registration on the host’s IP, and the other knobs the cloud APIs care about. Apply once, and what the cloud thinks is true matches what the code says.
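Under the hood, the “apply once and the cloud matches the code” idea is a diff. A toy version is sketched below: compare the resources the code declares against what the provider reports, and sort the difference into creates, updates, and deletes. This illustrates the concept only; it is not OpenTofu’s actual planner, and the resource addresses and values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Plan:
    """Actions needed to make the actual state match the declared state."""
    create: list[str] = field(default_factory=list)
    update: list[str] = field(default_factory=list)
    delete: list[str] = field(default_factory=list)


def plan(desired: dict[str, dict], actual: dict[str, dict]) -> Plan:
    """Compare declared resources with observed ones, keyed by resource address."""
    result = Plan()
    for address, attrs in sorted(desired.items()):
        if address not in actual:
            result.create.append(address)   # declared but missing
        elif actual[address] != attrs:
            result.update.append(address)   # exists but has drifted
    for address in sorted(actual):
        if address not in desired:
            result.delete.append(address)   # exists but is no longer declared
    return result


if __name__ == "__main__":
    # Hypothetical resource addresses and values, for illustration only.
    desired = {
        "dns.A.mail": {"value": "203.0.113.10"},
        "dns.MX.@": {"value": "mail.example.com", "priority": 10},
        "dns.TXT._dmarc": {"value": "v=DMARC1; p=quarantine"},
    }
    actual = {
        "dns.A.mail": {"value": "203.0.113.10"},
        "dns.MX.@": {"value": "old-mail.example.com", "priority": 10},
        "dns.TXT.stale": {"value": "leftover"},
    }
    print(plan(desired, actual))
```

The real tool is more careful than this: some attribute changes force a resource to be replaced rather than updated in place, and it orders work along the dependency graph between resources. The shape of the idea is the same, though: the code is the description, and the plan is the distance to it.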

Ansible owns the host side: Debian packages, the deploy user, the shared caddy Docker network, the sysctl values the load-test work taught me to set, the docker-compose file that pulls the app image at the right tag. A run takes the host from whatever state it’s in to whatever the playbook describes, without my having to remember which steps I already ran.
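The property that makes re-running Ansible safe, and that bootstrap.sh lacked, is check-then-change: each step looks at the current state first and only acts if it differs from the target. Here is a minimal Python sketch of that pattern for a sysctl drop-in file. The file name and the two settings are hypothetical (the post doesn’t list which values the load-test work settled on), and a real playbook would use Ansible’s own modules rather than anything like this.

```python
import tempfile
from pathlib import Path


def ensure_file(path: Path, content: str) -> bool:
    """Make `path` contain exactly `content`; return True only if a write happened."""
    if path.exists() and path.read_text() == content:
        return False                      # already in the desired state: do nothing
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content)
    return True


def render_sysctl(settings: dict[str, str]) -> str:
    """Render key/value pairs in sysctl.d format, sorted so the output is stable."""
    return "".join(f"{key} = {value}\n" for key, value in sorted(settings.items()))


if __name__ == "__main__":
    # Hypothetical settings; not the values actually used on the FeedFilters host.
    settings = {"net.core.somaxconn": "4096", "net.ipv4.ip_local_port_range": "1024 65535"}
    with tempfile.TemporaryDirectory() as root:
        target = Path(root) / "etc/sysctl.d/99-loadtest.conf"
        print(ensure_file(target, render_sysctl(settings)))  # first run writes the file
        print(ensure_file(target, render_sysctl(settings)))  # second run changes nothing
```

Running it prints True, then False: the second run finds the file already correct and leaves it alone, which mirrors the changed/ok distinction Ansible reports per task. On a real host, writing the file is only half the job; the values also have to be loaded, for example with sysctl --system.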

GitHub Actions handles the build side. A merge to the production branch builds the image, tags it, pushes it to GHCR, and SSHs into the host to run a small “pull the new image and restart the container” step. The build never touches the production host. The deploy is a single idempotent step at the end of a CI run.
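The build-elsewhere, deploy-small split comes down to three commands. This Python sketch only assembles them (nothing is executed). The GHCR owner, host, directory, the 12-character commit tag, and the IMAGE_TAG variable that the host’s compose file is assumed to interpolate are all hypothetical, and registry login and SSH key setup are left out.

```python
import shlex


def deploy_plan(owner: str, app: str, sha: str, host: str, app_dir: str) -> list[list[str]]:
    """Return one deploy's commands, in order, as argv lists (nothing is run)."""
    tag = sha[:12]                               # tag images by commit, so each deploy is exact
    image = f"ghcr.io/{owner}/{app}:{tag}"
    # Assumes the compose file on the host says `image: ghcr.io/<owner>/<app>:${IMAGE_TAG}`.
    remote = " && ".join([
        f"cd {shlex.quote(app_dir)}",
        f"export IMAGE_TAG={shlex.quote(tag)}",
        "docker compose pull",                   # fetch the newly tagged image
        "docker compose up -d",                  # recreate containers whose image changed
    ])
    return [
        ["docker", "build", "-t", image, "."],  # on the CI runner, never on production
        ["docker", "push", image],              # to GHCR
        ["ssh", host, remote],                  # the only step that touches the host
    ]


if __name__ == "__main__":
    # Hypothetical values; GHCR image names must be lowercase.
    for argv in deploy_plan("owner", "feedfilters", "0123456789abcdef0123",
                            "deploy@prod.example.com", "/srv/feedfilters"):
        print(shlex.join(argv))
```

The first two commands run on the Actions runner; only the last reaches production, and because it just pulls a tag and recreates the container, running it twice is harmless.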

Which tool belonged on which layer wasn’t obvious up front, and I’d be lying if I said I’d planned to settle on this exact combination. But each tool turned out to fit its layer well enough that I haven’t been tempted to swap any of them out.

The mail dovetail

The mail rebuild made all of this a much easier trade to justify. Sending mail reliably required DNS records I hadn’t been managing in code — the MX, the SPF, the DKIM TXT record at the right selector, the DMARC policy — and a reverse-PTR registration on the host’s IP that matches the sender hostname. Without all of those, deliverability suffers in invisible ways. Doing them by hand across two web UIs (Linode for rDNS, Cloudflare for DNS) every time the mail config changed wasn’t realistic for long.
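For reference, here is roughly what that record set looks like when generated from a handful of inputs, plus the reverse-DNS condition receivers commonly check: the IP’s PTR names the sending host, and that name resolves back to the same IP. It’s a Python sketch; the domain, addresses, selector, DMARC policy, and report address are placeholders, and the SPF and DMARC strings are common minimal forms, not the policies FeedFilters actually publishes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    type: str
    name: str
    value: str


def mail_records(domain: str, mail_host: str, ipv4: str, ipv6: str,
                 selector: str, dkim_pubkey: str, report_to: str) -> list[Record]:
    """DNS records a sending domain typically needs (values are placeholders)."""
    return [
        Record("A", mail_host, ipv4),
        Record("AAAA", mail_host, ipv6),
        Record("MX", domain, f"10 {mail_host}."),
        # SPF (RFC 7208): only these addresses may send mail for the domain.
        Record("TXT", domain, f"v=spf1 ip4:{ipv4} ip6:{ipv6} -all"),
        # DKIM (RFC 6376): the public key lives at <selector>._domainkey.<domain>.
        Record("TXT", f"{selector}._domainkey.{domain}", f"v=DKIM1; k=rsa; p={dkim_pubkey}"),
        # DMARC (RFC 7489): what to do with failing mail, and where to send reports.
        Record("TXT", f"_dmarc.{domain}", f"v=DMARC1; p=quarantine; rua=mailto:{report_to}"),
    ]


def ptr_matches(ptr_name: str, forward_ips: set[str], ip: str, mail_host: str) -> bool:
    """Forward-confirmed reverse DNS: the PTR for `ip` names the sender host,
    and that host name's forward lookup includes the same `ip`."""
    return ptr_name.rstrip(".") == mail_host.rstrip(".") and ip in forward_ips


if __name__ == "__main__":
    for record in mail_records("example.com", "mail.example.com", "203.0.113.10",
                               "2001:db8::10", "s1", "MIIBIjANBg...", "dmarc@example.com"):
        print(record)
```

In a setup like the one described, the forward records would be Cloudflare resources while the PTR is set on the Linode side; keeping both in one configuration makes it much harder for the two names to drift apart.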

With OpenTofu owning all of it, a change to the mail config and the records it depends on ship together as one change. The DKIM key rotation that I’d been quietly avoiding became a small edit instead of a half-day project.
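A DKIM rotation usually goes through a new selector so the old and new keys can coexist for a while. The Python sketch below lays out a commonly used three-phase sequence; the selectors, keys, and phase boundaries are assumptions for illustration, since the post doesn’t describe the rotation procedure itself.

```python
def rotation_phases(domain: str, old: tuple[str, str], new: tuple[str, str]) -> list[dict]:
    """Each phase lists the DKIM TXT records to publish and the selector to sign with.
    `old` and `new` are (selector, public_key) pairs."""
    def txt(selector: str, key: str) -> tuple[str, str]:
        return (f"{selector}._domainkey.{domain}", f"v=DKIM1; k=rsa; p={key}")

    return [
        # 1. Publish the new key next to the old one; keep signing with the old one.
        {"publish": [txt(*old), txt(*new)], "sign_with": old[0]},
        # 2. Once the new record has propagated, switch the signer to the new selector.
        {"publish": [txt(*old), txt(*new)], "sign_with": new[0]},
        # 3. After mail signed with the old key has had time to be verified, retire it.
        {"publish": [txt(*new)], "sign_with": new[0]},
    ]


if __name__ == "__main__":
    # Hypothetical selectors and truncated placeholder keys.
    for number, phase in enumerate(rotation_phases("example.com", ("s1", "OLD..."), ("s2", "NEW...")), 1):
        print(number, phase["sign_with"], [name for name, _ in phase["publish"]])
```

Each phase is one small commit: phases 1 and 3 touch only DNS, phase 2 touches the mail server’s signing config, and the waits between them cover DNS propagation and mail still in flight under the old key.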

The tiered approach

The three layers turn out to map neatly onto the “one box, many apps” deployment shape that the previous post talked about. Cloud-level infrastructure (the box itself, the records that point at it) is owned by OpenTofu. Host-level configuration (Docker, Caddy, the shared network, sysctls, the deploy user) is owned by Ansible. App-level deployment (the binary image, its runtime config, its own DNS records) is owned by GitHub Actions plus the per-app docker-compose file.

When the next app arrives, only the third layer has to change. The cloud is already there. The host is already provisioned. The new app declares its compose stanza, its Caddy labels, its image, and ships through the same pipeline FeedFilters uses. That’s most of what I wanted from this work: a place where adding the next app is a small, well-bounded job rather than a re-derivation of the whole stack.
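Concretely, the third-layer work for a new app might amount to one service stanza. This Python sketch assumes the Caddy labels follow the caddy-docker-proxy convention and that the shared network is named caddy, as the host layer above suggests; the app name, image, hostname, and port are hypothetical.

```python
import json


def app_service(name: str, image: str, hostname: str, port: int) -> dict:
    """Compose service stanza for an app joining the shared `caddy` network.
    Label names follow caddy-docker-proxy's convention (an assumption here)."""
    return {
        name: {
            "image": image,
            "restart": "unless-stopped",
            "env_file": [".env"],
            "networks": ["caddy"],
            "labels": {
                "caddy": hostname,                                   # site address Caddy serves
                "caddy.reverse_proxy": f"{{{{upstreams {port}}}}}",  # renders as {{upstreams 8080}}
            },
        }
    }


def compose_file(services: dict) -> dict:
    """Wrap services in a compose document that reuses the existing network."""
    return {"services": services, "networks": {"caddy": {"external": True}}}


if __name__ == "__main__":
    doc = compose_file(app_service("newapp", "ghcr.io/owner/newapp:latest",
                                   "newapp.example.com", 8080))
    print(json.dumps(doc, indent=2))  # JSON for a dependency-free printout; the real file is YAML
```

Nothing in this stanza reaches into the first two layers: the caddy network is declared external because Ansible already created it, and the host it runs on is OpenTofu’s concern.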

Looking back

The fragility that motivated all this is gone. I haven’t had to ssh into production to fix something in days. The deploy is predictable. New env variables go through code instead of through the production host’s filesystem. And the cloud, the host, and the app each have a single source of truth that lives somewhere other than the box.
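“Through code” can be as simple as rendering .env from a declared list of the keys the app expects, so a missing or misspelled variable fails the run instead of the app. A minimal Python sketch follows; the KEY=value format, the strict check for unexpected keys, and the key names in the usage are assumptions, and secret handling is deliberately left out.

```python
class ConfigError(ValueError):
    """Raised when declared configuration doesn't match what the app expects."""


def render_env(required: list[str], values: dict[str, str]) -> str:
    """Render a .env file in `required` order, refusing to write an incomplete one."""
    missing = [key for key in required if key not in values]
    unexpected = sorted(set(values) - set(required))
    if missing or unexpected:
        raise ConfigError(f"missing={missing} unexpected={unexpected}")
    for key in required:
        if "\n" in values[key]:
            raise ConfigError(f"{key} contains a newline")  # would corrupt the file
    return "".join(f"{key}={values[key]}\n" for key in required)


if __name__ == "__main__":
    # Hypothetical keys and values.
    print(render_env(["DATABASE_URL", "SMTP_HOST"],
                     {"SMTP_HOST": "mail.example.com", "DATABASE_URL": "sqlite:///data/app.db"}), end="")
```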

The other thing I’m pleased with is that the system is durable in the boring way. If the host disappeared tomorrow, OpenTofu could rebuild the cloud half from its configuration, Ansible could provision a fresh host from the playbook, and the deploy pipeline would put the app back where it was. I haven’t tested that end-to-end, but every piece of it has been individually verified during the build, and the gap between “I haven’t tested it” and “I’m confident it works” is much smaller now than it was a week ago.
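That recovery order is strictly bottom-up, and writing it down as a runbook makes the dependency explicit: the host has to exist before Ansible can reach it, and Docker plus the caddy network have to exist before the app can deploy. This Python sketch defaults to a dry run. The directory, inventory, playbook, and workflow names are hypothetical, and re-triggering the deploy with gh workflow run assumes the workflow also accepts a manual workflow_dispatch trigger.

```python
import shlex
import subprocess

# Hypothetical paths and names; one command per layer, bottom-up.
REBUILD_STEPS = [
    ("cloud", ["tofu", "-chdir=infra", "apply"]),
    ("host", ["ansible-playbook", "-i", "inventory.ini", "site.yml"]),
    ("app", ["gh", "workflow", "run", "deploy.yml", "--ref", "production"]),
]


def rebuild(steps: list[tuple[str, list[str]]], dry_run: bool = True) -> list[str]:
    """Run each layer's command in order; with check=True a failure stops the rebuild."""
    log = []
    for layer, argv in steps:
        log.append(f"[{layer}] {shlex.join(argv)}")
        if not dry_run:
            subprocess.run(argv, check=True)  # raises CalledProcessError on failure
    return log


if __name__ == "__main__":
    for line in rebuild(REBUILD_STEPS):
        print(line)
```

Since each layer’s step is meant to be idempotent, a rebuild that fails partway should be safe to re-run from the top.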