The Fortress That Codes: Air-Gapped, High-Integrity Dev Environments With the Best DX You're Allowed to Have // Hunter Wigelsworth

Let’s get something out of the way upfront. If you came here looking for “secure dev environment, but maybe not too secure,” this article isn’t for you.

What we’re going to build is an architecture with these properties:

Source code never touches a developer laptop disk. Not even encrypted. Not on a “trusted” laptop. Never.
No internet egress from any dev environment. Not “egress via approved proxy.” Not “egress to an allow-list.” None.
Every workload’s identity is bound to a hardware root of trust that proves the workload is running on the right hardware, with the right image, in the right configuration.
Every build is hermetic, every artifact is signed, every dependency is content-addressed and pinned.
The browser lives outside the dev perimeter entirely. No exceptions.
All secrets are ephemeral, per-workload, never persisted to disk in plaintext, never seen by a human.

That’s a lot. But here’s the thing nobody selling you “secure dev environment” tooling will tell you: you can build this in 2026 and the developer experience can be genuinely excellent. Not tolerable. Not “better than jail.” Excellent. The reason is that the OSS ecosystem has finally matured to the point where every primitive you need is production-ready and composable, and the IDE experience over a fast network has gotten good enough that the developer basically can’t tell their code isn’t on their laptop.

The reason most organizations don’t ship this isn’t technical. It’s that nobody decided it was a priority. This article is my attempt to give you the architecture, the components, the tradeoffs, and the bill of materials so you can make that decision yourself.

We’re going to go deep. Skip ahead if you don’t care about the why.

The threat model, sharpened

Let’s be honest about what we’re actually defending against, because hand-wavy threat models are how you end up with $400k a year of security theater that doesn’t stop the actual attacks.

What we’re defending:

Source code exfiltration via the developer’s endpoint. A laptop with a customer database, crypto keys, ML model weights, or proprietary auth code on disk is a liability waiting to happen. Lost laptops. Malware. Coercion. Accidental sync to a personal cloud account. Pick your vector — they’re all real and they all happen constantly.

Source code exfiltration via the AI coding assistant. In 2026 this is the single biggest leak vector and most organizations haven’t even noticed yet. Every completion request in Cursor or Copilot, not just the ones you accept, sends your code (including surrounding context, often hundreds of lines) to a third-party API. The prompts frequently contain file paths, comments that name the project, and code patterns that uniquely identify your company. If you don’t have a story for where your AI assistant traffic is going, your source code is leaving the building every time a developer pauses long enough for a suggestion to appear.

Source code exfiltration via the build pipeline. Compromised CI runner. Hijacked maintainer account on a dependency you trust. Malicious build action that injects a backdoor into your final binary. SolarWinds. xz-utils. The pattern is well-understood by now and the answer is well-understood too: hermetic builds, signed provenance, content-addressed dependencies.

Insider threat with persistence. Engineer with legitimate repo access exfiltrates via personal email, USB stick, or a screenshot to Slack. Or worse: a coerced engineer with a deadline to deliver source to a competitor or nation-state. These are harder to defend against purely with architecture — you need people controls — but a well-built dev environment makes the technical exfiltration much harder.

Compromise of the cloud provider. Even if you trust your hyperscaler — and you should not blindly — you should architect so you don’t have to.

What we explicitly accept:

The physical security of the datacenter running the dev environments. (You’re operating your own, or you’re contracting with a provider whose physical security you’ve audited.) The supply chain of your hardware vendors — you’re using AMD, Intel, Arm, HPE, Dell, Supermicro as purchased, and you’re not fabbing your own chips. The possibility of a malicious insider who is also a talented enough engineer to compromise the attestation stack itself. And silicon-level hardware implants, which are real but are what export controls and tamper-evident packaging are for, not what a dev environment architecture defends against.

If your threat model includes the things we’re explicitly accepting, you’re in different territory. Call Anduril or a three-letter agency. They have dedicated budget for that.

For the rest of us, here’s the architecture.

The non-negotiable primitives

These are not up for debate. If any of these fail, the whole thing fails.

Hardware root of trust on every endpoint

Every developer laptop or workstation needs:

A TPM 2.0 (discrete is better, firmware TPM is acceptable if you trust the platform).
Measured boot enabled in UEFI.
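Linux IMA or equivalent runtime integrity measurement, so every binary the kernel loads is hashed, recorded in the IMA measurement log, and extended into a TPM PCR (PCR 10 by default).
Full-disk encryption (LUKS on Linux, FileVault on macOS, BitLocker on Windows) with the key sealed to the TPM (on Macs, which have no TPM, the Secure Enclave plays that role).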
The Keylime agent running, reporting PCR values to your attestation verifier.

The verifier is checking, continuously, that the TPM quotes match the expected PCR values for a known-good image and that IMA hasn’t seen any binaries outside the policy. (Be precise about what attestation proves: the machine’s measured state, not that a key was never exfiltrated — sealing the disk key to the TPM is what keeps it from ever existing outside the hardware in the first place.) If any of those change unexpectedly, the laptop loses its ability to authenticate to anything in your infrastructure. We revoke the SPIRE identity, the SSH cert doesn’t issue, the gateway closes. The laptop becomes a paperweight.
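To make the verifier’s job concrete, here is a minimal Python sketch of the core IMA check: replay the measurement log into a simulated PCR, confirm it matches the quoted value, and check every measured file against an allowlist. It is a simplification, not Keylime’s implementation. It assumes the quote’s signature has already been verified, uses a hypothetical `ImaEntry` type and allowlist shape, and computes a simplified template hash rather than the kernel’s exact `ima-ng` template encoding.

```python
import hashlib
from dataclasses import dataclass


def extend(pcr: bytes, measurement: bytes) -> bytes:
    """TPM PCR extend for the SHA-256 bank: new = SHA256(old || measurement)."""
    return hashlib.sha256(pcr + measurement).digest()


@dataclass(frozen=True)
class ImaEntry:  # hypothetical, simplified view of one IMA log line
    path: str
    file_hash: str        # hex SHA-256 of the file contents
    template_hash: bytes  # digest the kernel extended into PCR 10


def make_entry(path: str, content: bytes) -> ImaEntry:
    """Build an entry; this template hash is a simplification of ima-ng."""
    file_hash = hashlib.sha256(content).hexdigest()
    template = hashlib.sha256(bytes.fromhex(file_hash) + path.encode()).digest()
    return ImaEntry(path, file_hash, template)


def verify_ima(log: list[ImaEntry], quoted_pcr10: bytes,
               allowlist: dict[str, set[str]]) -> list[str]:
    """Return failures; an empty list means the device passes this check."""
    failures = []
    pcr = bytes(32)  # PCR 10 starts at all zeros after a reset
    for entry in log:
        pcr = extend(pcr, entry.template_hash)
        if entry.file_hash not in allowlist.get(entry.path, set()):
            failures.append(f"unexpected binary: {entry.path}")
    if pcr != quoted_pcr10:
        failures.append("IMA log does not replay to the quoted PCR 10")
    return failures
```

The replay step is what stops a compromised kernel from quietly deleting incriminating log lines: a PCR can only be extended, never rewritten, so a truncated log no longer hashes to the quoted value.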

The cost of this is roughly zero in software — Keylime is OSS, the Linux tooling is OSS — and a one-time engineering investment in building the verifier with the right policies. The cultural cost is higher: developers have to accept that their laptop is being continuously monitored. Get them to agree this is the price of working on the cool stuff, not the price of being mistrusted. That framing matters.

Hardware root of trust on every server

Same idea, different scale. Every server that runs dev environments needs:

AMD EPYC (9004 Genoa or 9005 Turin) with SEV-SNP enabled, or Intel Xeon 6 with TDX enabled. (Arm CCA is coming but not buyable: the first CCA silicon is Azure’s internal Cobalt 200, and no merchant-market Arm server CPU ships with CCA yet. NVIDIA’s upcoming Vera advertises native confidential computing but hasn’t publicly committed to Arm CCA — its story so far is TDISP/IDE and encrypted NVLink. Plan on AMD or Intel today.)
TPM 2.0 for the host itself.
Secure boot enabled with your own platform keys enrolled.
BMC with measured boot and signed firmware, which is mostly an HPE / Dell / Supermicro feature and varies by vendor.

The reason this matters: in a confidential VM world, the attestation is the security. If the underlying hardware is lying about its identity, the attestation is meaningless. So your host hardware has to be trustworthy, and that means hardware roots of trust all the way down to the silicon.

Confidential VMs (or, worse, bare metal)

Every dev environment runs inside a confidential VM with memory encryption and integrity protection: SEV-SNP or TDX today, CCA once you can actually buy it. Pick one, and don’t mix unless you have a real reason.

Why VM-grade and not just container-grade? Because ordinary containers share the host kernel — one kernel exploit exposes every workload on the node — and even gVisor’s userspace kernel still makes a narrow set of real host syscalls. Kata fixes that part by giving each pod its own VM with its own guest kernel, which is exactly why the stack below builds on it. But isolation from neighbors isn’t isolation from the host: without memory encryption and launch measurement, the hypervisor can still read guest memory at will. With a confidential VM, the kernel is yours — it boots from an image you control, measured at launch, and the host cannot reach into it.

If you genuinely need bare-metal (some performance-critical ML workloads, or you have hardware that can’t do confidential VMs), then the entire host becomes the workload’s trust boundary. Bring everything else — TPM attestation, measured boot, full-disk encryption, IMA — up to bare-metal equivalents. It’s doable but it’s harder, and most of the time you don’t need to.

No network egress from any dev workload

This is the rule that breaks most “secure dev environment” projects. The instinct is to allow egress to a curated list: package mirrors, version control, the company’s own services. Don’t. The problem with allow-lists is that they’re long, and every entry is a potential exfiltration vector. If your dev environment can reach registry.npmjs.org, it can npm publish your source as a public package anyone can download. If it can reach Slack, it can post to a workspace the attacker controls.

The right architecture is two separate networks, with no bridge between them except the explicit one:

The inside network is your dev environments, your internal services, your secret broker, your build cluster, your package mirrors. It has no default route to the internet. It’s hard to add one accidentally because there is literally no default route. Everything here talks to everything else via mTLS with workload identity.

The outside network is the developer’s laptop, the browser, AI assistant services that aren’t self-hosted, Slack, email, general internet. It cannot reach the inside network except through specific authenticated, attested gateways, and those gateways never see source code in the clear: they only forward encrypted IDE traffic.

The developer’s IDE — VS Code, JetBrains, Neovim, whatever — runs on their laptop and connects to the dev environment over SSH. That’s the bridge. It’s narrow, it’s monitored, and it’s the only one.
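“No default route” is also easy to audit continuously. A minimal sketch, assuming the inside hosts run Linux with iproute2’s JSON output; the `audit_host` helper is hypothetical and would feed whatever alerting you already run.

```python
import json
import subprocess

DEFAULT_DSTS = {"default", "0.0.0.0/0", "::/0"}


def default_routes(ip_json: str) -> list[dict]:
    """Return any default routes found in `ip -json route` output."""
    routes = json.loads(ip_json) if ip_json.strip() else []
    return [r for r in routes if r.get("dst") in DEFAULT_DSTS]


def audit_host() -> list[dict]:
    """Check IPv4 and IPv6 routes in every routing table on this host."""
    found = []
    for family in ("-4", "-6"):
        out = subprocess.run(
            ["ip", family, "-json", "route", "show", "table", "all"],
            capture_output=True, text=True, check=True,
        ).stdout
        found += default_routes(out)
    return found
```

On an inside host, a non-empty result should page someone, not raise a warning.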

All service-to-service traffic is mTLS with workload identity

Every service-to-service call inside the perimeter is mTLS. No API keys. No shared secrets. No “trusted internal network” nonsense. SPIRE (or any SPIFFE-compatible implementation) issues X.509 SVIDs to every workload. The workload proves its identity by presenting the SVID, and the other end verifies it against the SPIRE trust bundle (or a federated bundle, for peers in another trust domain).

This means your source repo isn’t https://github.internal/... with a long-lived PAT — it’s a workload-identity-aware clone that uses a SPIRE-issued SVID. Your package mirror authenticates callers via SPIFFE, not via IP allow-lists. Your CI runner talks to your build cache via mTLS, not via a shared password.
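On the receiving side, the mTLS library does the cryptography, and what’s left is an authorization decision on the peer’s SPIFFE ID. A minimal sketch of that decision, assuming the ID has already been extracted from a verified SVID. The trust domain and paths are hypothetical, and the parser enforces only the core SPIFFE ID rules (spiffe scheme, lowercase trust domain, no port, query, or fragment, no empty or dot path segments).

```python
import re
from urllib.parse import urlsplit

TRUST_DOMAIN = re.compile(r"^[a-z0-9._-]+$")
PATH_SEGMENT = re.compile(r"^[A-Za-z0-9._-]+$")


def parse_spiffe_id(uri: str) -> tuple[str, str]:
    """Split a SPIFFE ID into (trust_domain, path), rejecting malformed IDs."""
    parts = urlsplit(uri)
    if parts.scheme != "spiffe":
        raise ValueError("not a SPIFFE ID")
    if not TRUST_DOMAIN.match(parts.netloc):  # also rejects ports and userinfo
        raise ValueError("invalid trust domain")
    if parts.query or parts.fragment:
        raise ValueError("SPIFFE IDs carry no query or fragment")
    for seg in parts.path.split("/")[1:]:  # catches empty and trailing segments
        if seg in (".", "..") or not PATH_SEGMENT.match(seg):
            raise ValueError(f"invalid path segment: {seg!r}")
    return parts.netloc, parts.path


def authorize(peer_id: str, trust_domain: str, allowed_paths: set[str]) -> bool:
    """Allow only exact, well-formed IDs from our own trust domain."""
    try:
        domain, path = parse_spiffe_id(peer_id)
    except ValueError:
        return False
    return domain == trust_domain and path in allowed_paths
```

Exact-match allow-lists keep the policy boring. If you need wildcards, match on whole path segments (a prefix ending in `/`), never on bare string prefixes.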

The cost is that you have to operate SPIRE. The benefit is that you don’t rotate secrets by hand (SPIRE rotates SVIDs automatically), you don’t have to manage tokens, and you don’t have a class of attacks where someone steals a long-lived credential.

All build artifacts are signed with provenance

Every container image, every binary, every tarball produced by CI is signed with Sigstore: Cosign for the signature, Fulcio for the short-lived OIDC signing certificate, Rekor for the immutable transparency log. Alongside the signature, each artifact carries a signed SLSA provenance attestation that says: this artifact was built by this CI runner, from this source commit, using these specific dependencies.

If you can’t trace an artifact back to its source and prove the source is what you think it is, you can’t ship it.
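“Trace an artifact back to its source” boils down to a few checks on the decoded provenance statement. A minimal sketch, assuming the attestation’s signature has already been verified (for example with `cosign verify-attestation`) and that the payload is an in-toto statement carrying a SLSA v1 provenance predicate. The builder ID and repository URI in the usage are hypothetical.

```python
import hashlib
import json

SLSA_V1 = "https://slsa.dev/provenance/v1"


def check_provenance(artifact: bytes, statement_json: str,
                     trusted_builder: str, repo_prefix: str) -> list[str]:
    """Return problems found; an empty list means the artifact may ship."""
    st = json.loads(statement_json)
    problems = []
    if st.get("predicateType") != SLSA_V1:
        problems.append("not SLSA v1 provenance")
    # The artifact in hand must be one of the statement's subjects.
    digest = hashlib.sha256(artifact).hexdigest()
    if not any(s.get("digest", {}).get("sha256") == digest
               for s in st.get("subject", [])):
        problems.append("artifact digest not listed as a subject")
    pred = st.get("predicate", {})
    builder = pred.get("runDetails", {}).get("builder", {}).get("id")
    if builder != trusted_builder:
        problems.append(f"untrusted builder: {builder}")
    # repo_prefix should end in "@" so a look-alike repo name cannot match.
    deps = pred.get("buildDefinition", {}).get("resolvedDependencies", [])
    if not any(d.get("uri", "").startswith(repo_prefix) for d in deps):
        problems.append("no source from the expected repository")
    return problems
```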

The nice thing about Sigstore is that it’s actually pleasant to use. The tooling is good. Rekor gives you an immutable audit log. Fulcio issues short-lived signing certificates via OIDC, so you’re not managing long-lived signing keys. Be honest about the operational footprint, though: a fully private keyless deployment means running Fulcio, an OIDC identity provider for it to trust, a certificate transparency log, your own TUF trust root, and Rekor — which in its v1 form is Trillian plus a database, not a single container (Rekor v2’s tile-backed design slims this down considerably). The sigstore-scaffolding charts cover all of it, but budget for a real service, not a sidecar.

All secrets are ephemeral and per-workload

No long-lived database credentials in .env files. No SSH keys committed to repos. No API tokens shared in Slack DMs.

Every secret is:

Issued at workload startup, bound to the workload’s SPIRE identity.
Time-limited. Typically one hour TTL.
Automatically rotated.
Never written to disk in plaintext by the workload.

For things like database credentials, you use OpenBao dynamic secrets (or Infisical, or your cloud provider’s secret manager if you’re not air-gapped) with a short TTL and per-workload identity binding. For things like SSH keys into your source repo, you use SPIFFE-issued SVIDs and the source repo’s workload-identity auth feature. For things like “I need to push code from my workstation to the repo,” you use SSH certificates signed by your internal CA, with the certificate bound to your SPIRE-issued workload identity and short-lived.
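The workload side of “issued at startup, short TTL, automatically rotated, never on disk” is a small in-memory cache that re-fetches before the lease runs out. A minimal sketch: `issue` is a hypothetical callable standing in for the OpenBao call (authenticated with the workload’s SVID) that returns a secret and its TTL, and renewing at two-thirds of the TTL is an assumed convention, not an OpenBao default.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Lease:
    secret: str = field(repr=False)  # never printed, logged, or written to disk
    issued_at: float
    ttl_s: float

    def renew_due(self, now: float, fraction: float = 2 / 3) -> bool:
        return now >= self.issued_at + self.ttl_s * fraction


class EphemeralSecret:
    """Holds one short-lived secret in memory and refreshes it early."""

    def __init__(self, issue: Callable[[], tuple[str, float]],
                 clock: Callable[[], float] = time.monotonic):
        self._issue = issue
        self._clock = clock
        self._lease: Optional[Lease] = None

    def get(self) -> str:
        now = self._clock()
        if self._lease is None or self._lease.renew_due(now):
            secret, ttl = self._issue()
            self._lease = Lease(secret, now, ttl)
        return self._lease.secret
```

Renewing well before expiry means a slow or briefly unavailable secret broker degrades into a retry, not an outage.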

OK. That’s the security side locked in. Now the part everyone gets wrong.

But what about developer experience?

Here’s the truth: the security primitives above are not the hard part. The hard part is making the developer experience feel like local development, even though the actual code is running on a remote host inside a confidential VM.

Most “secure dev environment” projects fail not because the security was wrong, but because the developer experience was so bad that the team routed around it. They installed personal Dropbox and synced their home directory. They SSH’d from their personal laptop into the dev environment and copy-pasted into ChatGPT. They wrote a personal cron job that exfiltrated everything at 5 PM because nobody had stopped them yet.

The way you avoid this is by making the secure path the path of least resistance. The developer shouldn’t even notice the security is there. They should open their IDE, see their codebase, write code, hit run, and have it work.

This is achievable. It’s not free, but it’s achievable.

The IDE, and why latency is the silent killer

VS Code Remote-SSH is the default answer for most teams, and for good reason. It’s well-tuned, it has the best extension ecosystem, and the LSP / IntelliSense experience over a fast connection is basically indistinguishable from local.

The thing that determines whether VS Code Remote feels good or feels awful is network round-trip time. This is the silent killer of remote dev environments and it’s the thing most teams don’t measure until they’re already in pain.

A rough rubric, based on my own production deployments:

Under 30ms RTT: Feels local. Most developers won’t even realize their code isn’t on their laptop.
30 to 80ms RTT: Tolerable. Some extensions get chatty and you start noticing. Auto-complete lags slightly.
80 to 150ms RTT: Noticeable. Code completion visibly lags. File watching gets weird — VS Code’s file watcher occasionally misses events and re-scans everything, which kills performance.
Over 150ms RTT: Painful. You’ll lose developers. They will start working on their personal laptops.

The way you keep RTT low is to put the dev environment servers geographically close to your developers. If you’re a single-datacenter shop, this means picking a region near your main engineering office. If you’re distributed, you need multiple dev environment clusters with geographic routing — a developer in Singapore should not be working against a cluster in Virginia.
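Measure this before you argue about it. A minimal sketch that times TCP handshakes to each cluster’s SSH endpoint (a handshake costs roughly one round trip, so it is a decent RTT proxy without ICMP), maps the result onto the rubric, and picks the nearest cluster. The hosts you feed it and the choice of port 22 are assumptions.

```python
import socket
import statistics
import time


def tcp_rtt_ms(host: str, port: int = 22, samples: int = 5,
               timeout: float = 2.0) -> float:
    """Median TCP connect time in ms (about one round trip).

    The first sample includes a DNS lookup; the median absorbs it.
    """
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=timeout):
            pass
        times.append((time.perf_counter() - start) * 1000)
    return statistics.median(times)


def rtt_tier(ms: float) -> str:
    """Map an RTT onto the rubric: <30, 30-80, 80-150, >150 ms."""
    if ms < 30:
        return "feels local"
    if ms < 80:
        return "tolerable"
    if ms <= 150:
        return "noticeable"
    return "painful"


def nearest_cluster(rtts: dict[str, float]) -> str:
    """Pick the cluster with the lowest measured RTT."""
    return min(rtts, key=rtts.__getitem__)
```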

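For JetBrains users, JetBrains Gateway works similarly, but it runs a full IDE backend remotely (indexing happens server-side and is heavy on RAM and CPU) and its thin client degrades faster than VS Code over high-latency links, so it is the worse choice for distributed teams.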

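For the purists (Neovim, Emacs, Helix), you can get a more responsive development experience than VS Code over the same network, because there’s no Electron overhead. The catch is that a terminal editor over SSH echoes every keystroke from the remote side, so it feels great at low RTT and degrades faster than VS Code as RTT climbs; mosh helps. You give up the polished extension ecosystem but you get a tool that’s snappier by default.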

My recommendation: VS Code Remote-SSH as the default for most engineers. Neovim available for power users who care about latency. JetBrains Gateway for the inevitable “but I need IntelliJ for Java” cases. Don’t fight this, just make sure all three work without drama.

The remote filesystem

The IDE has to read and write files on the dev environment. There are three ways to do this and they’re wildly different in performance.

SSHFS mounts a remote filesystem over FUSE. The IDE thinks files are local, but every read and write is a network round-trip. It’s awful for hot paths like build artifact directories or node_modules. Don’t use SSHFS for dev environments. Use it for the rare case where you need to mount a remote directory on your local machine.

VS Code’s own remote FS protocol is what VS Code Remote-SSH uses. It batches operations, has smart caching for file metadata, and only transfers file contents when the IDE actually asks for them. Much better than SSHFS. This is the right answer for VS Code users and it’s the reason VS Code Remote feels so much better than SSHFS-based alternatives.

Virtio-fs is the host-to-guest file-sharing protocol for VMs, and it’s fast — near-native. It’s also exactly what you must not use here. Virtio-fs works by having the guest read files that live on the host’s filesystem, in plaintext, which hands your entire workspace to the very host your threat model says you don’t trust. This is why Confidential Containers restricts host-shared filesystems: a confidential VM that mounts its home directory over virtio-fs is confidential in name only.

For dev environment VMs, the right answer is a block volume attached to the guest and encrypted inside the guest — dm-crypt/LUKS running in the VM, with the volume key released by the attestation service only after the VM proves what it’s running. The host sees ciphertext; the guest sees a fast local disk, and on NVMe the encryption overhead is small enough that builds still feel local. (This architecture assumes Linux in the guest. Confidential Windows guests do exist — Azure’s SEV-SNP and TDX confidential VMs have supported Windows Server and Windows 11 since 2022, including the confidential AVD pattern — but everything else in this post is Linux-first.)
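The key-release step is what makes the block volume trustworthy, so here is its logic in miniature. This is a hypothetical stand-in for what an attestation-gated broker such as Trustee’s key broker does, heavily simplified: it skips verifying the hardware vendor’s signature over the report (the real work, done against AMD’s or Intel’s certificate chains) and treats the report as a bare launch measurement plus a report-data field that echoes the broker’s nonce.

```python
import hmac
import secrets
import time


class KeyBroker:
    """Hypothetical attestation-gated key broker (signature checks omitted)."""

    def __init__(self, good_measurements: set[bytes],
                 volume_keys: dict[str, bytes], max_age_s: float = 60.0):
        self._good = good_measurements
        self._keys = volume_keys
        self._max_age = max_age_s
        self._nonces: dict[bytes, float] = {}

    def challenge(self) -> bytes:
        """Issue a fresh nonce the guest must embed in its attestation report."""
        nonce = secrets.token_bytes(32)
        self._nonces[nonce] = time.monotonic()
        return nonce

    def release(self, volume_id: str, nonce: bytes,
                measurement: bytes, report_data: bytes) -> bytes:
        issued = self._nonces.pop(nonce, None)  # single use: blocks replays
        if issued is None or time.monotonic() - issued > self._max_age:
            raise PermissionError("unknown or stale nonce")
        if not hmac.compare_digest(report_data, nonce):
            raise PermissionError("report not bound to this challenge")
        if not any(hmac.compare_digest(measurement, m) for m in self._good):
            raise PermissionError("unexpected launch measurement")
        return self._keys[volume_id]
```

A real broker would also wrap the released key to an ephemeral public key the guest placed in its report, so the key never crosses the host in the clear; that binding is left out here.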

Workspace lifecycle: throwaway VMs, persistent state

The “ephemeral workspace” idea gets overhyped. Yes, you want to be able to throw away and recreate a workspace when something breaks. No, you don’t want every developer to lose all their uncommitted state every night.

The architecture that works:

Workspace state lives in a per-developer persistent volume. This is where their home directory, git checkouts, build caches, language server indexes, etc. live. It’s encrypted at rest. It can only be attached to attested confidential VMs.
The VM itself is ephemeral. It boots from an OCI image, mounts the persistent volume, runs the IDE server. When the developer is done for the day, the VM shuts down. The persistent volume stays.
When the developer reconnects, a new VM boots, attaches to the same persistent volume, and they pick up exactly where they left off. Uncommitted changes are still there. Their shell history is still there. Their open editor tabs are still there.
Snapshots of the persistent volume are taken periodically (every six hours is a good default) and stored encrypted. Rollback to a previous snapshot is possible in seconds.

This gives you:

Confidentiality: When the VM isn’t running, the volume is detached and encrypted, and it can only be mounted by an attested VM. If the developer loses their laptop, there is no workspace on it to steal, and the lost laptop can no longer attest its way back in.
Reproducibility: When a VM gets corrupted or compromised, you throw it away and start a fresh one. No “rebuild the developer’s machine” ritual.
Speed: VM boot is seconds. Snapshot restore is seconds. There’s no “rebuild from scratch” tax unless something is genuinely very wrong.

Coder and Ona (formerly Gitpod) both implement versions of this pattern. I have a soft preference for Coder because the workspaces-as-Terraform model makes it easier to standardize environments across teams, but either will work. Don’t over-invest in this choice — both are good, the lock-in is minimal, and you can switch later.

The compute stack

Now we’re getting concrete. Here’s what I’d actually deploy.

Hardware. Two pools.

The CPU pool runs on AMD EPYC 9355 servers (32-core, $15k-ish each) or Intel Xeon equivalents with TDX. 256 to 512 GB of DDR5 per server. One server handles roughly 20 concurrent dev environments. This is where 80% of your developers live.

The GPU pool runs on H100 or H200 servers for AI workloads. Hopper is the first generation with full confidential computing support on the GPU side; the H200 is essentially an H100 with more and faster memory. Both work. Blackwell is the newest generation and adds multi-GPU confidential compute via TDISP/IDE, which is interesting for large model training but isn’t strictly necessary for the AI dev assistant use case.

For a dev environment fleet, the rough hardware spec per server:

1× AMD EPYC 9355 (32-core) or Intel Xeon 6527P (24-core)
256 to 512 GB DDR5
GPU-pool servers only: 1× H100 or H200 per ~5 concurrent dev environments with AI inference needs
4 to 8× NVMe drives for local scratch and image cache (durable workspace volumes live on the storage cluster)
Dual 25 GbE NICs — one for management, one for the dev network
TPM 2.0 on the motherboard

You do need network block storage — workspace volumes have to survive their VM and reattach wherever the replacement boots, which is what the Ceph (or Longhorn) cluster in the bill is for. What you don’t need is an exotic parallel filesystem: each workspace is maybe 100GB of block storage, and NVMe-backed Ceph handles that fine.
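The ratios above turn into a fleet estimate with a few lines of arithmetic. A sketch using this post’s rough figures (about 20 dev environments per CPU server, one GPU per ~5 environments that need AI compute); the `ai_fraction` and `headroom` parameters are assumptions for your own mix and failover margin, not numbers from any benchmark.

```python
import math


def fleet_size(concurrent_envs: int, envs_per_server: int = 20,
               envs_per_gpu: int = 5, ai_fraction: float = 1.0,
               headroom: float = 0.0) -> tuple[int, int]:
    """Return (cpu_servers, gpus) for a target number of concurrent environments."""
    planned = math.ceil(concurrent_envs * (1 + headroom))
    servers = math.ceil(planned / envs_per_server)
    gpus = math.ceil(planned * ai_fraction / envs_per_gpu)
    return servers, gpus
```

For 100 concurrent environments with no headroom this gives 5 CPU servers and 20 GPUs; with 50% headroom, 8 servers and 30 GPUs.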

Software stack. Kubernetes, with Kata Containers and Confidential Containers layered on top. Each pod runs inside its own confidential VM. The CoCo attestation agent inside the guest talks to Trustee, which is the multi-arch attestation broker. SPIRE issues identities based on the attestation. The Kubernetes API is your interface for managing all of this.

The NVIDIA GPU Operator has supported Kata + Confidential Containers GPU workloads since the technology-preview days (23.x–24.9), and recent releases hardened it — 25.10 reworked Kata deployment onto kata-deploy, and the 26.3-era confidential containers reference architecture made it a first-class supported path. That means you can run confidential containers with GPU access for AI dev workloads. This is the move for ML teams — it’s the difference between “we have GPUs” and “we have GPUs that we can prove didn’t leak your training data to anyone, including us.”

Why Kubernetes + Kata + CoCo. This is the cloud-native stack: its ecosystem is the largest, its tooling the most mature, and its hireable talent pool the biggest. The alternatives (Nomad + Firecracker + custom attestation glue, OpenStack + libvirt + custom glue) are all viable, but you’ll write more glue code and operate more custom infrastructure. For most organizations, the right answer is to ride the CoCo ecosystem.

The bastion. A small VM or bare-metal host that is the developer’s only entry point. It accepts WireGuard connections from developer laptops, gated on the laptop’s Keylime attestation status — WireGuard itself has no attestation hook, so the control plane installs the laptop’s peer config only while Keylime reports the device healthy, and rips it out the moment verification fails — and forwards SSH connections to the appropriate dev environment VM, again verified via SPIRE identity. The bastion has no access to the dev environments’ memory — it only forwards encrypted traffic.
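The “install the peer only while Keylime says healthy” loop is simple enough to sketch. This hypothetical reconciler compares the interface’s current peers with the latest attestation results (keyed here by WireGuard public key, a mapping your control plane would have to maintain) and emits `wg set` commands. Removals come first so the interface fails closed; executing the commands and re-running on every verifier update are left out.

```python
def reconcile(current_peers: set[str], healthy: dict[str, bool],
              peer_ips: dict[str, str], iface: str = "wg0") -> list[list[str]]:
    """Return wg(8) commands that make the peer set match attestation state."""
    allowed = {key for key, ok in healthy.items() if ok}
    cmds = []
    # Fail closed: drop anything not currently attested, including unknown peers.
    for key in sorted(current_peers - allowed):
        cmds.append(["wg", "set", iface, "peer", key, "remove"])
    for key in sorted(allowed - current_peers):
        cmds.append(["wg", "set", iface, "peer", key,
                     "allowed-ips", peer_ips[key]])
    return cmds
```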

You can run the bastion with OpenZiti instead of plain WireGuard if you want identity-aware networking all the way down. OpenZiti does overlay networking with identity baked in at the protocol level. The DX is a little weirder — every application you want to use over Ziti has to be Ziti-aware or wrapped — but the security model is genuinely better than “WireGuard + bastion.”

The AI assistant question (the hardest one)

This is the section I’m most likely to get yelled at for. Here it is anyway.

The non-negotiable rule: no source code from inside the perimeter can ever be sent to an external AI API. No exceptions. No “but the data is non-sensitive.” No “we have an NDA with OpenAI.” The whole point of the architecture is that source code never leaves a controlled environment, and a third-party API call is leaving.

So the AI assistant must be self-hosted. Period.

The model

In 2026, you have real choices. The closed frontier — GPT-5.x, Claude Opus 4.x — still holds an edge on the hardest agentic work, but the gap has narrowed to the point of being contestable: the current open-weight flagships match or beat frontier closed models on some SWE-Bench-class benchmarks. For daily dev tasks (autocomplete, simple refactors, “explain this function”), the open models are simply good.

The realistic contenders right now:

Qwen3-Coder-Next is my default recommendation. Apache 2.0 licensed, an 80B-total / 3B-active MoE that runs comfortably on a single H100 80GB, and it lands around 70% on SWE-Bench Verified — remarkable for something this cheap to serve. It’s not going to win the hardest agentic benchmarks against the closed frontier, but for the autocomplete-and-everyday-tasks usage that fills 80% of AI assistant traffic, it’s a great fit. (If you’d rather run a dense mid-size model, Qwen3.6-27B is the one to look at.)

DeepSeek V4-Pro is what you reach for when you need real agentic capability. MIT-licensed, a Mixture-of-Experts model with 1.6T total / 49B active parameters and a million-token context, and its agentic coding numbers are frontier-class. The downside is footprint: the weights alone need a multi-GPU H200-class node before you can serve a single request. And size serving by aggregate throughput, not per-user GPU math — a vLLM replica batches many concurrent users.
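“Aggregate throughput, not per-user GPU math” in arithmetic form: estimate peak token demand across everyone, then divide by what one replica actually sustains. Every input is an assumption to replace with your own measurements (benchmark a replica on your real prompts first), and the target utilization leaves room for bursts so latency stays sane.

```python
import math


def replicas_needed(peak_active_users: int, tokens_per_s_per_user: float,
                    replica_tokens_per_s: float,
                    target_util: float = 0.7) -> int:
    """Replicas required so peak demand stays under the target utilization."""
    demand = peak_active_users * tokens_per_s_per_user
    return max(1, math.ceil(demand / (replica_tokens_per_s * target_util)))
```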

GLM-5.1 is comparable to DeepSeek V4-Pro for agentic work, with slightly different tradeoffs in license and serving characteristics — and Zhipu iterates fast (GLM-5.2, MIT-licensed with a 1M context, shipped as this post was being written). Either family is a fine choice.

Kimi K2.6 is a strong agentic alternative with a 262K-token context window — not the million tokens sometimes claimed for it. Useful for “explain this entire subsystem” workflows. It’s a trillion-parameter-class MoE and memory-hungry — you’ll want H200s rather than H100s.

Codestral 2 is Mistral’s code-specific offering. Smaller and faster than the others, with strong fill-in-the-middle. Good if you want a dedicated low-latency completion model. Mind the license: Codestral 2 is Apache 2.0, but the earlier 25.x weights shipped under Mistral’s Non-Production License, which does not permit commercial self-hosting. Check the exact checkpoint before you deploy.

The serving infrastructure

For actual inference, you have a few choices:

vLLM is the high-throughput, well-maintained default. Production-grade. Used by most serious self-hosted LLM deployments.

Tabby is purpose-built for code completion. Has its own model server and its own IDE extensions, wired up over LSP (standard completion requests plus Tabby-specific protocol extensions). If you want a turnkey “AI coding assistant that just works,” Tabby is the answer.

Ollama is fine for a single developer running on a workstation. Not appropriate for serving a team.

For a dev environment fleet with 100 to 500 developers, the right architecture is:

A pool of H100 / H200 GPUs, separate from the CPU pool.
vLLM or Tabby running on those GPUs.
An inference gateway that authenticates dev workloads via SPIRE and routes to the appropriate model.
The inference pool itself runs in confidential VMs, with attestation, so the dev workloads can verify that their prompts aren’t being logged by the inference server.

That last point is the part most people skip. The inference server itself should be attested and confidential. Otherwise the developer is sending source code to an internal service that might be logging it, and the audit story becomes “we pinky-promise we’re not logging it.” You want hardware-rooted proof that the inference service is running the model you think it’s running, with the logging configuration you think it has, on hardware you trust.
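At the inference gateway, the two checks from the list above meet: who is calling, and whether the backend has passed attestation. A hypothetical sketch of that routing decision, assuming the caller’s SPIFFE ID comes from an already-verified SVID. The `attested` flag stands in for a status your attestation service would set (never the backend itself), and the SPIFFE path layout and URLs are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class Backend:
    url: str
    attested: bool  # written by the attestation service, not self-reported


def route(peer_id: str, model: str, backends: dict[str, Backend],
          caller_prefix: str = "spiffe://corp.internal/devenv/") -> str:
    """Pick a backend URL, refusing unknown callers and unattested backends."""
    # The trailing slash keeps ".../devenv-evil" from matching ".../devenv".
    if not peer_id.startswith(caller_prefix):
        raise PermissionError(f"caller not allowed: {peer_id}")
    backend = backends.get(model)
    if backend is None:
        raise LookupError(f"no backend serves {model}")
    if not backend.attested:
        raise PermissionError("backend failed attestation; not sending source")
    return backend.url
```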

This is doable. Trustee supports attestation of GPU workloads now. SPIRE can issue workload identities based on the GPU attestation. The integration story is mostly there.

The honest tradeoff

The frontier closed models are still meaningfully better at:

Hard debugging tasks that span many files.
Multi-step refactors.
“Explain this entire codebase” queries.
Anything where you want the AI to actually run code and check its work.

If your developers are working on novel, hard problems where they really need the best AI, the gap will be felt. There is no good answer to this — the security constraints and the model quality are in tension.

Three options:

Accept the gap. Most engineers, once they’ve used Tabby with Qwen3-Coder-Next for a week, stop noticing. The autocomplete is good. The simple refactors are good. For the rare “I really need the frontier model for this” cases, your developers can use a separate, non-sensitive environment where they paste in a redacted snippet and talk to ChatGPT or Claude about that. The point is that the source code never goes anywhere — but redacted discussions about algorithms and approaches are fine.

Fine-tune. Adapt an open model to your own codebase. This works surprisingly well. It’s a project of its own (you’ll need an ML team, a training pipeline, evaluation infrastructure), but it’s the answer if “code completion quality” is genuinely your bottleneck.

Make targeted exceptions. Allow external AI tools for specific repos that don’t contain sensitive code (open-source repos, public SDKs, etc.). Track which repos these are, audit regularly.

Don’t pretend there’s no tradeoff. But also don’t pretend the open models are unusable — they’re very good for autocomplete and common coding tasks, and the gap is closing fast.

Builds that don’t lie

Even with perfect source code protection, your build pipeline can turn safe source into a backdoored binary. This is the SolarWinds / xz-utils attack class, and it’s the reason supply chain security gets its own section.

Hermetic builds

Your build system must be hermetic: the build can see only its declared, pinned inputs, with no network access and no undeclared host tools, so the same inputs produce the same outputs regardless of the host environment. This is what makes builds reproducible, auditable, and resistant to supply chain attacks.
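The practical payoff of hermeticity is that an action’s cache key can be computed from its declared inputs alone. A simplified sketch of that idea, loosely modeled on how remote build caches key actions by digest (this is not Bazel’s actual action-digest format): hash the command, the explicitly declared environment, and the content of every declared input, in a canonical order.

```python
import hashlib
import json


def action_key(command: list[str], env: dict[str, str],
               inputs: dict[str, bytes]) -> str:
    """Content-addressed key: same declared inputs give the same key, anywhere."""
    h = hashlib.sha256()
    # Canonical encoding: argv order matters, env order does not.
    h.update(json.dumps({"cmd": command, "env": sorted(env.items())}).encode())
    for path in sorted(inputs):
        # NUL cannot appear in a path and the digest is fixed-length,
        # so the concatenation is unambiguous.
        h.update(path.encode() + b"\0" + hashlib.sha256(inputs[path]).digest())
    return h.hexdigest()
```

Anything an action reads without declaring it (a host compiler, a network fetch, the clock) is invisible to this key, which is exactly why hermetic build systems sandbox actions so they cannot read undeclared inputs.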

Two real choices:

Bazel is Google’s build system and the de facto standard for hermetic builds at scale. The learning curve is brutal. The result is excellent: if you have a JVM-heavy or polyglot monorepo, Bazel is the answer. Many large security-conscious organizations use Bazel for exactly this reason.

Nix is more general-purpose than Bazel. Steeper initial learning curve but the package manager (nixpkgs) is enormous and the “works anywhere, same result every time” guarantee is stronger than Bazel’s. If you’re doing ML or systems work and Bazel’s hermeticity model fights you, Nix is the alternative.

The Bazel vs Nix debate is roughly the same energy as the vim vs emacs debate. Pick one based on what your team already knows. Don’t try to migrate an existing build system to either unless you have a real reason.

The dependency story, air-gapped edition

In a normal environment, your build fetches dependencies from the internet. In an air-gapped environment, this is impossible.

The right architecture:

Internal package mirror that proxies every external package source you need: PyPI, npm, crates.io, Maven Central, Conda, the works. Sonatype Nexus Repository Community Edition covers all of these formats. JFrog’s free tiers don’t — Artifactory OSS speaks only the Maven family and CE only adds Conan; npm/PyPI/Cargo proxying is a paid Artifactory feature.
The mirror is updated on a schedule by a process that has internet egress: a separate “update” machine at the edge of the perimeter, which downloads new package versions, verifies their checksums, signs them, and publishes them to the internal mirror. The update process itself is air-gapped from the dev environments; it has its own network segment and its own attestation chain.
Builds are pinned to exact versions and content hashes. No floating versions. If you want to upgrade, you bump the pin and let CI verify.
The mirror itself is content-addressed and immutable. You can always reproduce an old build because the old dependencies are still there.
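The mirror’s guarantees can be sketched as a small in-memory model, assuming SHA-256 as the content address. It has two halves:

- `ingest` is the update machine’s half: verify the upstream checksum, then publish write-once.
- `fetch` is the build’s half: resolve an exact pin and re-check the bytes.

The class and method names are hypothetical. Signing is left out to keep the sketch standard-library only; a real update machine would sign before publishing.

```python
import hashlib


class ContentAddressedMirror:
    """In-memory stand-in for the internal package mirror (hypothetical API)."""

    def __init__(self) -> None:
        self._blobs: dict[str, bytes] = {}            # sha256 hex -> bytes
        self._index: dict[tuple[str, str], str] = {}  # (name, version) -> sha256 hex

    def ingest(self, name: str, version: str, blob: bytes, upstream_sha256: str) -> str:
        """Update-machine side: verify the upstream checksum, then publish write-once."""
        digest = hashlib.sha256(blob).hexdigest()
        if digest != upstream_sha256:
            raise ValueError(f"{name}=={version}: checksum mismatch, refusing to publish")
        published = self._index.get((name, version))
        if published is not None and published != digest:
            # Immutability: a published version is never re-pointed at new bytes.
            raise ValueError(f"{name}=={version} is already published with another digest")
        self._blobs[digest] = blob
        self._index[(name, version)] = digest
        return digest

    def fetch(self, name: str, version: str, pinned_sha256: str) -> bytes:
        """Build side: resolve an exact pin and re-check the bytes before use."""
        digest = self._index.get((name, version))
        if digest is None:
            raise KeyError(f"{name}=={version} is not in the mirror")
        if digest != pinned_sha256:
            raise ValueError(f"{name}=={version}: mirror digest does not match the pin")
        blob = self._blobs[digest]
        if hashlib.sha256(blob).hexdigest() != digest:
            raise ValueError(f"stored blob for {name}=={version} is corrupt")
        return blob
```

Because old entries are never overwritten, an old lockfile keeps resolving to the same bytes. That is what makes the “reproduce an old build” promise hold.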

For Bazel specifically: you want to run an internal remote build cache (also signed and content-addressed) and an internal registry for any Bazel modules you depend on. The tooling for this exists; the configuration is tedious.
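A toy model of why the remote build cache is content-addressed. It loosely follows the Bazel Remote Execution API’s split between a content-addressable store (CAS) and an action cache (AC).

- CAS entries are self-verifying, because a blob’s key is its own hash.
- The AC mapping from action key to output digest is not self-verifying. Anyone who can write an AC entry can point a legitimate action at a different, validly hashed output, which is why the cache needs to be signed as well as content-addressed.

`action_key` and `RemoteCache` are hypothetical names, not Bazel APIs.

```python
import hashlib
import json
from typing import Optional


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def action_key(argv: list[str], input_digests: dict[str, str], env: dict[str, str]) -> str:
    """Hash of everything that can influence an action's output.
    sort_keys canonicalizes nested dicts, so insertion order never matters."""
    canonical = json.dumps(
        {"argv": argv, "inputs": input_digests, "env": env},
        sort_keys=True,
        separators=(",", ":"),
    )
    return sha256_hex(canonical.encode())


class RemoteCache:
    """Toy remote cache with its two halves kept separate."""

    def __init__(self) -> None:
        self.cas: dict[str, bytes] = {}         # content digest -> blob (self-verifying)
        self.action_cache: dict[str, str] = {}  # action key -> output digest (must be trusted)

    def put(self, key: str, output: bytes) -> str:
        digest = sha256_hex(output)
        self.cas[digest] = output
        self.action_cache[key] = digest
        return digest

    def get(self, key: str) -> Optional[bytes]:
        digest = self.action_cache.get(key)
        if digest is None:
            return None  # cache miss: run the action
        blob = self.cas[digest]
        if sha256_hex(blob) != digest:
            raise ValueError(f"CAS entry {digest} does not match its own hash")
        return blob
```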

CI runners that are also confidential VMs

Your CI runner is also a confidential VM (or a bare-metal host with full attestation). It runs in the same perimeter as your dev environments. It:

Pulls source from your internal git server, authenticated via SPIRE.
Pulls dependencies from your internal mirror.
Runs the hermetic build.
Signs the output with Cosign, using a short-lived signing certificate issued by your internal Fulcio and bound to the CI runner’s OIDC identity.
Records the SLSA provenance attestation to your internal Rekor instance.
Publishes the signed artifact to your internal registry.
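The provenance recorded in Rekor is a small JSON document. This sketch assembles an in-toto Statement carrying a SLSA v1 provenance predicate as plain data. The field names follow the published in-toto Statement v1 and SLSA v1 provenance layouts, but check them against the current specs.

Several values are hypothetical choices for an internal pipeline:

- the `buildType` URI
- the SPIFFE-style builder ID
- the internal hostnames
- the parameter layout

Signing the statement and logging it in Rekor is Cosign’s job and is not shown.

```python
import hashlib
import json


def provenance_statement(
    artifact_name: str,
    artifact: bytes,
    repo_uri: str,
    commit: str,
    builder_id: str,
    build_type: str,
    invocation_id: str,
    dependencies: dict[str, str],
) -> dict:
    """in-toto Statement v1 wrapping a SLSA v1 provenance predicate.

    dependencies maps a dependency URI (e.g. a purl) to its sha256 hex digest,
    as resolved from the internal mirror.
    """
    return {
        "_type": "https://in-toto.io/Statement/v1",
        # The subject binds the provenance to these exact artifact bytes.
        "subject": [
            {"name": artifact_name, "digest": {"sha256": hashlib.sha256(artifact).hexdigest()}}
        ],
        "predicateType": "https://slsa.dev/provenance/v1",
        "predicate": {
            "buildDefinition": {
                "buildType": build_type,
                "externalParameters": {"repository": repo_uri, "revision": commit},
                "resolvedDependencies": [
                    {"uri": repo_uri, "digest": {"gitCommit": commit}},
                    *(
                        {"uri": uri, "digest": {"sha256": sha256}}
                        for uri, sha256 in sorted(dependencies.items())
                    ),
                ],
            },
            "runDetails": {
                # Who built it: here, the CI runner's (hypothetical) SPIFFE ID.
                "builder": {"id": builder_id},
                "metadata": {"invocationId": invocation_id},
            },
        },
    }


statement = provenance_statement(
    artifact_name="payments-service.tar",
    artifact=b"example artifact bytes",
    repo_uri="git+https://git.example.internal/payments/service",
    commit="0123456789abcdef0123456789abcdef01234567",
    builder_id="spiffe://example.internal/ci/runner",
    build_type="https://example.internal/build-types/bazel/v1",
    invocation_id="ci-job-4711",
    dependencies={"pkg:pypi/requests@2.32.3": hashlib.sha256(b"requests wheel").hexdigest()},
)
print(json.dumps(statement, indent=2))
```

Listing every mirror-resolved dependency by digest is what lets an auditor later answer “which exact bytes went into this build” from Rekor alone.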

The CI runner has no internet egress. The same one-way data flow that applies to dev environments applies to CI. If a CI job needs to talk to an external service, that’s a code review failure, not a network policy failure — the policy makes it impossible by default.

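The CI runner is also where you run your SBOM generation (Syft), vulnerability scanning (Trivy or Grype), and license compliance scanning. All of these tools are OSS and work fine air-gapped, with one caveat: Trivy and Grype need their vulnerability databases, so bring those in through the same update machine that feeds your package mirror.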
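A sketch of the license-compliance gate, assuming Syft writes CycloneDX JSON (one of its supported output formats) into the CI workspace. CycloneDX records a component’s licenses in one of two forms:

- `{"license": {"id": ...}}` or `{"license": {"name": ...}}`
- an SPDX `{"expression": ...}`

This gate fails closed. It flags components with no declared license and treats SPDX expressions as opaque, so a human reviews them. The allow-list and function names are hypothetical placeholders for your own policy.

```python
import json

# Hypothetical policy; your legal team owns the real list.
ALLOWED_LICENSES = frozenset({"Apache-2.0", "BSD-2-Clause", "BSD-3-Clause", "ISC", "MIT"})


def declared_licenses(component: dict) -> list[str]:
    """Pull license identifiers out of one CycloneDX component."""
    found = []
    for entry in component.get("licenses", []):
        if "expression" in entry:
            found.append(entry["expression"])  # opaque SPDX expression: fail closed
        else:
            lic = entry.get("license", {})
            found.append(lic.get("id") or lic.get("name") or "UNKNOWN")
    return found


def license_violations(sbom_json: str, allowed: frozenset = ALLOWED_LICENSES) -> list[str]:
    """Return one message per component whose licenses are not all allowed."""
    sbom = json.loads(sbom_json)
    problems = []
    for comp in sbom.get("components", []):
        label = f"{comp.get('name', '?')}@{comp.get('version', '?')}"
        licenses = declared_licenses(comp)
        if not licenses:
            problems.append(f"{label}: no license declared")
            continue
        bad = [lic for lic in licenses if lic not in allowed]
        if bad:
            problems.append(f"{label}: disallowed {', '.join(bad)}")
    return problems
```

A non-empty result fails the CI job. A real gate would parse SPDX expressions so that, for example, `MIT OR Apache-2.0` passes automatically.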

The transparency log

Sigstore’s Rekor is the transparency log. Every signing event is recorded in Rekor immutably, and anyone can verify that an artifact was signed at a particular time with a particular identity.

In an air-gapped environment, you run your own Rekor instance. It doesn’t need to talk to the public Sigstore Rekor (and shouldn’t). Your internal Rekor is the audit trail for all your build artifacts. If a developer ever needs to prove “yes, this artifact was built from this exact commit at this exact time by this exact runner,” the answer is in Rekor.
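What “anyone can verify” means mechanically: Rekor’s log (built on Trillian) is an RFC 6962-style Merkle tree. Proving an entry is in the log comes down to an inclusion proof. Given a leaf, its index, the tree size, a list of sibling hashes, and the root hash from a signed tree head, anyone can recompute the root. This sketch implements RFC 6962 hashing and the RFC 9162 inclusion-verification algorithm using only the standard library. It is not Rekor’s client API; in practice Cosign and the Sigstore client libraries do this for you. `merkle_root` and `inclusion_path` exist only to generate test data.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    # RFC 6962 domain separation: 0x00 prefix for leaves, 0x01 for interior nodes.
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    return hashlib.sha256(b"\x01" + left + right).digest()


def _split(n: int) -> int:
    """Largest power of two strictly less than n (n > 1)."""
    k = 1
    while k * 2 < n:
        k *= 2
    return k


def merkle_root(entries: list[bytes]) -> bytes:
    if not entries:
        return hashlib.sha256(b"").digest()
    if len(entries) == 1:
        return leaf_hash(entries[0])
    k = _split(len(entries))
    return node_hash(merkle_root(entries[:k]), merkle_root(entries[k:]))


def inclusion_path(index: int, entries: list[bytes]) -> list[bytes]:
    """Sibling hashes from leaf to root, i.e. what the log server hands you."""
    if len(entries) <= 1:
        return []
    k = _split(len(entries))
    if index < k:
        return inclusion_path(index, entries[:k]) + [merkle_root(entries[k:])]
    return inclusion_path(index - k, entries[k:]) + [merkle_root(entries[:k])]


def verify_inclusion(leaf: bytes, index: int, tree_size: int,
                     path: list[bytes], root: bytes) -> bool:
    """RFC 9162 inclusion-proof verification: recompute the root from leaf + path."""
    if index >= tree_size:
        return False
    fn, sn, r = index, tree_size - 1, leaf
    for p in path:
        if sn == 0:
            return False  # proof is longer than the tree is tall
        if fn & 1 or fn == sn:
            r = node_hash(p, r)  # sibling sits to the left
            while not fn & 1 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)  # sibling sits to the right
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root
```

The verifier never needs the rest of the log, just O(log n) hashes. That is why the audit trail stays cheap to check even when your internal Rekor holds years of signing events.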

This is one of the genuinely beautiful things about Sigstore: the public transparency log is a feature, but the architecture doesn’t depend on it, and you can run your own.

Collaboration inside the perimeter

In an air-gapped dev environment, your developers can’t use Slack, can’t use Google Docs, can’t use GitHub.com, can’t use anything that requires internet egress. So you need internal equivalents.

Chat

Element (Matrix) is self-hostable, federated, and E2EE-capable. It’s the closest thing to Slack in OSS land. Mature.

Mattermost is the most “Slack-like” of the options. Good if your team already has Slack muscle memory.

Rocket.Chat is somewhere in between.

Pick whichever your team will tolerate. They all work. None of them is as polished as Slack. Deal with it.

Code review

GitLab self-hosted is the most feature-complete option. Built-in CI, built-in container registry, built-in package registry, built-in everything. If you’re starting fresh, this is what I’d pick.

Gitea is lighter weight than GitLab. Good for small teams.

Gerrit is Google’s code review tool. Very powerful, very steep learning curve, used by Android and Chromium. Probably overkill unless you’re at Google scale.

Pair programming

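For most teams, tmate (a fork of tmux that creates a shareable SSH session) plus voice plus screen sharing is enough. Out of the box, tmate relays sessions through the public tmate.io servers, so point it at a self-hosted tmate-ssh-server inside the perimeter. Pair programming is rare enough that you don’t need fancy tooling for it.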

VS Code Live Share is off the table: it relays through Microsoft’s Azure infrastructure, has no self-hosted option, and the protocol is closed. If tmate plus voice ever stops being enough, there are self-hostable third-party collaborative-editing projects.

Documents

CryptPad is a self-hosted, E2EE collaborative editor. Good for documents.

OnlyOffice or Collabora for full Office document compatibility. Heavier to operate.

Wiki.js for internal documentation. Self-hosted, good UX.

None of these is “as good as Google Docs.” All of them are good enough.

Day-one onboarding

The test of any secure dev environment is: can a new developer be productive on day one? If the answer is “two weeks of provisioning and a half-day of training,” you’ve failed.

Here’s what day-one productivity looks like with this architecture:

The week before the developer starts, HR plus IT plus Security pre-provision a laptop. The laptop arrives with:

LUKS-encrypted disk, key sealed to TPM.
Measured boot enabled.
Keylime agent pre-configured with the new developer’s SPIFFE ID.
SSH client plus WireGuard client plus YubiKey pre-paired.
VS Code or JetBrains installed, configured to talk to the bastion.
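The “measured boot” line is doing a lot of work, and what Keylime checks comes down to the TPM’s PCR extend operation. A PCR can’t be set, only extended: new = SHA-256(old || measurement digest). Its final value therefore commits to every measured component and to their order. A verifier does two things:

1. Replays the event log and checks that the replay matches the TPM-signed quote.
2. Checks each measurement against known-good values.

This sketch models that with a simplified event log of raw component bytes. Real TCG event logs carry digests and event metadata, and Keylime’s policy engine does far more. The function names and golden set are hypothetical, and quote-signature verification is omitted.

```python
import hashlib

PCR_BYTES = 32  # SHA-256 PCR bank


def extend(pcr: bytes, measurement_digest: bytes) -> bytes:
    """PCR extend for a SHA-256 bank: new = SHA-256(old || digest)."""
    return hashlib.sha256(pcr + measurement_digest).digest()


def replay(event_log: list[bytes]) -> bytes:
    """Recompute the PCR value implied by an ordered log of measured components."""
    pcr = bytes(PCR_BYTES)  # most PCRs reset to all zeros at boot
    for component in event_log:
        pcr = extend(pcr, hashlib.sha256(component).digest())
    return pcr


def attest(quoted_pcr: bytes, event_log: list[bytes], golden: set[str]) -> bool:
    """Verifier side (quote signature check omitted):
    1. the log must replay to exactly the value the TPM quoted, and
    2. every measured component must be on the known-good list."""
    if replay(event_log) != quoted_pcr:
        return False  # the log doesn't explain the quote: tampered or truncated
    return all(hashlib.sha256(c).hexdigest() in golden for c in event_log)
```

Because extend is one-way and order-sensitive, malware that loads early can’t later “un-measure” itself. The best it can do is lie in the event log, and then the replay no longer matches the quote.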

Day one, 9 AM: the developer plugs in their YubiKey, unlocks the laptop with the PIN, opens VS Code.

Day one, 9:01 AM: VS Code shows the workspace catalog. The developer picks their team.

Day one, 9:02 AM: a workspace boots. SPIRE issues an identity. The secret broker hands over credentials.

Day one, 9:05 AM: the developer has a working dev environment with their team’s standard tooling, authenticated to the source repo, with access to the test database.

Day one, 9:30 AM: the developer has shipped their first commit.

This is achievable. Coder and Ona both support this kind of flow natively. The trick is to pre-build the standard workspace templates so the developer doesn’t have to assemble their environment themselves.

When things break at 3 AM

Everything fails sometimes. The datacenter loses power. SPIRE’s CA cert rotates and the agent caches go stale. The attestation verifier gets a config push that bricks it. Someone needs root access to a dev environment at 3 AM because production is down and they need to ship a hotfix.

You need a documented break-glass procedure. The non-negotiable properties of that procedure:

Is fast enough that engineers will use it instead of routing around the security. If the break-glass takes four hours of paperwork, people will find a way to not use it. They’ll SSH from their personal machine, or they’ll sync the offending repo to their laptop and fix it there. That’s worse than no break-glass at all.

Requires dual control. No single person can break the glass alone. Two people with two YubiKeys, on a recorded video call, jointly approve. The “two-person rule” isn’t bureaucratic theater — it’s the only thing that makes break-glass survivable.

Is logged and audited. Every break-glass event triggers a high-priority alert to the security team. Every action taken under the break-glass identity is logged separately with cryptographic proof of who did what when.

Is time-limited. The break-glass credentials expire in one hour. They can’t be used indefinitely. The clock starts when the first operator authenticates and ends hard.

Is tested quarterly. If you don’t test it, it doesn’t work when you need it. Put a calendar reminder on the security team’s shared calendar. Actually run the procedure on a staging environment every quarter.

Concrete implementation: a separate, highly secured “break-glass” SPIRE identity that’s only issued when two specific operators (with hardware keys and biometric auth) jointly approve. SPIRE doesn’t do two-person approval itself, so the gate is a small approval service that creates the registration entry only once both approvals are in. The break-glass identity has access to “operator mode” inside the dev environment, which can dump logs, attach a debugger, or in extreme cases decrypt and export the workspace volume for forensic analysis.

Yes, this means a determined attacker who compromises two operators simultaneously can break the glass. That’s the point of dual control — you’re betting that compromising two operators at once is much harder than compromising one. So far, the bet has held up.
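A minimal sketch of the break-glass control logic, assuming an approval service sits in front of whatever actually mints the break-glass identity. It enforces the properties above:

- Any two distinct operators from a designated set must approve.
- The one-hour clock starts at the first authentication and is never extended.
- Every step, including denied attempts, lands in a hash-chained audit log, so after-the-fact edits are detectable.

The SPIFFE ID, class names, and injectable clock are hypothetical. Hardware-key and biometric checks, the recorded call, and alerting are out of scope.

```python
import hashlib
import json
import time
from dataclasses import dataclass, field

TTL_SECONDS = 3600  # credentials expire one hour after the first operator authenticates
GENESIS = "0" * 64


@dataclass
class AuditLog:
    """Append-only, hash-chained log: each entry commits to everything before it."""

    entries: list = field(default_factory=list)

    @staticmethod
    def _chain(prev: str, event: dict) -> str:
        body = json.dumps(event, sort_keys=True)
        return hashlib.sha256((prev + body).encode()).hexdigest()

    def append(self, event: dict) -> None:
        prev = self.entries[-1]["hash"] if self.entries else GENESIS
        self.entries.append({"event": event, "prev": prev, "hash": self._chain(prev, event)})

    def verify(self) -> bool:
        prev = GENESIS
        for entry in self.entries:
            if entry["prev"] != prev or entry["hash"] != self._chain(prev, entry["event"]):
                return False
            prev = entry["hash"]
        return True


class BreakGlass:
    """Dual-control, time-boxed issuance of a (hypothetical) break-glass identity."""

    IDENTITY = "spiffe://example.internal/ops/break-glass"  # hypothetical SPIFFE ID

    def __init__(self, operators, log: AuditLog, clock=time.time):
        self.operators = frozenset(operators)
        self.log = log
        self.clock = clock
        self.approvers: set = set()
        self.started_at = None

    def approve(self, operator: str) -> None:
        now = self.clock()
        if operator not in self.operators:
            self.log.append({"type": "denied", "operator": operator, "at": now})
            raise PermissionError(f"{operator} is not a designated break-glass operator")
        if self.started_at is None:
            self.started_at = now  # the clock starts at the first authentication
        self.approvers.add(operator)
        self.log.append({"type": "approve", "operator": operator, "at": now})

    def issue(self) -> dict:
        if len(self.approvers) < 2:
            raise PermissionError("dual control: two distinct operators must approve")
        expires_at = self.started_at + TTL_SECONDS
        if self.clock() >= expires_at:
            raise PermissionError("break-glass window has closed")
        credential = {
            "identity": self.IDENTITY,
            "approvers": sorted(self.approvers),
            "expires_at": expires_at,
        }
        self.log.append({"type": "issue", **credential})
        return credential

    def is_valid(self, credential: dict) -> bool:
        return self.clock() < credential["expires_at"]
```

The same operator approving twice still counts as one approver, because approvals are a set. The hash chain doesn’t stop a log from being deleted wholesale, so in practice you would also ship it off the box as it is written.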

The honest costs

Let me be honest about what this architecture doesn’t give you.

Latency. Even with good engineering, you’re looking at 20 to 80ms of added latency for IDE interactions compared to local development. This is a real, tangible cost. It’s not enough to make people quit, but it’s enough that they’ll occasionally complain. The mitigations are geographic proximity, hardware-accelerated remote display, and allowing local non-sensitive dev environments for prototyping.

Self-hosted LLM quality. As I noted above, the self-hosted models are good enough but not as good as the frontier closed models. If your developers are building genuinely novel ML systems, your researchers will feel the gap. There is no good answer to this — the security constraints and the model quality are in tension. Pick one of the three options I outlined above and live with it.

Build performance. Hermetic builds are slower than non-hermetic builds in the common case, because you can’t take shortcuts via the host’s pre-installed packages. Remote build caches help a lot. But you’re not going to beat a well-tuned non-hermetic CI pipeline in raw speed. The trade is worth it: hermetic builds mean you can actually trust your supply chain.

Cultural cost. The biggest cost isn’t technical. It’s cultural. Your developers will chafe at not being able to use ChatGPT.com directly. At not being able to install random tools without going through the workspace template process. At having to wear a YubiKey on their keychain. At having to enter a PIN to unlock their laptop every time they step away.

This is real and you should not pretend it isn’t. The way you get through it is by making the secure path the path of least resistance — the workspace just works, the AI assistant just works, the IDE just works — and by being transparent about why the constraints exist. Cite real incidents in your industry. Show them the actual attack patterns. Most engineers, once they understand the threat model, will accept the constraints. Some won’t, and you should let those engineers go work elsewhere rather than weaken the architecture for them.

The talent problem. Operating this kind of architecture requires people who understand confidential computing (SEV-SNP / TDX / CCA attestation flows), Kubernetes operations, eBPF and Linux kernel internals, SPIFFE/SPIRE, build systems, and Sigstore and SLSA. This is a small pool of people. If you’re a small organization, you won’t be able to hire all of these full-time. The pragmatic move is to hire one or two people who understand three or four of these deeply and partner with a vendor — Red Hat, SUSE, Canonical, Microsoft, or one of the specialist consultancies — for the rest.

The bill

Here’s what you actually need to buy and run.

Hardware (capex)

For 100 concurrent developers: roughly $1M in capex. For 500: roughly $3M. These are ballparks and depend heavily on what you negotiate — and the GPU line dominates, so your real number tracks how many developers actually hit the AI assistant concurrently, not headcount.

The breakdown:

CPU pool: AMD EPYC 9355 servers at ~$15k each, 1 server per ~20 concurrent dev environments. So 5 servers for 100 devs, 25 for 500 devs.
GPU pool: H100 or H200 servers at ~$300k per 8-GPU node, 1 GPU per ~5 concurrent AI inference users (the same ratio as the hardware spec above). If half of 100 developers are hitting the assistant at once, that’s ~10 GPUs — two 8-GPU nodes with headroom, ~$600k. Size against measured concurrency, not headcount.
Storage: Ceph or Longhorn cluster on commodity hardware. ~$30k for a cluster that handles 500 concurrent workspaces.
Networking: 25 GbE switches. ~$20k for a leaf pair.
Bastion plus attestation verifier: Two small servers in HA. ~$5k.
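The breakdown above can be turned into a back-of-the-envelope calculator. This sketch simply encodes the list’s unit prices and ratios, which are ballparks, not quotes. AI concurrency is an explicit input, since that is the number the GPU line tracks. Storage, networking, and the bastion pair are held at the list’s flat figures, and the function name is hypothetical.

```python
import math
from dataclasses import dataclass

# Unit prices and ratios taken from the breakdown above (ballparks, not quotes).
CPU_SERVER_USD = 15_000
DEVS_PER_CPU_SERVER = 20
GPU_NODE_USD = 300_000
GPUS_PER_NODE = 8
AI_USERS_PER_GPU = 5
FLAT_USD = 30_000 + 20_000 + 5_000  # storage + leaf switch pair + bastion/verifier


@dataclass(frozen=True)
class CapexEstimate:
    cpu_servers: int
    gpus: int
    gpu_nodes: int
    total_usd: int


def estimate_capex(concurrent_devs: int, ai_concurrency: float) -> CapexEstimate:
    """ai_concurrency is the fraction of developers hitting the assistant at once."""
    cpu_servers = math.ceil(concurrent_devs / DEVS_PER_CPU_SERVER)
    ai_users = math.ceil(concurrent_devs * ai_concurrency)
    gpus = math.ceil(ai_users / AI_USERS_PER_GPU)
    gpu_nodes = math.ceil(gpus / GPUS_PER_NODE)  # you buy whole 8-GPU nodes
    total = cpu_servers * CPU_SERVER_USD + gpu_nodes * GPU_NODE_USD + FLAT_USD
    return CapexEstimate(cpu_servers, gpus, gpu_nodes, total)


print(estimate_capex(100, 0.5))
# CapexEstimate(cpu_servers=5, gpus=10, gpu_nodes=2, total_usd=730000)
print(estimate_capex(500, 0.5))
# CapexEstimate(cpu_servers=25, gpus=50, gpu_nodes=7, total_usd=2530000)
```

At 50% AI concurrency, the line items come to $730k for 100 developers and about $2.5M for 500. Both are below the headline $1M and $3M figures, which presumably leave headroom for spares and growth. Rerunning with your measured concurrency shows how quickly the GPU line dominates.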

Software (mostly free)

The total software cost for the OSS stack is essentially zero. You pay for the people who run it, not the licenses.

The full bill of materials: Kubernetes (any distro), Kata Containers, Confidential Containers, Trustee, SPIRE, OpenBao, Sigstore (Cosign, Rekor, Fulcio), Tetragon (eBPF runtime enforcement on hosts and workloads), Keylime, Coder or Ona, GitLab self-hosted, Bazel or Nix, Syft, Trivy or Grype, Ceph or Longhorn, vLLM or Tabby, Element or Mattermost.

That’s the whole stack. Every component is OSS and production-ready.

People (opex, the real cost)

1× platform engineer who knows confidential computing deeply
1× Kubernetes operator
1× security engineer who can own the attestation, SPIRE, Keylime, and DLP stack
0.5× SRE for oncall
1× ML platform engineer if you’re running serious AI workloads

That’s 4 to 5 people for a 100-dev deployment. Roughly $1.2M to $1.5M per year fully loaded — these are specialist skill sets, priced accordingly. Scale linearly up to maybe 300 developers; beyond that, you need more ops people and the platform folks scale sub-linearly.

Make the decision

Here’s what I want you to take away from all of this.

This architecture is buildable today. With OSS tools. By a competent team. In roughly 12 months. It is not science fiction. It is not a vendor pitch. It is not even particularly hard, once you accept that the constraint set is fixed and you need to engineer within it.

The reason most organizations don’t ship it isn’t that it’s impossible. It’s that no executive wants to fund it (the benefit is hard to attribute because the leaks you prevented are invisible), no developer wants to use it (the cost is visible every day), and no vendor will sell it to you cleanly (the big security vendors make their money on the threat of source code exfiltration, not on the solution that prevents it).

If you’re the kind of person who reads this far into an article, you’re probably the kind of person who can change that. The architecture exists. The tooling exists. The threat model is well-understood. What’s missing is usually just the decision.

Make the decision.

If you want to go deeper on the actual attestation mechanics — how SEV-SNP on AMD EPYC Turin actually proves a VM’s identity to a remote verifier, the key hierarchy, the RMP table, the Migration Agent policy bits — I wrote that up separately. The full deep-dive is here. It covers all five generations of AMD’s memory encryption and what you actually give up when you turn it on.