Armada vs SLURM: Two Schedulers, Two Theories of Who Owns the Cluster // Hunter Wigelsworth

There’s a question hiding inside every batch scheduler, and almost nobody asks it out loud: who owns the cluster?

If the answer is “we do, top to bottom, and nobody else gets to touch it,” you’re in SLURM country. If the answer is “the platform team owns the clusters, and we’re just tenants who need to run jobs somewhere,” you’re in Armada country. Both schedulers will cheerfully run your workload. Both will work fine until they don’t. The thing that determines which one is right for you has almost nothing to do with features and almost everything to do with that question.

I spent the last month reading source code, running test deployments, and picking apart how each one handles the workloads I actually care about — large-scale HPC, ML training, ETL pipelines, and the increasingly common hybrid where one team runs research compute and another runs inference serving. Here’s what I found.

The Two Bets

SLURM made its bet in 2002. The Lawrence Livermore National Lab needed to schedule jobs on a cluster of Linux servers it owned outright. The scheduler had direct, privileged access to every node. If you wanted to run a job, you talked to slurmctld — the central controller — and slurmctld told slurmd (the per-node daemon) what to do. There was no cloud, no containers-as-a-first-class concept, no concept of “I don’t own this hardware, I’m renting it from someone.” The cluster was a fixed pool of metal, and SLURM was the only thing that decided who got to use it when.

Armada made its bet in 2020. G-Research’s Jamie Poole wrote about it on the company blog — they had a fleet of on-prem servers running HTCondor on Windows, and they wanted to migrate everything to Linux containers on Kubernetes. They had the same workload pattern (millions of small-to-medium batch jobs per day) but a completely different operational assumption: the cluster wasn’t a thing they owned, it was a thing the Kubernetes control plane allocated to pods, and the scheduler had to live outside the cluster because talking to one cluster at a time was the wrong model. So they wrote a meta-scheduler that sits above Kubernetes, holds the global queue, and lets you run jobs across dozens of K8s clusters like they’re one big pool.

Both bets are defensible. Both projects are real, production-grade, and solve genuinely hard problems. But they encode different theories of what a cluster is, and if you pick the wrong one for your environment, you will spend the next three years writing glue code and quietly resenting everyone involved.

Why “Just Use Kubernetes” Doesn’t Work for Batch

Before I get into the schedulers themselves, I need to address the thing every Kubernetes fanboy says in the first five minutes of any batch conversation: “Why don’t you just use Kubernetes Jobs?”

I love Kubernetes. I’ve run production clusters on it. But Kubernetes Jobs are broken at scale in ways that should embarrass anyone who built them.

Here’s the deal. Kubernetes Jobs are a special kind of controller that creates one or more Pods and tracks their completion. Sounds fine. In practice, the architecture punishes you the moment you try to use it for what it’s named after.

The etcd bottleneck. Every Job object is stored in etcd. Every Pod created by a Job is stored in etcd. If you’re running a job that fans out into 10,000 Pods (perfectly normal for a Monte Carlo sweep or a parameter scan), you’re writing 10,000 objects into etcd. Etcd is designed for consistency, not throughput. Kubernetes issue #95492 documents exactly what happens: someone tried to run 50,000 Jobs, the controller choked around 3,000, and the rest just… didn’t schedule. The watcher backlog grew until the API server started timing out requests.

G-Research’s blog post calls this out directly. The original Kubernetes scheduler is great for long-running services. It is not designed to hold millions of queued jobs in memory and dispatch them fairly across users. It’s also not designed to do gang scheduling, which is the bit where you say “this job needs 64 GPUs that all have to come up at the same time, or the whole thing dies.” And the default kube-scheduler has no concept of fair-share across users, which means the loudest team in your org gets all the GPUs and the quiet team doing careful research work gets nothing.
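To make the gang-scheduling requirement concrete, here is a minimal Python sketch of the all-or-nothing placement check. The node names, the free-GPU map, and the function name are all hypothetical. This is not how kube-scheduler or Armada implements it, just the shape of the constraint.

```python
from typing import Optional


def place_gang(free_gpus: dict[str, int], gpus_needed: int) -> Optional[dict[str, int]]:
    """Return a node -> GPU-count placement covering the whole gang, or None.

    All-or-nothing: if the cluster cannot supply every GPU right now,
    nothing is placed and the job stays queued.
    """
    placement: dict[str, int] = {}
    remaining = gpus_needed
    # Fill the emptiest nodes first so the gang spans as few nodes as possible.
    for node, free in sorted(free_gpus.items(), key=lambda kv: -kv[1]):
        if remaining == 0:
            break
        take = min(free, remaining)
        if take > 0:
            placement[node] = take
            remaining -= take
    return placement if remaining == 0 else None


# Hypothetical cluster: 8 nodes with 8 free GPUs each, 1 node with only 4.
cluster = {f"node{i}": 8 for i in range(8)} | {"node8": 4}
print(place_gang(cluster, 64))   # fits on eight full nodes
print(place_gang(cluster, 72))   # None: only 68 GPUs are free, so the gang waits
```

The important line is the last one in the function: a partial placement is thrown away rather than started, which is exactly what a per-pod scheduler does not do on its own.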

The single-cluster limit. Kubernetes documents a supported maximum of 5,000 nodes per cluster. In practice, you start hitting weird behavior around 1,000-2,000 nodes, and even serious shops don’t push a single cluster past that 5,000-node ceiling. If you want to run a fleet of “thousands of servers” as G-Research does, you cannot count on putting them all in one Kubernetes cluster. You need to operate multiple clusters. And once you have multiple clusters, you have a meta-scheduling problem (which cluster gets which job?) that Kubernetes itself doesn’t solve.

This is the hole Armada was built to fill. It doesn’t try to fix Kubernetes Jobs. It sits above them, holds the global queue in its own storage layer (Pulsar + Postgres + Redis), and dispatches work to whichever Kubernetes cluster has capacity. The clusters themselves can run vanilla Kubernetes Jobs underneath, but the queue, the priority logic, the fair-share, and the multi-cluster routing all live outside.
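A toy version of that split, with the queue living outside any cluster and per-cluster executors pulling work when they have room, might look like the following. Every class and field name here is invented for illustration, and the in-memory deque stands in for the real storage layer.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    job_id: str
    cpus: int


@dataclass
class GlobalQueue:
    """Hypothetical out-of-cluster queue: jobs wait here, not in etcd."""
    pending: deque = field(default_factory=deque)

    def submit(self, job: Job) -> None:
        self.pending.append(job)

    def lease(self, free_cpus: int) -> list[Job]:
        """Hand out queued jobs, in order, while they fit the caller's free capacity."""
        leased = []
        while self.pending and self.pending[0].cpus <= free_cpus:
            job = self.pending.popleft()
            free_cpus -= job.cpus
            leased.append(job)
        return leased


@dataclass
class Executor:
    """One per cluster; it pulls work instead of having work pushed at it."""
    cluster: str
    free_cpus: int

    def poll(self, queue: GlobalQueue) -> list[str]:
        jobs = queue.lease(self.free_cpus)
        self.free_cpus -= sum(j.cpus for j in jobs)
        return [f"{j.job_id}@{self.cluster}" for j in jobs]


queue = GlobalQueue()
for i in range(5):
    queue.submit(Job(f"job-{i}", cpus=4))

placed = Executor("cluster-a", free_cpus=8).poll(queue)
placed += Executor("cluster-b", free_cpus=12).poll(queue)
print(placed)               # five jobs spread over two clusters
print(len(queue.pending))   # 0
```

Nothing touches a cluster until an executor asks for work, so the number of queued jobs never becomes a Kubernetes API problem.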

What SLURM Actually Is

Let me walk you through SLURM the way I wish someone had walked me through it the first time I deployed it, because the architecture is simple once you stop thinking about it like Kubernetes.

There are four daemons. That’s it. Everything else is plugins and config files.

slurmctld — The central controller. There is exactly one active instance at any time, plus optionally a hot-standby backup. The controller knows about every node, every partition, every queued job, and every running job in the cluster. It does all the scheduling decisions. It’s a single C binary that’s been hardened over twenty years of production use at places like Lawrence Livermore, ORNL, and Argonne.
slurmd — Runs on every compute node. It’s basically sshd for batch jobs: it waits for work, executes it, reports status back to the controller. Lightweight, but it normally runs as root so it can launch jobs as the submitting user, and it doesn’t do any scheduling itself.
slurmdbd — Optional, but you want it. A separate daemon that writes accounting data to MySQL or MariaDB. This is where fair-share gets computed. This is where usage limits per user, per account, per QOS get enforced. Without slurmdbd, you have no accounting and no real fair-share.
slurmrestd — Optional REST API. Newer than the rest of the stack (added in 20.02), and increasingly important because it’s the only sane way to integrate SLURM with anything that isn’t a Unix terminal.

User-facing tools are sbatch (submit a script), srun (launch a job step interactively), squeue (list queued jobs), sinfo (show cluster state), scancel (kill a job), and sacct (show accounting for past jobs). The syntax is famously terse. You will type #SBATCH --gres=gpu:h100:8 --cpus-per-gpu=16 --mem-per-gpu=200G --constraint=nvlink more times than you want to remember.
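If you generate submission scripts programmatically, that header ends up looking something like the output of this small Python helper. The helper, the job name, and train.py are made up for illustration. The #SBATCH options are the ones quoted above, and whether a constraint called nvlink exists depends on the node features your site defines.

```python
def render_sbatch(job_name: str, command: str, options: dict[str, str]) -> str:
    """Render a batch script: shebang, one #SBATCH line per option, then the command."""
    lines = ["#!/bin/bash", f"#SBATCH --job-name={job_name}"]
    lines += [f"#SBATCH --{flag}={value}" for flag, value in options.items()]
    lines += ["", command]
    return "\n".join(lines) + "\n"


script = render_sbatch(
    job_name="train-demo",              # hypothetical job name
    command="srun python train.py",     # hypothetical workload
    options={
        "gres": "gpu:h100:8",
        "cpus-per-gpu": "16",
        "mem-per-gpu": "200G",
        "constraint": "nvlink",         # site-defined node feature, assumed to exist
    },
)
print(script)
```

You would write the result to a file and hand it to sbatch.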

The real power of SLURM is the plugin architecture. The codebase ships with around 100 optional plugins. Want topology-aware scheduling that places jobs based on the fat-tree network? There’s a plugin. Want gang scheduling across an InfiniBand fabric? There’s a plugin. Want burst buffers that pre-stage data before a job starts? There’s a plugin. Want real-time accounting that tells you which task in your MPI job is using 40% of the CPU? There’s a plugin.

For GPUs specifically, SLURM has what they call GRES — Generic Resources. The gres.conf documentation is exhaustive in a way that will make you feel either relieved or horrified depending on your tolerance for configuration files. You can declare GPUs of specific types (Gres=gpu:h100:8), request MPS instances for GPU sharing, request MIG slices (the nvidia.com/mig-profile equivalent lives in --gpu-bind and --gpus-per-task options), and pin jobs to specific NUMA nodes. There’s no other open source scheduler with that level of GPU topology control.

Container support is also first-class, which surprises people. SLURM 25.11 has native OCI container support: you can run srun --container=... pointing at an OCI bundle, and it will launch the job inside the container through whichever runtime you configured. The supported runtimes are runc, crun, Docker, Podman, Apptainer, Sarus, and, most importantly for the AI crowd, NVIDIA’s enroot + Pyxis. That last combination is what every serious GPU cluster running SLURM actually uses in production, because Pyxis hooks directly into the job launch (its --container-image flag pulls and unpacks the image for you) and avoids the cold-start latency that ruins benchmark numbers.

Scale numbers, in case you’re wondering whether SLURM can handle your workload: the IBM Sequoia deployment ran 100,000 independent jobs across 100,000 sockets. The scheduler itself can absorb 1,000 job submissions per second and execute 600 jobs per second. These aren’t theoretical: Sequoia took the #1 spot on the TOP500 in June 2012, and SLURM was the scheduler. As of late 2021, SLURM was running about 60% of the TOP500. I haven’t seen a newer number, but there’s no evidence that share has dropped meaningfully. Frontier, the first exascale system on the list, runs it.

The catch, and there’s always a catch, is that SLURM assumes it’s the only thing that matters. The slurmd daemons need direct access to the nodes. The slurmctld needs to be able to reach every one of them on a configured port. You can’t run SLURM across cloud VMs that come and go. You can’t share nodes with Kubernetes workloads unless you’re very careful about cgroup isolation. And critically: SLURM doesn’t have a built-in concept of “this node belongs to a cloud account that’s about to be deleted.”

That’s the gap Armada fills.

What Armada Actually Is

Armada started life at G-Research as an internal tool. The team’s blog post (December 2020) lays out the architectural principles they committed to upfront, and they’re worth quoting because every modern batch-on-Kubernetes problem is downstream of these:

Write some software to add queuing and fair share, without needing to alter Kubernetes itself.
Leave Kubernetes to do the hard work of node-scheduling and container lifecycle management.
Support multiple clusters, such that we can scale past the limit of a single Kubernetes cluster and also gain the operational advantages of multiple clusters. Our aim is to run a fleet of thousands of servers.
Use a pull-based model for obtaining jobs, to allow us to scale up easily.

The core architectural insight is the first one: out-of-cluster queuing and scheduling. Kubernetes’ etcd is great for storing the state of a running cluster. It is not great at being a queue that holds millions of pending jobs. Armada doesn’t try to fix that. It stores the queue itself, in its own storage layer, and only interacts with Kubernetes when it’s actually time to run a job.

The storage layer is built on Apache Pulsar. This is the most interesting design decision in the project, and it’s one most people don’t understand at first glance.

Armada treats its internal state as an event log. When a job is submitted, a JobSubmitted event gets published to Pulsar. When the scheduler decides where to run it, a JobScheduled event gets published. When an executor leases it and creates the pod, a JobLeased event gets published. Every state transition is an event in the log. The actual databases (Postgres for the scheduler’s queue state, Redis for fast lookups) are materialized views — they’re rebuilt from the log by subscribers.

Why does this matter? Because it gives Armada two properties that are very hard to get from a traditional scheduler:

Replayability. If you lose a database (and you will, eventually), you don’t lose jobs. You rebuild the database by replaying the log. The log is the source of truth, the databases are derived state, and the system converges back to consistency automatically.
Scalability without coordination. Because every state transition goes through the log, you don’t need schedulers to talk to each other about who’s claiming what. They consume events, make decisions, publish events. The log serializes everything.
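The replay property is easy to see in miniature. The sketch below uses a plain Python list as the log and a dict as the materialized view. The event names follow the ones mentioned above, but the payload shape and the fold function are assumptions, not Armada’s actual schema.

```python
# Hypothetical mapping from event type to the job state it produces.
STATE_AFTER = {
    "JobSubmitted": "queued",
    "JobScheduled": "scheduled",
    "JobLeased": "leased",
}


def apply_event(view: dict[str, str], event: dict) -> None:
    """Fold one event into the materialized view (job_id -> current state)."""
    view[event["job_id"]] = STATE_AFTER[event["type"]]


def rebuild(log: list[dict]) -> dict[str, str]:
    """Recreate the view from scratch by replaying the log in order."""
    view: dict[str, str] = {}
    for event in log:
        apply_event(view, event)
    return view


log = [
    {"type": "JobSubmitted", "job_id": "a"},
    {"type": "JobSubmitted", "job_id": "b"},
    {"type": "JobScheduled", "job_id": "a"},
    {"type": "JobLeased", "job_id": "a"},
]

live_view = rebuild(log)
live_view.clear()              # simulate losing the database
recovered = rebuild(log)       # the log is still there, so nothing is lost
print(recovered)               # {'a': 'leased', 'b': 'queued'}
```

Because the view is a pure function of the log, a second subscriber could build a differently shaped view from the same events without coordinating with the first.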

The architecture documentation is explicit about the consistency model: “State transitions are published to the log in order” and “Databases are updated independently, ensuring eventual consistency.” This is the same architectural pattern used by distributed systems like Kafka Streams, EventStoreDB, and most modern databases that call themselves “event-sourced.” It works. It scales. It’s also more complex to operate than a vanilla relational database.

The control plane has four major subsystems:

Submit API — Authenticates the user, validates the job spec (which is just a Kubernetes PodSpec plus some Armada-specific metadata like queue, priority, and job-set ID), and publishes the submission event to Pulsar.
Scheduler — Runs periodically, looks at the queue and the available capacity across all clusters, makes scheduling and preemption decisions using a Dominant Resource Fairness (DRF) algorithm, and publishes the decision back to the log. This is where fair-share, priority, and gang scheduling all live.
Executor — One per Kubernetes cluster. Pulls job leases from the scheduler, creates the actual pods, monitors execution, reports completion. This is the only piece that talks to Kubernetes.
Lookout — The web UI. It’s its own subsystem with its own materialized view, which means the UI can’t slow down or block the scheduling path.
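Dominant Resource Fairness is simple enough to sketch in a few lines. What follows is the textbook progressive-filling version in Python, not Armada’s scheduler, which also has to handle priorities, preemption, and gangs. The queue names and the function are hypothetical; the numbers are the worked example from the original DRF paper.

```python
from fractions import Fraction


def drf_allocate(capacity: dict[str, int], demands: dict[str, dict[str, int]]) -> dict[str, int]:
    """Toy Dominant Resource Fairness: returns tasks granted per queue.

    Repeatedly give one task to the queue with the smallest dominant share,
    where dominant share = max over resources of (allocated / capacity).
    """
    used = {r: 0 for r in capacity}
    allocated = {q: {r: 0 for r in capacity} for q in demands}
    tasks = {q: 0 for q in demands}
    active = set(demands)

    def dominant_share(q: str) -> Fraction:
        return max(Fraction(allocated[q][r], capacity[r]) for r in capacity)

    while active:
        # Lowest dominant share goes next; the name breaks ties deterministically.
        q = min(active, key=lambda name: (dominant_share(name), name))
        need = demands[q]
        if all(used[r] + need[r] <= capacity[r] for r in capacity):
            for r in capacity:
                used[r] += need[r]
                allocated[q][r] += need[r]
            tasks[q] += 1
        else:
            active.remove(q)  # this queue's next task no longer fits
    return tasks


# Worked example from the DRF paper: 9 CPUs and 18 GB in total.
capacity = {"cpu": 9, "mem_gb": 18}
demands = {
    "queue-a": {"cpu": 1, "mem_gb": 4},  # memory-heavy tasks
    "queue-b": {"cpu": 3, "mem_gb": 1},  # CPU-heavy tasks
}
print(drf_allocate(capacity, demands))  # {'queue-a': 3, 'queue-b': 2}
```

Both queues end up with a dominant share of 2/3: queue-a holds 12 of 18 GB and queue-b holds 6 of 9 CPUs. That equalization across different bottleneck resources is the whole point of the algorithm.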

According to the CNCF project page, Armada is currently in the CNCF Sandbox with a Health Score of 82 (Excellent). It has 187 contributors from 49 organizations. The current release is 0.3.92 — yes, still 0.x in June 2026. The project’s GitHub has 599 stars. For comparison, SLURM’s GitHub has over 4,000 stars. The order-of-magnitude gap in adoption is the most important fact about Armada.

The Scale Numbers (And Why They’re Misleading)

Both projects publish impressive scale numbers. Neither set of numbers means what you think it means.

SLURM’s headline numbers: 100,000 sockets, 1,000 submissions/sec, 600 executions/sec, about 60% of TOP500 supercomputers.

These are real. They come from the Sequoia deployment, which was the largest machine on Earth when it was built. The 1,000 submissions/sec number is from a benchmark published in 2010 and hasn’t been meaningfully retested since — but the architecture hasn’t fundamentally changed, and the controller is single-threaded for state mutations, so I’d expect similar numbers today.

The problem is what these numbers measure. SLURM is good at running a few thousand large jobs that each consume a lot of resources. The 100,000 jobs on Sequoia were MPI ranks across 100,000 sockets — one job, thousands of tasks, tightly coupled. The 1,000 submissions/sec is real but doesn’t tell you how many of those jobs can run concurrently. The answer is “as many as you have nodes for, up to the cluster size.”

Armada’s headline numbers: “millions of queued jobs per day across tens of thousands of nodes.” This is from G-Research’s production deployment.

The emphasis here is on queued. Armada’s value proposition is that you can submit way more jobs than you have capacity for, and the system will hold them in the queue and run them when resources become available. The “millions per day” number comes from G-Research’s specific workload pattern, which is high-throughput backtesting and quantitative research — lots of short jobs, many of them can run in parallel, and they’re all submitted by automated pipelines rather than humans.

This is the load pattern where Kubernetes gets into trouble. If your workload is “submit 10 million small jobs, run them as fast as possible across whatever hardware is available, don’t care which exact cluster runs which one,” Armada was designed for that and SLURM wasn’t.

But here’s the thing: if your workload is “submit 200 large jobs that each need 64 H100s wired together with NVLink and InfiniBand, and I need every single GPU to be a specific type with a specific topology, and I need to checkpoint them every 15 minutes because they’re going to run for two weeks,” SLURM is what you want and Armada doesn’t really compete.

The Architecture Difference, Visualized

Here’s how the two systems are shaped:

SLURM is a hub-and-spoke. The controller (slurmctld) is the only brain. Every node daemon (slurmd) talks to it. Every job submission goes through it. If the controller dies, the hot-standby takes over (if you configured one), but during failover, no new jobs start and no decisions get made. The architecture is intentionally simple: one process that knows everything, many processes that do what they’re told.

The tradeoff is that the controller becomes a bottleneck as you scale. You can mitigate this by federating multiple SLURM clusters together (each with its own controller), but federation in SLURM is bolted on, not designed in. You get a “cluster-of-clusters” abstraction but not a unified queue.

Armada is a log-spoke architecture. The Pulsar log is the only brain. Every other component (Scheduler, Executor, Lookout, Submit API) is a subscriber that consumes events and publishes new ones. You can run multiple instances of every component for redundancy, and because they all communicate through the log, there’s no central coordination point.

The tradeoff is that you now have Pulsar (which is itself a distributed system to operate), plus Postgres, plus Redis, plus the Armada control plane components, plus the executors on every cluster, plus the Kubernetes clusters themselves. The operational surface area is enormous. If you don’t have a dedicated platform team that understands both Pulsar and Kubernetes, you’re going to have a bad time.

The Container Story

Both schedulers can run containers. They get there from completely different directions.

SLURM added OCI container support as a feature on top of an existing scheduler that was originally designed to launch processes directly. The result is excellent but complicated: you have to choose an OCI runtime (runc, crun, Docker, Podman, Apptainer, Sarus, or NVIDIA’s enroot+Pyxis), configure oci.conf on every compute node, make sure your container bundle is available on the node before the job runs (Slurm won’t transfer it for you — see the Containers Guide known limitations), and the whole thing runs as rootless user-namespace containers. That’s fine for security isolation but means you can’t do privileged operations inside the container.

In practice, serious HPC shops running ML workloads use enroot + Pyxis because it’s the only path that doesn’t double-mount the filesystem and ruin I/O performance on shared parallel storage. The integration is tight enough that you can srun --container-image=nvcr.io/nvidia/pytorch:24.06-py3 ... and it Just Works. But you need to be on a fairly recent SLURM (23.02+) to get the good Pyxis integration, and you need to be on the right kernel.

Armada doesn’t really have a “container story” because it is the container story. Every Armada job is a Kubernetes pod spec. The Executor takes that pod spec and submits it to Kubernetes via the regular Kubernetes API. Whatever Kubernetes can run, Armada can run — including privileged containers, host networking, GPU device plugins, sidecars, init containers, the whole kit. You get to use the same container images, the same registries, the same security policies, the same everything as the rest of your Kubernetes infrastructure.

If your organization already has a hardened Kubernetes platform with image scanning, admission controllers, OPA policies, and a CI/CD pipeline that builds and pushes images — Armada gives you all of that for free. SLURM makes you reinvent most of it.

The GPU Story

GPUs are the workload that matters most in 2026, so let’s talk about them specifically.

SLURM has the most mature GPU scheduling story of any open source scheduler. The GRES system supports:

Typed GPUs. You can have nodes with H100s and nodes with A100s in the same cluster and request specific types via --gres=gpu:h100:8.
MIG slicing. NVIDIA’s Multi-Instance GPU technology lets you split an H100 into up to 7 isolated instances. SLURM supports this natively via --gres=gpu:h100:1g.5gb or similar syntax. You can also pin jobs to specific MIG instance IDs.
MPS (Multi-Process Service). Lets multiple processes share a GPU. SLURM can manage MPS instances and limit how many processes each MPS instance serves.
Sharding. You can split a GPU across multiple users with hard memory limits per shard. Useful for inference workloads where one model uses half the GPU.
GPU binding. Control which GPU a task binds to (--gpu-bind=closest, --gpu-bind=per_task:<gpus_per_task>).
NUMA-aware pinning. srun --cpu-bind=... plus GPU topology means you can ensure your training process’s memory bandwidth stays on the local NUMA node of the GPU it’s using. This matters enormously for large-model training — if your CPU is on a different NUMA node than your GPU, you lose 30-40% of the interconnect bandwidth to QPI/UPI.

SLURM is also smart enough to know about the network topology between nodes. The topology plugin, configured through topology.conf, lets you describe the network as a graph, and SLURM will use Hilbert curve scheduling or switch-aware placement to put jobs that need to communicate on adjacent nodes in the topology. This is the difference between a distributed training job that achieves 80% of peak interconnect bandwidth and one that achieves 40%. For a communication-bound job, that 2x gap is the difference between a training run that finishes in 6 hours and one that takes 12.
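Switch-aware placement boils down to “keep the job under as few switches as you can.” The Python sketch below is a deliberately simplified illustration of that idea with a one-level switch hierarchy. The switch and node names are hypothetical, and SLURM’s real topology plugins work on a full switch tree with more nuanced rules.

```python
def place_on_switches(free_nodes_by_switch: dict[str, list[str]], nodes_needed: int) -> list[str]:
    """Pick nodes so the job spans as few leaf switches as possible.

    Prefer the smallest single switch that fits the whole job; otherwise
    fill from the switches with the most free nodes.
    """
    # Best fit: the tightest single switch that can hold the whole job.
    fitting = [s for s, nodes in free_nodes_by_switch.items() if len(nodes) >= nodes_needed]
    if fitting:
        best = min(fitting, key=lambda s: (len(free_nodes_by_switch[s]), s))
        return free_nodes_by_switch[best][:nodes_needed]

    # Otherwise spill across switches, largest pools first.
    chosen: list[str] = []
    for switch in sorted(free_nodes_by_switch, key=lambda s: (-len(free_nodes_by_switch[s]), s)):
        take = nodes_needed - len(chosen)
        chosen += free_nodes_by_switch[switch][:take]
        if len(chosen) == nodes_needed:
            return chosen
    raise ValueError("not enough free nodes")


free = {
    "leaf1": ["n01", "n02"],
    "leaf2": ["n11", "n12", "n13", "n14"],
    "leaf3": ["n21", "n22", "n23"],
}
print(place_on_switches(free, 3))  # ['n21', 'n22', 'n23']: fits under leaf3 alone
print(place_on_switches(free, 6))  # spans leaf2 and leaf3, leaving leaf1 untouched
```

The best-fit choice in the first branch also keeps the larger pool under leaf2 intact for the next big job, which is the kind of side effect a topology-blind scheduler never considers.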

Armada treats GPUs as opaque Kubernetes resources. You say nvidia.com/gpu: 8 in your pod spec and Kubernetes hands you 8 GPUs. Whatever the GPU operator’s device plugin exposes is what you get.

This works fine for the common case. It works less well for:

Heterogeneous GPU clusters where you need to request specific GPU models. You can do this with node selectors and taints/tolerations, but it’s clunky and requires the Kubernetes layer to expose the right labels.
MIG slicing. Kubernetes has experimental MIG support via the GPU Operator, but it’s nowhere near as battle-tested as SLURM’s.
NUMA pinning. The static CPU manager policy in Kubernetes can do this, but you have to configure it correctly per node, and most managed Kubernetes services don’t expose the knobs.
Topology-aware GPU placement. Kubernetes has the topology manager and DRA (Dynamic Resource Allocation) is improving this, but as of mid-2026 it’s still not as good as SLURM for tightly-coupled jobs.
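For the heterogeneous case, the node-selector workaround looks roughly like this. The Python below just builds the request as a dict: the outer key names are illustrative rather than Armada’s exact API, the image is a placeholder, and the nvidia.com/gpu.product label is an assumption about how your nodes are labeled.

```python
import json


def gpu_job(queue: str, job_set_id: str, image: str, gpus: int, gpu_product: str) -> dict:
    """Build a job request: scheduler metadata wrapped around a Kubernetes PodSpec."""
    return {
        # Scheduler-level metadata (key names here are illustrative).
        "queue": queue,
        "jobSetId": job_set_id,
        "priority": 0,
        # Plain Kubernetes PodSpec from here down.
        "podSpec": {
            "restartPolicy": "Never",
            # Pin to a GPU model via a node label; assumes nodes carry this label.
            "nodeSelector": {"nvidia.com/gpu.product": gpu_product},
            "containers": [
                {
                    "name": "main",
                    "image": image,
                    "resources": {"limits": {"nvidia.com/gpu": str(gpus)}},
                }
            ],
        },
    }


job = gpu_job(
    queue="research",                          # hypothetical queue
    job_set_id="sweep-001",                    # hypothetical job set
    image="registry.example.com/train:latest",  # placeholder image
    gpus=8,
    gpu_product="NVIDIA-H100-80GB-HBM3",       # label value depends on your nodes
)
print(json.dumps(job, indent=2))
```

Notice that nothing in the request says anything about NVLink, NUMA, or which eight GPUs you get. The GPU model is the only topology-ish knob, and it rides on a label somebody else has to maintain.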

If your workload is “give me 8 H100s on the same NVLink-connected node, please,” SLURM is better. If your workload is “give me 1 GPU and run a thousand inference jobs in parallel across the whole fleet,” Armada is fine and probably better because you can use Kubernetes’ existing autoscaling and bin-packing.

The Hybrid Path: Slinky, Soperator, and the Betrayal of Clean Abstractions

Here’s the part that doesn’t get talked about enough. The real answer for most production shops in 2026 is neither “all Armada” nor “all SLURM.” It’s “run SLURM on top of Kubernetes, so I can have SLURM’s scheduling maturity and Kubernetes’ operational substrate.”

There are two serious projects in this space now:

Slinky (SchedMD + NVIDIA)

Slinky is the official answer from the people who maintain SLURM. It’s two components:

slurm-operator — Lets you run a full SLURM cluster inside Kubernetes as pods. Login nodes, controller nodes, worker nodes, all as K8s StatefulSets. You describe your desired SlurmCluster in YAML, the operator reconciles the state.
slurm-bridge — The more interesting one. Lets you run SLURM as a Kubernetes scheduler plugin. The idea is that a single cluster can have both regular Kubernetes workloads (Deployments, StatefulSets, etc.) and SLURM workloads (batch jobs submitted via sbatch), and they share the same nodes. SLURM becomes the scheduler for a subset of pods; the default kube-scheduler handles the rest.

This is genuinely clever. It means you don’t have to pick. You can run your inference serving on Kubernetes with the default scheduler, and run your training jobs on the same nodes via SLURM, and the two coexist. Slinky is developed by SchedMD and supported by NVIDIA, which means it’s getting serious engineering investment.

The catch is that Slinky is new. It was first presented at SC24 in November 2024, with the slurm-bridge component getting more airtime through 2025. As of June 2026, the project is at 1.0.0-rc1 — release candidate, not GA. If you’re going to bet on this in production, expect to be one of the early adopters.

Soperator (Nebius)

Soperator is Nebius’s take on the same problem. It also runs SLURM inside Kubernetes as pods, but with a different twist: the “jail” PVC.

Here’s the problem with running SLURM in containers: SLURM expects every node to have an identical filesystem, but each pod gets its own. If you install a package or change a config on one node, the others don’t see it. SLURM’s traditional answer is NFS or Lustre, but in a cloud-native world you want to use Kubernetes primitives.

Soperator’s solution is a shared PersistentVolume that’s mounted on every worker and login node. Inside that volume is a full Linux filesystem (could be a different distro than the host). Users see this shared filesystem as their root /. You can install a package on one node, and it shows up on all of them. You can write job output anywhere, and every node sees the same file. It’s a really elegant hack.

Soperator also has built-in DCGM exporter integration for GPU monitoring, automated ActiveChecks that periodically run NCCL benchmarks to validate cluster health, and a FluxCD-based deployment model. It’s less mature than Slinky but more opinionated about cloud-native patterns.

Which hybrid to pick?

If you’re a shop that already runs Kubernetes in production and you need SLURM for one specific team or workload, Slinky is the safer bet. It’s from the people who wrote SLURM, it has NVIDIA’s weight behind it, and slurm-bridge in particular is the right answer if you want true shared scheduling.

If you’re a cloud-native shop that’s never run SLURM before and you’re building from scratch, Soperator might be more approachable. The shared-root jail is a great developer experience. The FluxCD integration is what you’d want anyway. But it’s Nebius-specific tooling with a smaller community.

If you’re a traditional HPC shop that’s been on SLURM for 15 years and someone just told you “we’re moving to Kubernetes,” neither project is going to make you happy. You’re going to run SLURM on bare metal or on dedicated VMs, and you’re going to run Kubernetes separately for your application workloads, and the two will coexist but not actually integrate. That’s fine. It works. It’s what most of the TOP500 does.

What You’d Actually Build

Okay. Practical mode. You’re a director or principal engineer at a company that needs to run batch compute. You have a choice. Here’s how I’d think through it.

Pick SLURM if:

You have tightly-coupled jobs that need to communicate over high-speed interconnect (NVLink, InfiniBand, Slingshot). SLURM’s topology-aware scheduling is the best in the industry, and Armada can’t match it because Kubernetes doesn’t have the primitives yet.
Your workloads are long-running (hours to days). SLURM’s checkpoint/restart story and its graceful handling of node failures are what you need.
Your team is full of people who’ve used SLURM for a decade and don’t want to learn anything new. The inertia here is real and shouldn’t be dismissed.
You’re buying a TOP500-class machine and the vendor is going to install SLURM for you anyway. Frontier and LUMI both run SLURM because that’s what the HPC vendors standardized on.
You need to schedule something that’s not a container. SLURM can launch any process. Armada is Kubernetes-only.

Pick Armada if:

Your workload is many small independent jobs, and you need to queue millions of them at a time. This is the load pattern Armada was literally built for.
You already have a Kubernetes platform team and they want one scheduler to rule all workloads. Armada gives you batch scheduling on the same substrate as everything else.
You need multi-cluster scheduling. SLURM federation exists but it’s a 2010-era feature with a 2010-era developer experience. Armada was designed multi-cluster from day one.
Your batch jobs are mostly embarrassingly parallel — parameter sweeps, hyperparameter searches, batch inference, ETL — and not tightly coupled.
You’re a fintech or quantitative research firm that already runs Kubernetes and needs to do high-throughput backtesting. This is G-Research’s exact use case.

Pick Slinky/Soperator if:

You want both. The HPC team gets SLURM’s mature scheduling. The platform team gets Kubernetes’ operational substrate. You share the hardware and the budget.
You’re willing to be on the bleeding edge. These projects are at 1.0-rc1 at best, which is early-adopter territory.
You have a platform engineering team that can debug both Pulsar-class distributed systems issues and Linux kernel cgroup behavior. If you don’t have that, you’ll be miserable.

Don’t pick anything if:

You’re running fewer than 100 nodes and fewer than 10,000 jobs per day. Just use a queue in Redis and a Kubernetes Job. You don’t need a meta-scheduler.
Your “batch workload” is really a periodic cron job. Use Kubernetes CronJob. It’s fine.
You think you need to pick today and lock in for ten years. You don’t. Migration paths exist. The hybrid projects mean you can run both schedulers on the same cluster.

The Thing Nobody Tells You About Adoption

Adoption is not a feature checklist. Adoption is what determines whether your bug reports get answered in 2026 or sit in a GitHub issue for two years.

SLURM has been in production at national labs and Fortune 500 research shops since 2002. SchedMD is a real company with paying customers. There’s a deep ecosystem of vendors (HPE, Dell, Lenovo, Supermicro, Atos/Bull) who will install and support it for you. The commercial support contract is expensive but it works. There’s a 20-year-old Stack Overflow archive, a Slack community, a yearly user group meeting, and thousands of HPC sysadmins who already know how to operate it. When you hit a weird bug, someone has hit it before and there’s a forum post about it from 2014.

Armada has G-Research. That’s basically it for production deployments, plus a handful of fintech and biotech shops that have public blog posts. The project is in CNCF Sandbox, which is the lowest maturity tier (Incubating comes after Sandbox, then Graduated). The 599 stars and the fact that the project is still on 0.x versioning (0.3.92 as of June 2026) tell you that adoption is real but limited. The contributors are committed, the architecture is sound, and the G-Research customer reference is impressive. But you’re not going to call up an HPE account manager and get someone to come install it for you.

If you’re betting your company’s compute platform on this decision, that asymmetry matters. SLURM is a known quantity. If something breaks, you can hire someone to fix it. Armada is a smart bet for the future but a real risk for production today.

What I’d Actually Pick

Let me make this personal. If someone came to me tomorrow and said “we need to pick a batch scheduler for a 500-node hybrid ML/HPC cluster and we have $2M to spend on it,” here’s what I’d tell them:

For the ML training side of the house: SLURM, deployed via Slinky on bare-metal Kubernetes nodes. Slinky gives you the SLURM maturity and the Kubernetes substrate at the same time. You get topology-aware scheduling for the tightly-coupled training jobs (the ones that actually need NVLink awareness), and you get the operational benefits of running everything as Kubernetes pods.

For the inference and ETL side of the house: whatever Kubernetes scheduler you already have. You don’t need Armada here. You need a vanilla Kubernetes deployment with the GPU Operator and HPA/KEDA for autoscaling. If you need queueing across many small jobs, Kueue will do it. Volcano if you have gang-scheduling needs for things like TensorFlow or PyTorch distributed jobs.

For the high-throughput backtesting/quantitative research side: Armada, if your org is already Kubernetes-first. This is the workload Armada was built for. The tradeoffs (Pulsar operational complexity, limited adoption, immature ecosystem) are real but they don’t matter if you have a Kubernetes platform team that’s already operating Pulsar and you have a workload pattern that genuinely needs to queue millions of jobs.

For everything else: just use Kubernetes Jobs and stop overthinking it. The etcd bottleneck is real but you won’t hit it unless you’re at significant scale. The gang-scheduling limitation is real but most jobs aren’t actually gang-scheduled; they’re just embarrassingly parallel and don’t care. The fair-share limitation is real but you can solve it with namespace quotas and priority classes. Half the teams I see reach for a meta-scheduler could solve their problem with a Terraform module and a kubectl apply.

The Quiet Betrayal

Here’s the thing that’s been bugging me while I write this.

Both schedulers are, fundamentally, a bet about the future of compute infrastructure. SLURM bet that clusters would always be self-contained things you owned and operated end-to-end. Armada bet that clusters would always be Kubernetes clusters operated by someone else, and your job as the workload owner was to be a good tenant.

In 2020, when G-Research wrote the original Armada blog post, both bets looked defensible. SLURM was winning HPC. Kubernetes was winning everything else. The question was whether the two worlds would ever need to integrate.

By 2026, the answer is “yes, but not the way anyone expected.” The TOP500 still runs SLURM because that’s where the public funding is and the vendor ecosystem supports it. But the actual growth in batch compute is happening in two places: AI labs running multi-thousand-GPU clusters, and fintech/biotech shops running millions of small jobs on Kubernetes. The first group is starting to use Slinky. The second group is starting to use Armada.

Neither project is winning. They’re both winning their respective niches. And the “Kubernetes-native everything” narrative that Armada represents is being undermined by the fact that the AI training workloads — the ones with the most money and the most GPUs — keep rediscovering that Kubernetes wasn’t actually designed for them, and that SLURM’s mature GPU topology story is hard to replicate.

The honest answer is that I don’t know which scheduler will be dominant in 2030. I do know that the substrate question — who owns the cluster — is the question that matters. Pick the scheduler that matches your answer to that question, not the one that has the slickest demo.