You know that feeling when you’re reading a spec, see “reference implementation: Python” and “reference implementation: JavaScript”, and think “huh, nobody bothered with Rust for this”?

That was me reading the XARF v4 spec.

XARF — the eXtended Abuse Reporting Format — is a JSON schema for describing abuse incidents. Spam. DDoS. Phishing. Malware. Copyright violations. Compromised servers. Botnets. All the things that keep abuse desks awake at night, but in a machine-readable format that two organizations can actually exchange without someone sending a PDF attachment.

The spec targets sub-millisecond processing per report. The reference implementations are Python and JavaScript. And somehow, nobody thought to write it in Rust.

So I did.

What is XARF, exactly?

If you work in security operations or run an abuse desk, you’ve probably dealt with abuse reports. They come in all shapes: email bodies, PDF attachments, web forms, sometimes actual structured data if you’re lucky. XARF is what happens when a bunch of people who process abuse reports for a living decided to standardize the format.

It’s JSON. It’s schema-validated. It covers seven categories — messaging, connection, content, copyright, vulnerability, infrastructure, and reputation — with 32 specific subtypes. Each subtype has its own schema defining exactly which fields are required, which are recommended, and which are optional.

The spec lives at xarf.org with the full v4 specification at github.com/xarf/xarf-spec. The format is designed for automated exchange between organizations: your spam filter detects something, generates a XARF report, and sends it to the upstream provider’s abuse desk. No human reads it. It’s all schema validation and automated routing.

Why Rust?

Fair question. The Python reference implementation exists. JavaScript exists. Why add another?

Because when you’re processing abuse reports at scale, the performance characteristics of your parser matter. I’m not talking about “needs to not be terrible”. The spec itself says “typical report processing should complete in under 1ms.” The Python implementation is fine for a few reports a minute. In a high-throughput pipeline processing thousands of reports per second, Rust’s zero-cost abstractions and compile-time guarantees become genuinely useful.

But more importantly: Rust is the language I use. When I need a tool for something, I write it in Rust. It’s not a statement about Python being bad — it’s just that Rust gives me the performance, safety, and ergonomics I want for systems-level work.

How it works

The crate is called xarf-rs and it does three things: parses, validates, and generates XARF v4 reports.

The parsing path is straightforward:

use xarf::{parse, Report};

let result = parse(r#"{"xarf_version": "4.2.0", ...}"#).unwrap();
if result.errors.is_empty() {
    let report: Report = result.report.unwrap();
    println!("{} / {}", report.category.as_str(), report.type_);
}

But the interesting part is what happens under the hood. All 34 JSON schemas (the master schema, the core schema, and 32 type-specific schemas) are embedded into the binary at compile time using include_str!. Zero filesystem access. Zero network calls. The compiled jsonschema::Validator is cached in a OnceLock on first use: the first compile takes about 2ms, and every subsequent parse reuses the cached validator for the cost of a pointer dereference.
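
Here’s a minimal, std-only sketch of that compile-once caching pattern. It is not the crate’s actual code: CompiledValidator, compile, validator, and COMPILE_COUNT are names I made up for illustration, the struct is a stand-in for jsonschema::Validator, and a string literal takes the place of include_str! so the snippet compiles without any schema files on disk.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::OnceLock;

// In a real crate this would be `include_str!("path/to/schema.json")`;
// a literal stands in so the sketch needs no files on disk.
pub const MASTER_SCHEMA: &str = r#"{"type": "object", "required": ["xarf_version"]}"#;

/// Hypothetical stand-in for a compiled `jsonschema::Validator`.
pub struct CompiledValidator {
    pub schema_len: usize,
}

/// Counts how often the expensive compile step actually runs.
pub static COMPILE_COUNT: AtomicUsize = AtomicUsize::new(0);

/// Pretend "compile" step; the real one would build a JSON Schema validator.
fn compile(schema: &str) -> CompiledValidator {
    COMPILE_COUNT.fetch_add(1, Ordering::SeqCst);
    CompiledValidator { schema_len: schema.len() }
}

/// Returns the process-wide validator, compiling it on the first call only.
pub fn validator() -> &'static CompiledValidator {
    static CACHE: OnceLock<CompiledValidator> = OnceLock::new();
    CACHE.get_or_init(|| compile(MASTER_SCHEMA))
}
```

Calling validator() twice hands back the same reference, and the compile counter stays at one. OnceLock is also safe to hit from multiple threads, which is what makes it a good fit for a parser used inside an async pipeline.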

The whole parse path for a typical spam report (~500 bytes) runs in about 9 microseconds. That’s not a benchmark artifact — that’s wall-clock time on commodity hardware.

For building reports, there’s a builder pattern:

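// json! comes from serde_json (assumed here to be a dependency of your crate)
use serde_json::json;
use xarf::{ReportBuilder, Contact, create_evidence};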

let evidence = create_evidence("text/plain", b"original spam payload");
let report = ReportBuilder::new("messaging", "spam", "192.0.2.1")
    .reporter(Contact::new("Acme", "[email protected]", "acme.example"))
    .sender(Contact::new("Acme", "[email protected]", "abuse.example"))
    .source_port(25)
    .extra("protocol", json!("smtp"))
    .extra("smtp_from", json!("[email protected]"))
    .add_evidence(evidence)
    .build()
    .unwrap();

The create_evidence function handles the boring parts: computing the hash (SHA-256 by default, but you can pick SHA-512, SHA-1, or MD5), base64-encoding the payload, and recording the size. All in one call.
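
To make the “boring parts” concrete, here’s a std-only sketch of the base64-and-size half of that job. It is an illustration, not the crate’s code: Evidence, make_evidence, and the field names are hypothetical, I’m assuming the size is the raw payload length in bytes, and the hash step is left out because it needs a hashing crate (sha2, for example) that a dependency-free snippet can’t pull in.

```rust
const ALPHABET: &[u8; 64] =
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/// Standard (RFC 4648) base64 with `=` padding.
pub fn base64_encode(input: &[u8]) -> String {
    let mut out = String::with_capacity((input.len() + 2) / 3 * 4);
    for chunk in input.chunks(3) {
        // Pack up to three bytes into the low 24 bits of a u32.
        let b0 = chunk[0] as u32;
        let b1 = *chunk.get(1).unwrap_or(&0) as u32;
        let b2 = *chunk.get(2).unwrap_or(&0) as u32;
        let n = (b0 << 16) | (b1 << 8) | b2;

        out.push(ALPHABET[((n >> 18) & 63) as usize] as char);
        out.push(ALPHABET[((n >> 12) & 63) as usize] as char);
        // Missing input bytes become padding instead of data characters.
        if chunk.len() > 1 {
            out.push(ALPHABET[((n >> 6) & 63) as usize] as char);
        } else {
            out.push('=');
        }
        if chunk.len() > 2 {
            out.push(ALPHABET[(n & 63) as usize] as char);
        } else {
            out.push('=');
        }
    }
    out
}

/// Hypothetical evidence record; the hash field is omitted in this sketch.
pub struct Evidence {
    pub content_type: String,
    pub payload: String, // base64-encoded bytes
    pub size: usize,     // assumed: length of the raw payload in bytes
}

pub fn make_evidence(content_type: &str, data: &[u8]) -> Evidence {
    Evidence {
        content_type: content_type.to_string(),
        payload: base64_encode(data),
        size: data.len(),
    }
}
```

In real code you’d reach for an existing base64 crate rather than hand-rolling the encoder. The point is only that one call can bundle encoding and bookkeeping so callers never touch either.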

The v3 compatibility thing

XARF v3 existed before v4. It had a completely different JSON structure — Version, ReporterInfo, Report keys, the whole thing. The v3 format is deprecated but still out there in the wild.

The crate auto-detects v3 reports and converts them to v4 on the fly. It’s not a perfect conversion — some fields don’t map cleanly — but it’s good enough that you can process legacy reports without writing your own migration code. The conversion runs in about 5 microseconds and surfaces a deprecation warning so you know something old just walked through your pipeline.
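
One plausible way to do that detection is to look at the top-level keys. The sketch below assumes a v4 report carries xarf_version and a v3 report carries Version, which is all the format details above give us. ReportFormat, detect_format, and deprecation_warning are hypothetical names, and the crate’s real heuristics may well differ.

```rust
/// Which flavour of XARF a JSON object appears to be.
#[derive(Debug, PartialEq, Eq)]
pub enum ReportFormat {
    V4,
    LegacyV3,
    Unknown,
}

/// Classifies a report by its top-level JSON keys (assumed heuristic).
pub fn detect_format(top_level_keys: &[&str]) -> ReportFormat {
    if top_level_keys.contains(&"xarf_version") {
        ReportFormat::V4
    } else if top_level_keys.contains(&"Version") {
        ReportFormat::LegacyV3
    } else {
        ReportFormat::Unknown
    }
}

/// Legacy input is still processed, but the caller gets told about it.
pub fn deprecation_warning(format: &ReportFormat) -> Option<&'static str> {
    match format {
        ReportFormat::LegacyV3 => Some("XARF v3 is deprecated; report was converted to v4"),
        _ => None,
    }
}
```

Keeping detection separate from conversion means the warning can be surfaced even when some v3 fields fail to map cleanly.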

Validation modes

Three modes, because one size never fits all:

  • Standard (default): required fields enforced, recommended fields ignored, unknown fields surface as warnings
  • Strict: recommended fields promoted to required, unknown fields become errors
  • Show missing optional: populates an info field with every absent recommended or optional field — useful for review tools that want to tell humans what they forgot to fill in
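
The three modes boil down to how the same findings get classified. Here’s a small sketch of that mapping. Mode, FieldSpec, Outcome, and check are hypothetical names, the field lists are passed in by hand rather than read from a schema, and I’m assuming the “show missing optional” mode otherwise behaves like standard mode.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Mode {
    Standard,
    Strict,
    ShowMissingOptional,
}

/// Field lists for one report subtype (hand-written here, schema-derived in practice).
pub struct FieldSpec<'a> {
    pub required: &'a [&'a str],
    pub recommended: &'a [&'a str],
    pub optional: &'a [&'a str],
}

#[derive(Debug, Default, PartialEq, Eq)]
pub struct Outcome {
    pub errors: Vec<String>,
    pub warnings: Vec<String>,
    pub info: Vec<String>,
}

pub fn check(spec: &FieldSpec, present: &[&str], mode: Mode) -> Outcome {
    let mut out = Outcome::default();

    // Required fields are enforced in every mode.
    for field in spec.required {
        if !present.contains(field) {
            out.errors.push(format!("missing required field: {field}"));
        }
    }
    // Recommended fields: ignored, promoted to errors, or reported as info.
    for field in spec.recommended {
        if present.contains(field) {
            continue;
        }
        match mode {
            Mode::Standard => {}
            Mode::Strict => out.errors.push(format!("missing recommended field: {field}")),
            Mode::ShowMissingOptional => out.info.push(format!("absent recommended field: {field}")),
        }
    }
    // Optional fields only ever show up as info.
    for field in spec.optional {
        if mode == Mode::ShowMissingOptional && !present.contains(field) {
            out.info.push(format!("absent optional field: {field}"));
        }
    }
    // Unknown fields: warnings normally, errors in strict mode.
    for field in present {
        let known = spec.required.contains(field)
            || spec.recommended.contains(field)
            || spec.optional.contains(field);
        if !known {
            let msg = format!("unknown field: {field}");
            if mode == Mode::Strict {
                out.errors.push(msg)
            } else {
                out.warnings.push(msg)
            }
        }
    }
    out
}
```

Running the same input through all three modes gives one warning in standard mode, two errors in strict mode, and two info entries in the review mode.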

The strict mode does something clever: it deep-copies the master schema once at startup and promotes every x-recommended: true property into its parent’s required array. This matches the algorithm described in the v4 implementer’s guide exactly. No custom validation logic — the JSON Schema does all the work.
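
The promotion step can be sketched on a simplified schema tree. This is a toy model, not the crate’s implementation: SchemaNode stands in for a JSON Schema object (the real thing would walk a serde_json::Value and also handle keywords like allOf and items), and promote_recommended and strict_schema are names I picked for illustration.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for one JSON Schema object node.
#[derive(Debug, Clone, Default)]
pub struct SchemaNode {
    /// Models `"x-recommended": true` on this property.
    pub recommended: bool,
    /// Models this node's `required` array.
    pub required: Vec<String>,
    /// Models this node's `properties` map.
    pub properties: BTreeMap<String, SchemaNode>,
}

/// Recursively adds every recommended property to its parent's `required` list.
pub fn promote_recommended(node: &mut SchemaNode) {
    for (name, child) in node.properties.iter_mut() {
        if child.recommended && !node.required.contains(name) {
            node.required.push(name.clone());
        }
        promote_recommended(child);
    }
}

/// Deep-copies the master schema and returns the strict variant,
/// leaving the original untouched for standard-mode validation.
pub fn strict_schema(master: &SchemaNode) -> SchemaNode {
    let mut copy = master.clone();
    promote_recommended(&mut copy);
    copy
}
```

Because the transformation happens on a copy, both the standard and strict validators can be built once and kept around side by side.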

Architecture decisions

The core design choice was deliberate: rather than encode all 32 concrete report subtypes as a Rust enum (which would force compile-time knowledge of every category-specific field and lock the crate to one frozen version of the spec), the report is modeled as a single Report struct with strongly-typed core fields plus a BTreeMap of category-specific “extra” fields preserved verbatim from JSON.

This means forward compatibility. New fields appear in the spec? The crate keeps round-tripping them without any code changes. The schemas are the source of truth, not the Rust types.

The BTreeMap also means deterministic serialization — re-serializations are always in the same order. Useful for testing and for anyone who cares about reproducible output.
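
A stripped-down model of that shape shows why the ordering falls out for free. This is a sketch under assumptions: the real Report has more core fields and presumably stores JSON values rather than strings, and source, with_extra, and render are hypothetical names used only here.

```rust
use std::collections::BTreeMap;

/// Simplified report: typed core fields plus free-form extras.
pub struct Report {
    pub category: String,
    pub type_: String,
    pub source: String,
    /// Category-specific fields, kept verbatim and iterated in key order.
    pub extra: BTreeMap<String, String>,
}

impl Report {
    pub fn new(category: &str, type_: &str, source: &str) -> Self {
        Report {
            category: category.to_string(),
            type_: type_.to_string(),
            source: source.to_string(),
            extra: BTreeMap::new(),
        }
    }

    /// Adds an extra field; unknown keys need no code changes.
    pub fn with_extra(mut self, key: &str, value: &str) -> Self {
        self.extra.insert(key.to_string(), value.to_string());
        self
    }

    /// Renders the report as text; extras always come out sorted by key.
    pub fn render(&self) -> String {
        let mut out = format!("{}/{} {}\n", self.category, self.type_, self.source);
        for (key, value) in &self.extra {
            out.push_str(&format!("{key}={value}\n"));
        }
        out
    }
}
```

Insert the extras in any order and render gives the same output, which is exactly the property snapshot tests rely on.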

Testing

115 tests across 11 test binaries. Golden parser tests against the shared xarf-parser-tests corpus (44 valid + 5 invalid samples) plus the 32 canonical samples from the spec. Snapshot tests with insta. Async compatibility tests verifying Send + Sync across tokio boundaries. Performance budget tests with 5-180x headroom.

CI runs benchmark comparison on every PR and fails if anything regresses by more than 10%. There’s also a local criterion baseline for fine-grained optimization work.

Three layers of defense against performance regressions: budget tests, CI benchmark comparison, and a local criterion baseline. Because the last thing you want is a subtle change that turns a 9-microsecond parse into 9 milliseconds.
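
The budget and regression checks are simple enough to sketch with nothing but std::time. This is an illustration of the idea rather than the crate’s test code: mean_duration, within_budget, and regressed are hypothetical helpers, and real measurements would go through criterion instead of a hand-rolled loop.

```rust
use std::time::{Duration, Instant};

/// Runs `work` repeatedly and returns the mean wall-clock time per call.
pub fn mean_duration<F: FnMut()>(iterations: u32, mut work: F) -> Duration {
    assert!(iterations > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iterations {
        work();
    }
    start.elapsed() / iterations
}

/// Budget test: the mean must stay at or under a fixed ceiling.
pub fn within_budget(mean: Duration, budget: Duration) -> bool {
    mean <= budget
}

/// Regression check: true if `current` exceeds `baseline` by more than
/// `tolerance` (0.10 means 10%).
pub fn regressed(baseline: Duration, current: Duration, tolerance: f64) -> bool {
    current > baseline.mul_f64(1.0 + tolerance)
}
```

A budget test would wrap the parser call in mean_duration and assert within_budget against something like the spec’s 1ms target, while the CI comparison is the regressed check with a 10% tolerance.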

What’s missing

It’s production-quality already — published on crates.io, full CI, release-plz automation, MSRV 1.86, edition 2024. But there are a few things that would make it better:

  1. A CLI tool: there’s nothing yet for quick validation or generation from the command line. Something like xarf validate report.json would be nice.
  2. Real-world examples — the README has API examples but no “here’s how you’d use this in an actual abuse pipeline” walkthrough.
  3. More documentation — docs.rs exists but could use a proper getting-started guide with more examples.

The bottom line

xarf-rs is a well-architected, thoroughly tested, high-performance Rust implementation of the XARF v4 spec. It fills a genuine gap in the ecosystem and matches the reference implementations’ behavior.

The spec says sub-millisecond processing. The crate delivers single-digit microseconds. Sometimes writing a library in Rust isn’t about showing off. It’s about the spec literally asking for performance that Rust delivers without breaking a sweat.

github.com/wiggels/xarf-rs