Replacing nlohmann::json with a much faster alternative
## Is your feature request related to a problem?
If you consider performance a problem, then yes; otherwise, no.
Within nixpkgs there are ~900k lines of JSON across 740+ files. From `nix develop` to the store, JSON is used throughout the entire system, making JSON generation and parsing performance-critical.
## Proposed solution
nlohmann::json is a staple in the C++ JSON world, but Nix has grown far larger than the everyday pet projects it was designed to support. Many newer, much more performant JSON parsers and generators have been developed to address exactly this, many of which see daily use in the private sector. C++ JSON benchmark: https://github.com/miloyip/nativejson-benchmark
## Alternative solutions
Instead of a full-scale rewrite, wrap one of these high-performance JSON implementations in an API that more closely resembles that of nlohmann::json.
## Additional context
## Checklist
- [x] checked latest Nix manual (source)
- [x] checked open feature issues and pull requests for possible duplicates
Add :+1: to issues you find important.
The main question is: since eval reproducibility is paramount, we must ensure that a possible replacement (simdjson comes to mind, and I'm pretty sure there was an issue about that at some point) behaves identically in all observable aspects. That would surely be a non-trivial effort requiring mandatory fuzzing.
https://github.com/stephenberry/glaze is a C++23 solution that claims to be faster than simdjson and rapidjson, with a simpler API; any of these alternatives is worth looking into.
Glaze is fuzz tested internally and by Google OSS-Fuzz. I'm the main developer of Glaze and would be happy to provide advice if desired.
Glaze guarantees round trip consistency with robust unit testing.
Concrete structs are the fastest to work with, but glz::generic is still very performant and likely many times faster than what you are using now.
I'm happy to answer more questions about needed features or code conversion for this project.
> Glaze is fuzz tested internally and by Google OSS-Fuzz
The fuzzing question is more about 100% byte-for-byte equivalence with nlohmann::json for deserialisation and serialisation. Quite a few JSON outputs can be part of store path hash calculations, and if anything changes we must be sure that none of it is observable in the language frontend. nlohmann is pathologically stupid about numbers when it comes to parsing floats and the like; it can be a huge compat footgun. IIUC, circa 2.3, Nix switched from a hand-rolled JSON parser to nlohmann, and we are still dealing with the fallout.
@xokdvium thanks for the clarification. That makes sense. Glaze, like std::to_chars/std::from_chars, is locale-independent and uses the shortest exact representation. Rounding is another discussion, but Glaze uses the fast_float and Dragonbox algorithms while conforming to the JSON spec.
I can look over nlohmann's code and see what it would take to match its number serialization exactly.
A critical question: as long as Glaze parses the JSON into elements (strings, numbers, etc.) that are binary-equivalent to what nlohmann parses, would that be sufficient? Glaze would maintain this binary equivalence when round-tripping.
This would be significantly easier to achieve than requiring the JSON output to be binary-equivalent between Glaze and nlohmann, which is what I initially had in mind; I now realize the binary equivalence may only be required within the C++ for the JSON elements.
@xokdvium & @stephenberry I want to make sure I understand the exact requirement here. Does Nix need:
- Semantic equivalence: same parsed data (strings, numbers, structure), even if the JSON formatting changes?
Or
- Exact equivalence: the JSON output must match nlohmann::json byte-for-byte?
This would help determine if a better backend is feasible.
I don't have the experience with the codebase to answer this question, but if binary equivalence is only required for the parsed JSON then it makes migration trivial versus extremely complex.
Unfortunately it's the other way around. Serialised JSON produced from values in the Nix language quite often ends up in derivation files, which get hashed when calculating output store paths when __structuredAttrs is used. (Yes, relying on the binary stability of nlohmann is also very bad, but it hasn't broken in practice yet.) That is particularly fraught when it comes to floating point. It's a wonder there haven't been issues with it; perhaps not enough people actually make use of floating-point values.
@xokdvium
If I understand correctly, we cannot change the bytes of the JSON Nix outputs, because that would regenerate every store path in the ecosystem.
Instead, why don't we use a faster JSON lib (like Glaze) internally for parsing and DOM manipulation? We could then reverse engineer nlohmann::json's output format and write a conversion function as a bridge between the internal structure and one that matches the current nlohmann::json structure. Another option would be to just write a function that converts it to nlohmann::json's internal structure and then call .dump(), preserving nlohmann's byte-for-byte output.
You might be able to speed up parsing by using another library and then copying a C++ structure like glz::generic into nlohmann::json for serialization. But this feels like a complex hack around a design that is quite fragile. For example, I know std::to_string in C++ is moving to have the same output as std::format for floating point. If nlohmann::json uses std::to_string or any other C++ function that is being upgraded, then future C++ versions could break this codebase, and requiring an old version of C++ in perpetuity seems pretty bad.
I would recommend considering how to move away from depending on the binary format of JSON serialization before looking into optimization. This is certainly a harder issue to fix, but one that will matter more for stability.
I personally feel like allowing floats in structured attrs is more or less a mistake, and one that we can perhaps back out of. That would help with this.
I don't want to privilege nlohmann::json too much either.
@xokdvium @stephenberry If what Stephen says is true, either a rebuild is imminent (due to the std::to_string changes), or we must pin NixOS/nix to a pre-C++26 version; neither is a good idea. Hacking together a system where we have a faster parser internally (via Glaze or some other optimized JSON lib) plus a way to output nlohmann::json is, like Stephen said, a very hacky fix to a bigger internal issue. Why don't we look into replacing all JSON outright? It could act as an optional feature for those who want the performance gain.