
`json-meta://` store

Ericson2314 opened this issue 2 years ago • 13 comments

Motivation

Way back in c1a07f94451cfa93aa9ac986188d0e9a536e4b9f Nix switched to using SQLite, and for good reason: it makes a lot of operations much faster. But it is still useful to have a simple text file metadata alternative for a few niche use-cases:

  • Broadcasting: if one wants to serve a store over NFS to a very large number of consumers in a pub-sub manner, it is disadvantageous for every write to modify the database file. Separate files for separate "rows" avoid that synchronization.

  • IPFS, the jankier way: no SQLite and no .nar is a quick and dirty way to get a filesystem store representation that a tool like IPFS could mirror pretty well. Of course, I am personally fond of a deeper integration where we do things like

    • make references actually point to other valid path info objects
    • make valid path info objects actually point to the data they own

    by persisting the JSON into IPFS's native JSON representation rather than just JSON-in-a-file, but I suppose it is good not to let the perfect be the enemy of the good.

  • High Security Stores: It is hard to audit the contents of a SQLite database. Even if we can be sure that the SQL program only reads out good data, the opaque binary format may still leave room for steganography. More broadly, database performance is in fundamental tension with restricting to normal forms. A store that keeps everything in plain text is easy to hand-audit, and therefore better suited to be a secure (albeit slow) store for various purposes.

Separate from the feature itself, I also think this is a good exercise in disentangling a store being "local" from a store using SQLite. Having a second tiny implementation ensures we don't start "overfitting" to SQLite in various ways, and encourages factoring out the parts of LocalStore that don't have to do with SQLite.

This is a little toy store that stores ValidPathInfos and Realisations in JSON format in a separate directory. It is an on-disk format that is very easy to work with (easier than the narinfo line format, I think).
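To make the "one file per row" idea concrete, here is a minimal sketch of such a layout. It is not the PR's code: the `valid-paths/` directory, the `<hash-part>.json` file naming, and the field names are all hypothetical. Each write goes to a temporary file in the same directory and is then renamed into place, so concurrent readers (e.g. NFS consumers) never see a half-written row and no shared file is ever rewritten.

```python
import json
import os
import tempfile
from pathlib import Path

# Hypothetical layout: <meta_dir>/valid-paths/<hash-part>.json, one file per
# store object, so registering an object never touches shared state.


def hash_part(store_path: str) -> str:
    # "/nix/store/<hash>-<name>" -> "<hash>"
    return Path(store_path).name.split("-", 1)[0]


def write_path_info(meta_dir: Path, info: dict) -> Path:
    target_dir = meta_dir / "valid-paths"
    target_dir.mkdir(parents=True, exist_ok=True)
    target = target_dir / f"{hash_part(info['path'])}.json"
    # Write to a temp file in the same directory, then atomically rename it.
    fd, tmp = tempfile.mkstemp(dir=str(target_dir), suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(info, f, sort_keys=True, indent=2)
        f.write("\n")
    os.replace(tmp, target)
    return target


def read_path_info(meta_dir: Path, store_path: str):
    target = meta_dir / "valid-paths" / f"{hash_part(store_path)}.json"
    if not target.exists():
        return None  # no row means "not a valid path"
    return json.loads(target.read_text())
```

Validity is simply "the file exists", which is what makes the format easy to mirror with generic file-syncing tools.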

Context

TODO: tests. But this is a general problem for many store implementations that we need to solve properly once and for all; e.g. #9429 has the same issue.

Priorities

Add :+1: to pull requests you find important.

CC @raitobezarius @flokli @ryantm @danielfullmer

Ericson2314, Dec 06 '23 19:12

Before Nix used SQLite for the Nix store (and after it used Berkeley DB), it used flat files to store metadata (b0e92f6d474ce91d7f071f9ed62bbb2015009c58). Apart from performance and disk space overhead, the main problem was ensuring transactional semantics for the referrers mapping (especially needed for garbage collection). But that might not be a problem for some use cases.

High Security Stores: It is hard to audit the contents of a SQLite database.

I'm not convinced by this argument, since SQLite is one of the most-used and best-tested pieces of software out there. Certainly better tested than an ad hoc metadata store, even if it contains JSON.

edolstra, Dec 06 '23 19:12

The main problem was ensuring transactional semantics for the referrers mapping (especially needed for garbage collection)

Yes, I didn't implement any of that, on purpose. References are just stored "forward" as part of the valid path info JSON. This makes adding new store objects easy / well isolated, and everything else a pain in the ass :). Very intentional!
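A small sketch of why storing references only "forward" makes everything else painful: with no referrers index, answering "who refers to X?" (which garbage collection needs) means scanning every row. The directory layout and field names are the same hypothetical ones as any one-file-per-path scheme would use; they are assumptions, not the PR's schema.

```python
import json
from pathlib import Path

# Each row stores only its *forward* references; there is no referrers index.


def load_all(meta_dir: Path) -> dict:
    # Read every hypothetical <hash-part>.json row into memory.
    infos = {}
    for f in sorted((meta_dir / "valid-paths").glob("*.json")):
        info = json.loads(f.read_text())
        infos[info["path"]] = info
    return infos


def referrers(infos: dict, target: str) -> set:
    # Full scan over all references: cost grows with the whole store,
    # not with the number of actual referrers.
    return {p for p, info in infos.items() if target in info["references"]}
```

Adding a path touches exactly one file, whereas a query like this one touches all of them; that is the trade-off being made on purpose.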

I'm not convinced by this argument, since SQLite is one of the most-used and best-tested pieces of software out there. Certainly better tested than an ad hoc metadata store, even if it contains JSON.

Yes, no hate against SQLite; it is fantastic software. My argument is not that SQLite could be better, but that a higher-performance database of that sort must unavoidably sacrifice having a nice normal form. This format, however, does have a normal form (packed JSON, in order, assuming we didn't screw anything up). That has some nice properties SQLite cannot have.
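One possible reading of "packed JSON, in order" is sketched below, as an assumption rather than the PR's exact rules: keys sorted, no insignificant whitespace, UTF-8 output. The useful property for auditing is that an auditor can reject any file that is not byte-for-byte equal to the re-serialization of its own parsed value, which leaves no slack bytes in which to hide extra information.

```python
import json


def canonical_json(obj) -> bytes:
    # Assumed normal form: sorted keys, compact separators, UTF-8.
    # Equal values always serialize to identical bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def is_normal_form(data: bytes) -> bool:
    # A file passes only if re-serializing its parsed value reproduces it
    # exactly; reordered keys, extra whitespace or duplicate keys all fail.
    try:
        return canonical_json(json.loads(data)) == data
    except ValueError:  # covers JSONDecodeError and UnicodeDecodeError
        return False
```

A standardized alternative with the same goal is the JSON Canonicalization Scheme (RFC 8785), which additionally pins down number formatting.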

(An in-kernel SQLite, where it was impossible to see the underlying bytes and one could only use the abstract interface, would also help. Maybe someday we'll have Nix on SQL-supporting mainframes and can do things that way. But this also just kicks the can down to "what is the best way to have secure on-disk representations of kernel data structures", which is a question the likes of dm-verity, https://github.com/project-machine/puzzlefs, SquashFS, etc. are all trying to answer.)

Ericson2314, Dec 06 '23 20:12

  • We have 20 SQLiteStmt instances in libstore, so factoring out a persistence layer does not seem far-fetched.

  • When you're already doing network file systems, having a proper database server might not be a bad idea?

roberth, Dec 06 '23 20:12

factoring out a persistence layer does not seem far-fetched.

I am hoping to get there little by little via the various "shuffle around the Store hierarchy" projects I have in flight. :)

Ericson2314, Dec 06 '23 21:12

This pull request has been mentioned on NixOS Discourse. There might be relevant details there:

https://discourse.nixos.org/t/2023-12-08-nix-team-meeting-minutes-110/36721/1

nixos-discourse, Dec 11 '23 13:12

This sounds similar to the "narinfo without NAR" store idea that has been floated around. What is the benefit of this over that?

arianvp, Dec 11 '23 14:12

Seems pretty much the same? I don't like the narinfo format and would rather use JSON, but that's small potatoes.

Ericson2314, Dec 12 '23 21:12

Cool. I guess a "benefit" of reusing .narinfo is that you could have an S3 bucket that is both a binary cache store and a file store, by having both a <hash-a>.nar file and a <hash-a>/ directory accompanied by a <hash-a>.narinfo file. Then you could use that bucket either as a substituter or as a store (e.g. by using Mountpoint for S3).

arianvp, Dec 14 '23 16:12

@arianvp But I would like to have binary caches also use the JSON format :). We can upload both to binary caches for backwards compat.

For example, at some point I need to propose what I think is a stronger version of the narHash field for checking not just the file system object closure of a single store object, but an entire store object closure. This would involve giving every reference a hash, so we have a "store path -> hash" map for the references. With the current narinfo format I would be just making up a new syntax, with JSON it's already provided for me.
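A hypothetical sketch of what a closure-level hash could look like once references carry hashes. This is an illustration, not the proposal itself: the field names, the Merkle-style recursion, and the choice to record self-references as the marker `"self"` (Nix allows an object to reference itself, which would otherwise recurse forever) are all assumptions. Each object's digest covers its own `narHash` plus a `store path -> hash` map of its references, so changing anything in the closure changes the root digest. Note how naturally the map fits in JSON, which is the point being made about the narinfo line format.

```python
import hashlib
import json


def closure_hash(path: str, infos: dict, memo=None) -> str:
    """Hypothetical Merkle-style hash over a whole store object closure."""
    if memo is None:
        memo = {}
    if path in memo:
        return memo[path]
    info = infos[path]
    refs = {
        # Self-references are recorded with a fixed marker instead of
        # recursing into the object being hashed.
        ref: "self" if ref == path else closure_hash(ref, infos, memo)
        for ref in info["references"]
    }
    payload = json.dumps({"narHash": info["narHash"], "references": refs},
                         sort_keys=True, separators=(",", ":"))
    digest = "sha256:" + hashlib.sha256(payload.encode()).hexdigest()
    memo[path] = digest
    return digest
```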

Ericson2314, Dec 14 '23 16:12

Yes, we should allow .narinfo files to be JSON. I.e. if the first character is {, then parse it as a JSON object.
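The sniffing rule can be sketched as follows. Assumptions: the JSON variant is a single object (its schema is not defined here), leading whitespace is tolerated (a slight relaxation of "the first character is `{`"), and the classic format is parsed as `Key: value` lines. Repeated `Sig:` lines, which the classic format allows, are collected into a list rather than overwriting each other.

```python
import json


def parse_narinfo(text: str) -> dict:
    # A JSON object starts with "{"; anything else is the classic format.
    if text.lstrip().startswith("{"):
        return json.loads(text)
    fields = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split at the first colon only: values such as signatures
        # ("keyname:base64") contain colons themselves.
        key, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed narinfo line: {line!r}")
        value = value.strip()
        if key == "Sig":
            fields.setdefault("Sig", []).append(value)
        else:
            fields[key] = value
    return fields
```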

edolstra, Dec 14 '23 17:12

That works too, if we are OK with suddenly cutting off old Nix from new objects :)

(or rather the read side can be flexible like that, but the write side can do two files. Postel's law type stuff.)
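A sketch of the "write two files, read flexibly" transition, under loudly hypothetical names: `<hash>.narinfo` keeps the classic line format for old clients, and `<hash>.narinfo.json` is an assumed name for the JSON copy (no extension has been agreed on). Only a subset of narinfo fields is written, and `references` are assumed to already be base names (`<hash>-<name>`), as the classic format expects.

```python
import json
from pathlib import Path


def write_both(cache_dir: Path, hash_part: str, info: dict) -> None:
    # Classic line format (subset of fields) for existing Nix versions.
    lines = [f"StorePath: {info['storePath']}",
             f"NarHash: {info['narHash']}",
             f"NarSize: {info['narSize']}",
             "References: " + " ".join(info["references"])]
    lines += [f"Sig: {s}" for s in info.get("signatures", [])]
    (cache_dir / f"{hash_part}.narinfo").write_text("\n".join(lines) + "\n")
    # Hypothetical JSON copy for newer clients.
    (cache_dir / f"{hash_part}.narinfo.json").write_text(
        json.dumps(info, sort_keys=True))


def read_preferring_json(cache_dir: Path, hash_part: str):
    # New readers try JSON first and fall back to the classic file.
    json_file = cache_dir / f"{hash_part}.narinfo.json"
    if json_file.exists():
        return "json", json.loads(json_file.read_text())
    legacy = cache_dir / f"{hash_part}.narinfo"
    if legacy.exists():
        return "legacy", legacy.read_text()
    return None
```

Old clients never ask for the new file, so they keep working; new clients get the richer format whenever it is present.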

Ericson2314, Dec 14 '23 17:12

"narinfo" is not descriptive. It contains a bit of nar "info", such as the hash, but the rest is store object info, such as name and references, binary cache info such as file location, and realisation info such as deriver and signatures (if that's not its own category).

Query behavior can be modified in the nix-cache-info file, so client behavior can be guided if needed - e.g. to make it query only one of the two possible files.

we should allow .narinfo files to be JSON.

This can't be the main mechanism, because it's not compatible with existing Nix versions. We should have a transition period where both formats are available. A new file extension helps with that.

Wouldn't hurt to have docs (EDIT: of narinfo and the binary cache protocol at the very least) before considering any of this, so everyone can have a good understanding of the domain.

EDIT: Consider the HTTP Accept: header, although the protocol should function quite well with a simple server.
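A deliberately simplified sketch of server-side negotiation via `Accept:`. `text/x-nix-narinfo` is the media type Nix itself uses when uploading narinfo files; using `application/json` for the JSON variant is an assumption. The sketch ignores q-values and wildcards, so JSON is served only when explicitly requested and a plain static-file server can still satisfy every client with the classic format.

```python
def choose_narinfo_format(accept_header) -> str:
    """Pick a narinfo representation from an HTTP Accept header.

    Simplified: q-values and wildcards are ignored; JSON is chosen only
    if application/json is listed explicitly as a media range.
    """
    if not accept_header:
        return "text/x-nix-narinfo"
    ranges = [r.split(";")[0].strip().lower() for r in accept_header.split(",")]
    if "application/json" in ranges:
        return "application/json"
    return "text/x-nix-narinfo"
```

A production server would need to honour `q=0` (which means "not acceptable"), but the fallback behaviour is the important part for backwards compatibility.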

roberth, Dec 14 '23 18:12

Draft PR https://github.com/NixOS/nix/pull/9348 adds some docs.

Ericson2314, Dec 14 '23 18:12