n2 Subninja support and multithreaded loading

Fixes #109 Fixes #108

This PR ports over most of the enhancements that are in Android's fork of ninja to n2. N2 goes from taking ~16 seconds to parse the AOSP ninja files, to ~1 second. (at least on my work computer where available_parallelism returns 128) It still seems to be about ~0.1 seconds slower than the android fork of ninja, but it's at least in the range where it's a little difficult to measure.

A general list of the changes in this PR:

Add support for subninja, with scoped variables.
mmap input files instead of reading them into memory.
Remove the FileIds and the Vec of files, replacing them with Arc<File>. Having to maintain a separate vec mapping FileIds to files is too expensive.
Use the DashMap crate to map from filename to File object, so that id_from_canonical can be called in parallel.
Use rayon for multithreading.
Split input files into chunks and parse the chunks in parallel
Parse subninjas in parallel. Subninja support was implemented because regular includes cannot be processed in parallel.
Revamp the evaluation of variable assignments. Now, every statement in the ninja file gets a "scope position". When evaluating global variables, they are evaluated relative to a certain scope position, so you can evaluate the variable at a particular point in the file, even if it is reassigned later.
Introduce "clumps" of parse results, as returning all the parsed statements in a flat vector causes too many large vector concatenations.
Box Build objects, as they are quite large, and Vecs of unboxed builds are too large and slow to concatenate.
Don't evaluate all the build bindings at load time. Only evaluate the input/output files, and defer evaluation of other bindings until the build is actually run.
Evalstrings are now just regular strings instead of Vecs of strings. This means when they're evaluated they're re-parsed. Doing it this way saves of the Vec allocations, and in the case of owned evalstrings, does one big string allocation instead of a bunch of little ones. There are a lot more owned evalstrings now due to deferred evaluation of build bindings, so it makes a bit of a difference.

One thing that the android fork has that I didn't include was precomputed hashes. The android fork has a HashedString and HashedStrView type that it uses to ensure hashes aren't recalculated. This is somewhat difficult to do in rust, you have to either make concessions about being able to look up map values by their borrowed representation, use the raw entry map apis on nightly rust, or use hashbrown's RawTable. And even when I was experimenting with RawTable it didn't seem to provide a noticeable speedup.

Some of these optimizations are memory optimizations. This turned out to be necessary because apparently the exit_group syscall that quits the process spends time cleaning up memory. The exit_group syscall was taking over a second to run before some of these optimizations.

I've also bumped the rust version needed to compile n2 from 1.70 to 1.73, to be able to use usize::next_multiple_of. But if we want to keep the rust version low I can just manually write a next_multiple_of.

Feb 28 '24 02:02 Colecf

CI is passing now.

Feb 28 '24 19:02 Colecf

Wow, impressive! I was just wondering this weekend whether you were still tinkering in this area...

Some minor thoughts I had while skimming this:

I wonder if an arena would help with the all the alloc/dealloc traffic, in particular parsing creates a zillion Builds and we never want to free any except all at once
re scope positions, I don't understand what you've done yet but in ninja we had each scope contain a pointer to its immutable parent scope, so in a series like a = ...; { b = $a; } a = ... the nested bit in the middle would see only the first value of a
have you looked at the perf of how quickly it starts on build tasks? I imagine your 10% is just parse-time perf, and I'm curious whether n2's hash-based dirty checking impacts you

Feb 28 '24 22:02 evmar

I did briefly look into bumpalo, but didn't use it because dashmap and stdlib hashmaps don't support custom allocators. Also bumpalo doesn't support multithreading directly, you have to create one arena per thread (bumpalo_herd). However the hashbrown crate supports them, so maybe that's worth trying... I did use std::mem::forget on the graph at the end of the program so we don't spend time deallocating it.
Yes, I think that's essentially the same thing. Though now with the multithreading every variable assignment, rule, build, default, and subninja gets a scope position so all these things can be evaluated in parallel, not just subninjas.
I haven't profiled build task start time, that might be worth doing but it should be much smaller than the parse time. I'm not sure what 10% you're referring to though.

By the way, did you ever get the copy of the android ninja files I shared with you after you requested them in https://github.com/evmar/n2/pull/94? You should have a shared google drive zip.

Feb 29 '24 00:02 Colecf

By the way, did you ever get the copy of the android ninja files I shared with you after you requested them in https://github.com/evmar/n2/pull/94? You should have a shared google drive zip.

Thanks! I grabbed a copy onto my Google Drive.

I haven't looked into the rest of this PR yet, sorry. :( It seems pretty fancy though, do you think you'll use it for Android going forward?

Mar 05 '24 19:03 evmar

I haven't looked into the rest of this PR yet, sorry. :( It seems pretty fancy though, do you think you'll use it for Android going forward?

Do you mean this PR or n2 in general? For the PR, yes, we definitely need some kind of load speed improvement, 16 seconds to start ninja would cause riots.

For n2 in general, I've started the process of forking it into android, but there are still a number of features that need to be added. I'm hoping to switch to it soon because there are some other teams who either want to add more ninja features, or who have already made their own forks with new features and are now contemplating merging their code back into the android fork. (e.g. abfs, a filesystem that communicates the files read by actions back to the build system) I also want to implement action sandboxing. I think it would be easier to scale to all these new features if they were written in rust as opposed to C++.

Mar 05 '24 19:03 Colecf

I am glad it is working out! I agree that working from Rust will hopefully make it easier to extend. One of my goals with n2 was to make it separable enough that it's hopefully easier to reason about.

BTW I think you'll need #50 if you want to use n2 in production. Or maybe even consider using a real database, I dunno -- the load time is in the critical path so I'm kind of fond of my tricky hack.

In any case I'd be happy to give advice on any pieces you end up needing to interact with, feel free to email about them or file bugs. (In particular I have some ideas on how n2 might implement content hashing for files, which is useful both for dirty checking but also for interacting with cloud build systems, and how that can nicely fit into n2's model.)

Mar 05 '24 20:03 evmar

BTW I think you'll need https://github.com/evmar/n2/issues/50 if you want to use n2 in production. Or maybe even consider using a real database, I dunno -- the load time is in the critical path so I'm kind of fond of my tricky hack.

Yeah I'm aware of that issue. Though what "tricky hack" are you talking about?

In any case I'd be happy to give advice on any pieces you end up needing to interact with, feel free to email about them or file bugs.

Thanks! Though I think I have a good idea about what needs to be done. What would be most helpful would be if you could review changes like this one faster, but I understand that's a lot of work to ask of you. So I'm going to try to get a fork going so we can get the fork up to speed faster.

In particular I have some ideas on how n2 might implement content hashing for files

Yeah I'd like to add that feature as well.

Mar 05 '24 20:03 Colecf

Though what "tricky hack" are you talking about?

The custom "database" defined in db.rs. I think Ninja has some extra goop in it for the databasey corner cases, where like each row is suffixed with a parity check to catch partial writes to it.

Mar 06 '24 00:03 evmar