miri Add GC heuristic

Miri's GC runs every 10000 basic block, by default. It turns out that sometimes, it is better if the GC runs more often. This is especially true for some Tree Borrows programs, since in Tree Borrow, memory accesses take time linear in the size of a certain tree, and this tree can grow quite large, but is usually compacted significantly at every GC run. For some programs, it can be advantageous to run the GC every 200 blocks, or even more often.

Unfortunately, such a GC frequency would slow down other programs. To achieve a balance, this PR proposes a "heuristics" that allows running the GC more often based on how large the Tree Borrows tree grows. It is quite likely that other parts of Miri want to plug into this mechanisms as well, eventually. However, such concrete other cases are further work, as long as the system is designed to be general enough. This PR only contributes integration with Tree Borrows. It is also possible that the actual heuristics need more tweaks, but for now, the performance improvements look promising.

It is further likely that the overall design needs some change, since it currently relies on Cells. Also, I was not too sure where to put the new enums. So this PR is WIP for now, although it is functionally complete.

Oct 27 '24 15:10 JoJoDeveloping

Hm, I'm wary of this heuristic because I tried something like it to solve the inverse problem and didn't get good results. Do you have any demonstrations of workloads where this implementation makes a huge difference?

Oct 27 '24 16:10 saethlin

My main immediate concern is that this can lead to significant slowdown if there are large trees that do not get compacted. So I feel like if we have such a heuristic, it should also take into account "did the big tree actually shrink on the last GC run" or something like that, and increase the GC period if the answer is "no".

Oct 28 '24 06:10 RalfJung

What does this PR do on MIRIFLAGS=-Zmiri-tree-borrows ./miri bench? Please show numbers from before and after this change.

(Unfortunately we don't have a way to automatically compare two bench reports and print "things got X% slower/faster"... see https://github.com/rust-lang/miri/issues/3999 )

Oct 28 '24 06:10 RalfJung

My main immediate concern is that this can lead to significant slowdown if there are large trees that do not get compacted. So I feel like if we have such a heuristic, it should also take into account "did the big tree actually shrink on the last GC run" or something like that, and increase the GC period if the answer is "no".

The way this works is by initially having some exponential back off with how often the GC runs (whenever the size doubles since the last GC run). Eventually that goes to linear (whenever a tree grew by X nodes since the last GC run). The reason is that I recon that most programs do not have many different provenances around (by using a large array storing pointers only differing in their provenance, for example). For the tree to not get compacted by the GC requires that programs execute lots of retags and then keep all these pointers alive. On such workloads, the GC already does nothing. Even even such programs would have to create and store a new reference every 7 basic blocks (with the current settings) for it to be slowed down by this patch. But note that for programs with a lot of memory use, the GC already needs to be tuned down (probably) for things to work.

In the end it is all trade-offs. This one seems to be worth it. I am not sure the current performance testsuite consists of programs where this shows (one way or another) but I can add some where it does.

Oct 28 '24 15:10 JoJoDeveloping

After some musing with @saethlin, we figured the better answer than having a GC might be to use (some form of) ref-counting, at least in theory. Unfortunately that would require a large refactoring (and past attempts of doing it failed? ), and so for now we seem to be stuck with the GC, for better or worse.

Oct 28 '24 15:10 JoJoDeveloping

The way this works is by initially having some exponential back off with how often the GC runs (whenever the size doubles since the last GC run).

Right, that makes sense, it just wasn't clear from the PR description. :)

Eventually that goes to linear (whenever a tree grew by X nodes since the last GC run).

That's a bit more surprising. What is the rationale for that? Did you compare the PR with and without this part?

Many programs not having a ton of provenances would already be captured by the first part, no? (Also I am not sure that's true, each live reference has its own provenance and there can easily be a ton of those if someone has a large array of things that involve references.)

I am not sure the current performance testsuite consists of programs where this shows (one way or another) but I can add some where it does.

Yes that would be good -- preferably one benchmark demonstrating a clear benefit. (We don't want to grow our number of benchmarks too much, either...)

Oct 28 '24 17:10 RalfJung

We don't want to grow our number of benchmarks too much, either...

Do we not? Changes like the one in this PR would be much easier to find/develop/test if the suite of benchmarks was more diverse and larger in general. It its current form, it is useful only to ensure there are no really major performance regressions. But that's a separate issue.

Oct 29 '24 11:10 JoJoDeveloping

More benchmarks means they take forever to run, and at least until we figure out https://github.com/rust-lang/miri/issues/3999 also forever to analyze.

Oct 29 '24 11:10 RalfJung

:umbrella: The latest upstream changes (presumably #4009) made this pull request unmergeable. Please resolve the merge conflicts.

Nov 02 '24 21:11 bors

I'm a bit unsure what is the best way to go forward here. I found that there are some more cases where changing the numbers here so that the GC runs less often.

The more I think about it, the more I think you want some "more clever" tuning thing here that looks at Miri's current execution speed, compared to the one after the most recent GC run, and tries to gauge when it's time for the next one. Perhaps @saethlin also has something to weigh in on that.

Dec 11 '24 11:12 JoJoDeveloping

I found that there are some more cases where changing the numbers here so that the GC runs less often.

Sorry I can't parse this, what do you mean?

Dec 11 '24 12:12 RalfJung

Huh, I seem to have swallowed a sentence there. What I mean is that I found some programs where this change is bad because it makes the GC run too often. Instead, these programs achieve optimal performance when the "magic numbers" in this PR are moved up a bit, so that the trees need to grow more before the GC is triggered. So finding what is a good number there is nontrivial/impossible, which makes me more unhappy with the current approach.

Dec 11 '24 12:12 JoJoDeveloping

Well that's the usual issue with heuristics...

Dec 11 '24 12:12 RalfJung

I'm not a GC expert, so if you haven't already, it might be good to at least draw inspiration from what techniques the really good mark-and-sweep GCs use to get around this problem. I bet the severity of the problem is significantly reduced by having a generational GC (which we don't have) so it's also possible other GCs just don't suffer this as much.

I think the only way out of this is to make the GC tune the frequency based on some kind of metric for how well its time is being spent. Looking at things like how much of the tags were garbage on the last run, or what fraction of time is being spent in the GC itself. The lack of self-tuning is the crux of https://github.com/rust-lang/miri/issues/3209.

Dec 11 '24 15:12 saethlin

:umbrella: The latest upstream changes (possibly 887b4984e2ec4690afb6aa2c3f31408b570033ca) made this pull request unmergeable. Please resolve the merge conflicts.

Dec 29 '24 09:12 rustbot

:umbrella: The latest upstream changes (possibly #4226) made this pull request unmergeable. Please resolve the merge conflicts.

Mar 14 '25 05:03 rustbot