book
book copied to clipboard
Garbage Collection Basics
The book should include a chapter (or series of multiple chapters) discussing the basics of garbage collection and how to tackle this using Rust. The end goal should be a set of resources to implement a naïve, non-moving mark-and-sweep garbage collector. While such a GC won't be very useful for most applications it's easy enough to understand/explain that it should serve as a good introduction.
At the Basics level, this series should be limited to single-threaded mutator/gc stop-the-world collection.
Are there any reasons to not cover a basic implementation of copying collection such as Cheney's algorithm at this level? That might make a smoother transition to advanced gc #3.
Possible breakdown:
Mark and sweep collection
- a free-list allocator
- memory blocks/arenas
- object headers
- free-list alloc()
- free-list free()
- marking
- sweeping
Copying collection
- bump allocation
- memory block alignment
- from-space and to-space
- copying collection
- object forwarding
- two-finger tracing and copying
@pliniker
Are there any reasons to not cover a basic implementation of copying collection such as Cheney's algorithm at this level?
Since copying and mark-and-sweep are a bit different I think we could do both, then stick with one of the two for the rest of the book.
Just had the realization that an alloc() function that directly function-calls a moving collector would need to update stack roots that could be in registers or on the Rust stack for at least the VM instruction being executed.
Since we don't have access to compiler generated stack maps, I think this means that a moving collector in Rust would rely on gc safe points and never call the collector inside alloc().
If so, Cheney's algorithm as classically described isn't possible because allocation is triggered on OOM, which you only know you've reached for certain inside alloc() and collection must be completed before the VM instruction that called alloc() continues.
Does this all sound correct @YorickPeterse ?
@pliniker This depends a bit on what you consider to be "registers". For example, in Inko the "registers" are just indexes in a Rust Vec. This is done because the VM has to be able to suspend code at arbitrary points in time and resume code, meaning it can't rely on the Rust stack.
For the sake of simplicity we could do something similar where we use a dedicated data structure for the stack/register, instead of relying on the Rust call stack and registers.
Another option is to slightly adjust Cheney's algorithm and don't immediately trigger GC in an allocation, instead waiting until reaching a safepoint.
@YorickPeterse Having re-read the Immix paper, I think I see better why you favor it :slightly_smiling_face: and I have to agree.
I'm now thinking that the Basics chapter(s) might best be designed to build toward Immix: block-size-aligned blocks (#9), basic bump allocation (#4), mark & sweep.
This leaves major issues unaddressed and doesn't result in a practically usable GC yet - line marking, allocating into partially reclaimed blocks, defragmentation, parallelizing etc need to be covered in the Advanced chapter(s).
This feels a lot more targeted and focused to me, now. Perhaps it's what you were originally suggesting?
@pliniker That sounds like a good idea, even if it's just because it saves us from having to cover two very different algorithms.
Depends on:
- #24