book icon indicating copy to clipboard operation
book copied to clipboard

Garbage Collection Basics

Open yorickpeterse opened this issue 7 years ago • 8 comments

The book should include a chapter (or series of multiple chapters) discussing the basics of garbage collection and how to tackle this using Rust. The end goal should be a set of resources to implement a naïve, non-moving mark-and-sweep garbage collector. While such a GC won't be very useful for most applications it's easy enough to understand/explain that it should serve as a good introduction.

yorickpeterse avatar Apr 25 '18 19:04 yorickpeterse

At the Basics level, this series should be limited to single-threaded mutator/gc stop-the-world collection.

Are there any reasons to not cover a basic implementation of copying collection such as Cheney's algorithm at this level? That might make a smoother transition to advanced gc #3.

Possible breakdown:

Mark and sweep collection

  • a free-list allocator
    • memory blocks/arenas
    • object headers
    • free-list alloc()
    • free-list free()
  • marking
  • sweeping

Copying collection

  • bump allocation
    • memory block alignment
    • from-space and to-space
  • copying collection
    • object forwarding
    • two-finger tracing and copying

pliniker avatar Apr 26 '18 18:04 pliniker

@pliniker

Are there any reasons to not cover a basic implementation of copying collection such as Cheney's algorithm at this level?

Since copying and mark-and-sweep are a bit different I think we could do both, then stick with one of the two for the rest of the book.

yorickpeterse avatar Apr 27 '18 11:04 yorickpeterse

Just had the realization that an alloc() function that directly function-calls a moving collector would need to update stack roots that could be in registers or on the Rust stack for at least the VM instruction being executed.

Since we don't have access to compiler generated stack maps, I think this means that a moving collector in Rust would rely on gc safe points and never call the collector inside alloc().

If so, Cheney's algorithm as classically described isn't possible because allocation is triggered on OOM, which you only know you've reached for certain inside alloc() and collection must be completed before the VM instruction that called alloc() continues.

Does this all sound correct @YorickPeterse ?

pliniker avatar Apr 29 '18 02:04 pliniker

@pliniker This depends a bit on what you consider to be "registers". For example, in Inko the "registers" are just indexes in a Rust Vec. This is done because the VM has to be able to suspend code at arbitrary points in time and resume code, meaning it can't rely on the Rust stack.

For the sake of simplicity we could do something similar where we use a dedicated data structure for the stack/register, instead of relying on the Rust call stack and registers.

yorickpeterse avatar Apr 30 '18 15:04 yorickpeterse

Another option is to slightly adjust Cheney's algorithm and don't immediately trigger GC in an allocation, instead waiting until reaching a safepoint.

yorickpeterse avatar Apr 30 '18 15:04 yorickpeterse

@YorickPeterse Having re-read the Immix paper, I think I see better why you favor it :slightly_smiling_face: and I have to agree.

I'm now thinking that the Basics chapter(s) might best be designed to build toward Immix: block-size-aligned blocks (#9), basic bump allocation (#4), mark & sweep.

This leaves major issues unaddressed and doesn't result in a practically usable GC yet - line marking, allocating into partially reclaimed blocks, defragmentation, parallelizing etc need to be covered in the Advanced chapter(s).

This feels a lot more targeted and focused to me, now. Perhaps it's what you were originally suggesting?

pliniker avatar May 12 '18 01:05 pliniker

@pliniker That sounds like a good idea, even if it's just because it saves us from having to cover two very different algorithms.

yorickpeterse avatar May 12 '18 18:05 yorickpeterse

Depends on:

  • #24

pliniker avatar Jul 07 '20 20:07 pliniker