Allocation layout

Open CeleritasCelery opened this issue 1 year ago • 8 comments

Currently all allocations are part of one big enum. This is very space-inefficient, since every enum instance has to be at least the size of the largest variant. We need to create an object header that contains the size of the object, which will be much more space-efficient.
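To illustrate the cost, here is a minimal sketch. The `Object` enum and its variants are hypothetical stand-ins, not the project's real types; the point is only that a small variant pays for the largest one.

```rust
use std::mem::size_of;

// Hypothetical object type; the variants are made up for illustration.
#[allow(dead_code)]
enum Object {
    Int(i64),
    Cons([usize; 2]),
    Buffer([u8; 64]),
}

/// Bytes left unused when a small `Int` is stored inside the enum.
fn wasted_on_int() -> usize {
    size_of::<Object>() - size_of::<i64>()
}
```

Every `Object` is at least as large as the `Buffer` variant, so each `Int` carries dozens of unused bytes.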

Solution 1 - Embed the header directly in the objects

Since all the object types are gc-only, we could keep the header directly in the object struct. Something like this:

struct LispString {
    header: ObjectHeader,
    // other fields
}

However, in order to make this work, the struct would need to be #[repr(C)]. I don't like that because it means the compiler can't optimize the layout, and we would need to do it by hand. However, it would probably be the easiest solution.
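A rough sketch of how that could look, assuming a header that only stores a size. The `len` field and the `object_size` helper are hypothetical, added for illustration. With #[repr(C)] the header is guaranteed to sit at offset 0, so a pointer to any such object can be read as a pointer to its header:

```rust
#[repr(C)]
struct ObjectHeader {
    size: usize,
}

#[repr(C)]
struct LispString {
    header: ObjectHeader, // first field, so it lives at offset 0
    len: usize,           // hypothetical payload field
}

/// Reads the size out of the header of any object that starts with one.
///
/// Safety: `ptr` must point to a live #[repr(C)] object whose first field
/// is an `ObjectHeader`.
unsafe fn object_size(ptr: *const u8) -> usize {
    unsafe { (*ptr.cast::<ObjectHeader>()).size }
}
```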

Solution 2 - Store the header separately

Instead of storing the header in the object struct, we could store it only in the heap. The pointer to an object would have provenance over both the Header and the object.

struct ObjectHeader {
    ty: Type, // `type` is a reserved keyword in Rust
    size: usize,
}

One problem with this approach is that if you take a reference to the object itself (a reference only has provenance over the thing it references), you lose the ability to access the header. Solution 1 is probably simpler.
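For comparison, a minimal sketch of the heap-only header, assuming the header and the value share one allocation. The `alloc_with_header` helper and the `Type` variants are hypothetical. The returned base pointer covers the whole block, so both parts stay reachable as long as only raw pointers are used:

```rust
use std::alloc::{alloc, dealloc, Layout};

#[derive(Clone, Copy, Debug, PartialEq)]
enum Type {
    String,
    Cons,
}

struct ObjectHeader {
    ty: Type,
    size: usize,
}

/// Allocates `[ObjectHeader][T]` as one block and returns the base pointer,
/// the offset of the value inside the block, and the layout for freeing.
fn alloc_with_header<T>(ty: Type, value: T) -> (*mut u8, usize, Layout) {
    let (layout, offset) = Layout::new::<ObjectHeader>()
        .extend(Layout::new::<T>())
        .expect("layout overflow");
    let layout = layout.pad_to_align();
    unsafe {
        let base = alloc(layout);
        assert!(!base.is_null(), "allocation failed");
        // The header goes at the start of the block, the value at `offset`.
        base.cast::<ObjectHeader>().write(ObjectHeader {
            ty,
            size: std::mem::size_of::<T>(),
        });
        base.add(offset).cast::<T>().write(value);
        (base, offset, layout)
    }
}
```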

CeleritasCelery avatar Jan 14 '23 19:01 CeleritasCelery

What if, for non-simple types, we store a raw pointer to the ObjectHeader (on the heap), from which we should be able to reach the object at an offset known at compile time?

Alan-Chen99 avatar May 02 '23 00:05 Alan-Chen99

That is actually a great idea. You have to be careful with this pattern though that you don't trigger UB.

For example if this is my code:

struct ObjectHeader<T> {
    header: Header,
    value: T,
}

I can go from *const ObjectHeader<T> to &T, but I can't go from &T to *const ObjectHeader<T> (due to the provenance issues discussed in that issue). Converting a pointer to a reference shrinks its provenance to only the scope of the reference.

CeleritasCelery avatar May 02 '23 04:05 CeleritasCelery

I would think that you do all the conversions as raw pointers. This shouldn't violate provenance, since the provenance you originally got for the object covers the whole object (and I think Miri will track it as such).
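A small sketch of that raw-pointer round trip, under the assumption that the field pointer is always derived from a pointer to the whole ObjectHeader<T> and never passes through a `&T`. The helper names `value_of` and `header_of` are hypothetical:

```rust
use std::mem::offset_of;
use std::ptr::addr_of;

struct Header {
    size: usize,
}

struct ObjectHeader<T> {
    header: Header,
    value: T,
}

/// Projects to the value field without creating a reference, so the
/// result keeps provenance over the whole allocation.
///
/// Safety: `obj` must point to a live `ObjectHeader<T>`.
unsafe fn value_of<T>(obj: *const ObjectHeader<T>) -> *const T {
    unsafe { addr_of!((*obj).value) }
}

/// Walks back from the value field to its containing object using the
/// compile-time field offset.
///
/// Safety: `value` must come from `value_of` (or an equivalent raw
/// projection), not from a `&T`.
unsafe fn header_of<T>(value: *const T) -> *const ObjectHeader<T> {
    unsafe { value.byte_sub(offset_of!(ObjectHeader<T>, value)).cast() }
}
```

Running a test like this under Miri is one way to check whether the provenance assumption actually holds.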

Alan-Chen99 avatar May 04 '23 03:05 Alan-Chen99

Also, currently cons are 24 bytes. Would we benefit from shrinking to 16?

Alan-Chen99 avatar May 04 '23 03:05 Alan-Chen99

currently cons are 24 bytes. Would we benefit from shrinking to 16?

We would benefit greatly. Cons is one of the most common objects (my current running instance has already allocated over 100 million of them). Do you have any ideas on how to do that?

CeleritasCelery avatar May 04 '23 04:05 CeleritasCelery

I have two approaches in mind:

  1. allow the rawobj type to store an extra bit that you get from .extra(), whose meaning is up to the containing object, thus fitting the readonly and tracing bits
  2. encode readonly and tracing in pointers: we encode whether a pointer has been traced and whether the pointee is readonly

2 is cleaner but consumes an extra tag bit.
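As a rough sketch of approach 2, assuming the two low bits of an aligned pointer are free for tags. `TaggedPtr` and the flag constants are hypothetical names, and this version keeps only the address as a `usize`, so it sidesteps rather than solves the provenance question:

```rust
const TRACED: usize = 0b01;
const READONLY: usize = 0b10;
const TAG_MASK: usize = 0b11;

/// Address with two flag bits packed into the low bits.
/// Stores only the address; provenance is not preserved in this sketch.
#[derive(Clone, Copy)]
struct TaggedPtr(usize);

impl TaggedPtr {
    fn new<T>(ptr: *const T) -> Self {
        let addr = ptr as usize;
        assert_eq!(addr & TAG_MASK, 0, "pointer must be at least 4-byte aligned");
        TaggedPtr(addr)
    }

    /// Returns a copy with the given flag set.
    fn with(self, flag: usize) -> Self {
        TaggedPtr(self.0 | (flag & TAG_MASK))
    }

    fn has(self, flag: usize) -> bool {
        self.0 & flag != 0
    }

    /// The original address with the tag bits masked off.
    fn addr(self) -> usize {
        self.0 & !TAG_MASK
    }
}
```

A cons made of two such words is two pointers wide, which is 16 bytes on a 64-bit target.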

Alan-Chen99 avatar May 04 '23 19:05 Alan-Chen99

Of course, a different strategy needs to be used for 32-bit systems. Have you looked at how emacs / cl does this?

Alan-Chen99 avatar May 04 '23 19:05 Alan-Chen99

Have you looked at how emacs / cl does this?

They use a mark bitmap.

https://github.com/emacs-mirror/emacs/blob/eb3a90619fed86298c96951af527a8483bdd1a3c/src/alloc.c#L2811-L2817
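The general idea of a mark bitmap can be sketched as follows. This is not Emacs's actual code; `MarkBitmap` is a hypothetical type that keeps one mark bit per object slot outside the objects themselves, so the cons cells need no space for a mark flag:

```rust
/// One mark bit per object slot, stored separately from the objects.
struct MarkBitmap {
    bits: Vec<u64>,
}

impl MarkBitmap {
    /// Creates a bitmap able to track `slots` objects, all unmarked.
    fn new(slots: usize) -> Self {
        MarkBitmap { bits: vec![0; (slots + 63) / 64] }
    }

    fn mark(&mut self, index: usize) {
        self.bits[index / 64] |= 1u64 << (index % 64);
    }

    fn is_marked(&self, index: usize) -> bool {
        self.bits[index / 64] & (1u64 << (index % 64)) != 0
    }

    /// Resets every mark, as would happen between collections.
    fn clear(&mut self) {
        self.bits.iter_mut().for_each(|word| *word = 0);
    }
}
```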

CeleritasCelery avatar May 05 '23 05:05 CeleritasCelery