getty icon indicating copy to clipboard operation
getty copied to clipboard

Avoiding unnecessary memory allocations in visitors

Open ibokuri opened this issue 1 year ago • 8 comments

Problem

There is no way for a visitor to know if a pointer value it has received is:

  1. Allocated.
  2. Safe to use as part of the return value.

Thus, visitors are forced to play it safe and always make copies, which can result in unnecessary allocations.

Proposal

To fix this, the following things need to be added to Getty:

  1. A way for visitors to know if the pointer value they received from a deserializer is safe to use as part of their return value.

    • There are two ways for a visitor to receive a pointer value from a deserializer: the value parameter in visitString and the return value of access methods (e.g., nextKeySeed, nextElementSeed).
  2. A way for deserializers to know if visitString is using the slice as part of the final value, and how much of that slice is being used.

Part One: The Visitor

How can visitors know if the pointer value they received from a deserializer is safe to use as part of their return value?


To solve this, we can do the following:

  1. Define the following type:

    ⚠️ Edit: See this comment for new Lifetime design. ⚠️

    pub const Lifetime = enum {
        Stack,
        Heap,
        Owned,
    }
    
    • The type will indicate the lifetime and ownership properties of pointer values passed to visitors:

      1. Stack: The value lives on the stack and its lifetime is shorter than the deserialization process.
        • The value must be copied by the visitor.
      2. Heap: The value lives on the heap and its lifetime is longer than the deserialization process and is independent of any entity.
        • The value can either be copied or returned directly.
      3. Owned: The value lives on the stack or heap and its lifetime is managed by some entity.
        • The value can either be copied by the visitor or returned directly if the visitor understands and deems the value's lifetime as safe.
        • Since Getty's default visitors won't have enough info to determine whether an Owned value's lifetime is safe, they must always copy such values.
    • When should visitors free the pointer values they receive?

      1. Stack or Owned values should never be freed by the visitor.
        • Stack values will be automatically cleaned up by the compiler, obviously.
        • Owned values will be cleaned up eventually after deserialization is finished by the entity that owns them.
      2. Heap values passed to visitString should never be freed by the visitor. This is b/c the value is a Getty value and so the deserializer is responsible for freeing it.
      3. Heap values returned from an access method should be freed by the visitor upon error or if it's not part of the final value. The deserializer will never see these values again, so it's the visitor's responsibility to free them.
  2. Add a lifetime parameter to visitString that specifies the Lifetime of input.

  3. Remove the is*Allocated methods from access interfaces. With Lifetime, we don't need them anymore.

  4. Modify the successful return type of access methods to be:

    struct {
        data: @TypeOf(seed).Value, // This may be optional, depending on the access method.
        lifetime: Lifetime,
    }
    

With these changes, visitors can do the following:

// in visitString...
switch (lifetime) {
  .Stack => // Make a copy of `input`
  .Owned, .Heap => // Make a copy of `input` or return it directly
}

// in visitMap...
while (try map.nextKey(ally, Key)) |key| {
    switch (key.lifetime) {
      .Stack => // Make a copy of `key.data`
      .Owned => // Make a copy of `key.data` or return it directly
      .Heap  => // Make a copy of `key.data` or return it directly & free it as necessary
    }
}

Part Two: The Deserializer

How does a deserializer know if visitString is using the slice as part of the final value, and how much of that slice is being used?


Before diving in, there are a few things to keep in mind:

  1. Access methods are irrelevant for this part. Deserializers will never see them again so no need to worry about them.
  2. This part only apply to Heap values.
    • Stack values are obviously managed automatically by the compiler.
    • Owned values are managed outside the deserialization process, so functions like deserializeString don't need to worry about them.
  3. The return value of visitString might not be a string at all, so we shouldn't rely solely on visitString's return value. Besides, even if it is a string it'll be very tedious using it in the deserializer to figure out what to free and what to keep.

In any case, to solve this, we can do the following:

⚠️ Edit: See this comment for new solution. ⚠️

  1. Change the return type of visitString to the following:

    const Return = struct {
        value: Value,
        indices: ?struct { start: usize, end: usize } = null,
    };
    
    fn visitString(
        self: Self,
        ally: ?std.mem.Allocator,
        comptime Deserializer: type,
        input: anytype,
    ) Deserializer.Error!Return
    
    • If indices is null, then that means visitString did not use input as part of its return value. In which case, the deserializer should free value afterwards.
    • If indices is not null, then that means visitString did use input as part of its return value. start and end specifies the starting and ending indices in input at which visitString's return value begins and ends.
  2. With this new indices field, the deserializer now knows 1) if the visitor is using input directly in its return value, and 2) how much of input is being used.

    • If the entirety of input is being used, then the deserializer should not free input after calling visitString.
    • If only a partial slice of input is being used, then the deserializer can use start and end to determine the remaining parts of input that should be freed.

ibokuri avatar Sep 10 '23 00:09 ibokuri