Avoiding unnecessary memory allocations in visitors
Problem
There is no way for a visitor to know if a pointer value it has received is:
- Allocated.
- Safe to use as part of the return value.
Thus, visitors are forced to play it safe and always make copies, which can result in unnecessary allocations.
Proposal
To fix this, the following things need to be added to Getty:
- A way for visitors to know if the pointer value they received from a deserializer is safe to use as part of their return value.
  - There are two ways for a visitor to receive a pointer value from a deserializer: the `value` parameter in `visitString` and the return value of access methods (e.g., `nextKeySeed`, `nextElementSeed`).
- A way for deserializers to know if `visitString` is using the slice as part of the final value, and how much of that slice is being used.
Part One: The Visitor
How can visitors know if the pointer value they received from a deserializer is safe to use as part of their return value?
To solve this, we can do the following:
- Define the following type:
  ⚠️ Edit: See this comment for new `Lifetime` design. ⚠️
      pub const Lifetime = enum {
          Stack,
          Heap,
          Owned,
      };
  - The type will indicate the lifetime and ownership properties of pointer values passed to visitors:
    - `Stack`: The value lives on the stack and its lifetime is shorter than the deserialization process.
      - The value must be copied by the visitor.
    - `Heap`: The value lives on the heap and its lifetime is longer than the deserialization process and is independent of any entity.
      - The value can either be copied or returned directly.
    - `Owned`: The value lives on the stack or heap and its lifetime is managed by some entity.
      - The value can either be copied by the visitor or returned directly if the visitor understands and deems the value's lifetime as safe.
      - Since Getty's default visitors won't have enough info to determine whether an `Owned` value's lifetime is safe, they must always copy such values.
  - When should visitors free the pointer values they receive?
    - `Stack` or `Owned` values should never be freed by the visitor.
      - `Stack` values will be automatically cleaned up by the compiler, obviously.
      - `Owned` values will be cleaned up eventually after deserialization is finished by the entity that owns them.
    - `Heap` values passed to `visitString` should never be freed by the visitor. This is because the value is a Getty value and so the deserializer is responsible for freeing it.
    - `Heap` values returned from an access method should be freed by the visitor upon error or if they are not part of the final value. The deserializer will never see these values again, so it's the visitor's responsibility to free them.
- Add a `lifetime` parameter to `visitString` that specifies the `Lifetime` of `input`.
- Remove the `is*Allocated` methods from access interfaces. With `Lifetime`, we don't need them anymore.
- Modify the successful return type of access methods to be:
      struct {
          data: @TypeOf(seed).Value, // This may be optional, depending on the access method.
          lifetime: Lifetime,
      }
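The freeing rules for access-method results can be sketched in a few lines of Zig. This is only an illustration under assumptions: `Result`, `Key`, and `discard` are hypothetical names that are not part of Getty, and the key type is fixed to `[]const u8` for simplicity.

```zig
const std = @import("std");

// The Lifetime enum as proposed above.
pub const Lifetime = enum { Stack, Heap, Owned };

// Hypothetical generic form of the proposed access-method return type.
pub fn Result(comptime T: type) type {
    return struct {
        data: T,
        lifetime: Lifetime,
    };
}

// Hypothetical alias for a string key returned by an access method.
pub const Key = Result([]const u8);

// Hypothetical helper: called by a visitor on error, or when the key
// will not be part of the final value.
pub fn discard(ally: std.mem.Allocator, key: Key) void {
    switch (key.lifetime) {
        // The deserializer never sees this value again, so the visitor frees it.
        .Heap => ally.free(key.data),
        // Cleaned up by the compiler or by the owning entity; never freed here.
        .Stack, .Owned => {},
    }
}

test "discarding a Heap key frees it" {
    const ally = std.testing.allocator;
    const data = try ally.dupe(u8, "key");
    // std.testing.allocator reports a leak if discard does not free the key.
    discard(ally, Key{ .data = data, .lifetime = .Heap });
}
```

Keeping the decision in one helper means a visitor cannot accidentally free a `Stack` or `Owned` value on an error path.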
With these changes, visitors can do the following:
// in visitString...
switch (lifetime) {
    .Stack => // Make a copy of `input`
    .Owned, .Heap => // Make a copy of `input` or return it directly
}
// in visitMap...
while (try map.nextKey(ally, Key)) |key| {
    switch (key.lifetime) {
        .Stack => // Make a copy of `key.data`
        .Owned => // Make a copy of `key.data` or return it directly
        .Heap => // Make a copy of `key.data` or return it directly & free it as necessary
    }
}
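A compilable version of the `visitString` case might look like the sketch below. It assumes the conservative policy described for Getty's default visitors (copy `Stack` and `Owned` values, return `Heap` values directly). The `Visited` struct and the simplified free-function signature are hypothetical and exist only to make the behavior observable.

```zig
const std = @import("std");

// The Lifetime enum as proposed above.
pub const Lifetime = enum { Stack, Heap, Owned };

// Hypothetical result type: records whether the visitor made a copy.
pub const Visited = struct {
    value: []const u8,
    copied: bool,
};

// Hypothetical, simplified stand-in for a default visitor's visitString.
pub fn visitString(
    ally: std.mem.Allocator,
    input: []const u8,
    lifetime: Lifetime,
) std.mem.Allocator.Error!Visited {
    return switch (lifetime) {
        // Stack values die before deserialization ends, and a default
        // visitor cannot judge an Owned value's lifetime, so both are copied.
        .Stack, .Owned => Visited{ .value = try ally.dupe(u8, input), .copied = true },
        // Heap values outlive deserialization, so no allocation is needed.
        .Heap => Visited{ .value = input, .copied = false },
    };
}

test "Heap input is returned without a copy" {
    const ally = std.testing.allocator;
    const input = try ally.dupe(u8, "getty");
    defer ally.free(input);

    const result = try visitString(ally, input, .Heap);
    try std.testing.expect(!result.copied);
    try std.testing.expect(result.value.ptr == input.ptr);
}
```

A visitor with more knowledge about an `Owned` value's owner could move `.Owned` into the second prong and skip the copy there as well.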
Part Two: The Deserializer
How does a deserializer know if `visitString` is using the slice as part of the final value, and how much of that slice is being used?
Before diving in, there are a few things to keep in mind:
- Access methods are irrelevant for this part. Deserializers will never see the values they return again, so there is no need to worry about them.
- This part only applies to `Heap` values.
  - `Stack` values are obviously managed automatically by the compiler.
  - `Owned` values are managed outside the deserialization process, so functions like `deserializeString` don't need to worry about them.
- The return value of `visitString` might not be a string at all, so we shouldn't rely solely on `visitString`'s return value. Besides, even if it is a string, it would be very tedious to use it in the deserializer to figure out what to free and what to keep.
In any case, to solve this, we can do the following:
⚠️ Edit: See this comment for new solution. ⚠️
- Change the return type of `visitString` to the following:
      const Return = struct {
          value: Value,
          indices: ?struct { start: usize, end: usize } = null,
      };
      fn visitString(
          self: Self,
          ally: ?std.mem.Allocator,
          comptime Deserializer: type,
          input: anytype,
      ) Deserializer.Error!Return
  - If `indices` is `null`, then `visitString` did not use `input` as part of its return value. In which case, the deserializer should free `value` afterwards.
  - If `indices` is not `null`, then `visitString` did use `input` as part of its return value. `start` and `end` specify the starting and ending indices in `input` at which `visitString`'s return value begins and ends.
- With this new `indices` field, the deserializer now knows 1) if the visitor is using `input` directly in its return value, and 2) how much of `input` is being used.
  - If the entirety of `input` is being used, then the deserializer should not free `input` after calling `visitString`.
  - If only a partial slice of `input` is being used, then the deserializer can use `start` and `end` to determine the remaining parts of `input` that should be freed.
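The check a deserializer would run on `indices` can be sketched as a small classification function. Everything here is hypothetical: `Indices`, `Usage`, and `classify` are illustrative names, and `end` is assumed to be exclusive (as in Zig slice syntax), which the proposal does not specify.

```zig
const std = @import("std");

// Named version of the anonymous struct in the proposed Return type.
pub const Indices = struct { start: usize, end: usize };

// Hypothetical summary of how much of `input` the visitor kept.
pub const Usage = enum { Unused, Whole, Partial };

// Classifies the visitor's use of an input of length `input_len`.
// Assumption: `end` is exclusive, so the whole input is [0, input_len).
pub fn classify(input_len: usize, indices: ?Indices) Usage {
    const i = indices orelse return .Unused;
    if (i.start == 0 and i.end == input_len) return .Whole;
    return .Partial;
}

test "classify covers the three cases" {
    try std.testing.expectEqual(Usage.Unused, classify(5, null));
    try std.testing.expectEqual(Usage.Whole, classify(5, Indices{ .start = 0, .end = 5 }));
    try std.testing.expectEqual(Usage.Partial, classify(5, Indices{ .start = 1, .end = 4 }));
}
```

The sketch deliberately stops at classification. How the unused parts of a partially used slice are actually released depends on the allocator in use.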