liballocs
make_precise should instead be allocator-level operations
Creating a 'precise' struct that ends with a flexible array member is a ballache. You have to create the array type, then create the struct type. The struct type's `make_precise` function should in theory do this. But if the flexible array member is many layers down, the number of created types and `make_precise` functions starts to add up. Currently we don't generate these functions, only the ones for the array type.
(Once no array type has a definite length (#34), structs that contain arrays will still have definite length, except for flexible array members, so that does not change things here.)
Given that we are rethinking arrays, we have a chance to rethink `make_precise` in general. We want it for:
- stack frames, but we could instead rely on the stackframe allocator to mediate the uniqtypes (there is no reason why frames need to have an overall struct type)
- unions? read/write is a mess at present... for read-validity we have `not_simultaneous` and `may_be_invalid`. But maybe we want read-validity as a separate concept orthogonal to uniqtype, i.e. at the allocator level. If so it could cover struct padding, inter-allocation dead space, genuinely read-trapping regions, sparseness that should not be disturbed (?), and so on.
- 'temporally discriminated unions', e.g. inregs/outregs, hence why it takes an `mcontext_t`. But for these maybe we need to do the write-tracking thing: trap writes and keep last-written shadow state.
- explicitly discriminated unions (like uniqtype itself), variant records, etc. The idea is that we could compile a hypothetical DWARF expression, describing the discriminator semantics, into a `make_precise` function. It seems reasonable that a variant record or union would have such a piece of code (again logically/hypothetically speaking). So if we get rid of `make_precise`, where should it live? Perhaps as a special 'related' entry... composites already have N+1 relateds, where the extra one is the member names. We can decree that `not_simultaneous` composites have a function that returns read-validity... and write-validity? Or maybe a "do write to member N" memcpy-like function that respects invariants (N must be the index of a `may_be_invalid` member).
- Can we also capture "tagvecs", a.k.a. `_DYNAMIC`- or auxv-style arrays, this way? We can view them as composites where a given member may be present, absent or even present more than once. It would be better to view these as an allocator. This may require us to relax our view of uniqtypes as being only at the 'terminal' layer of allocation: i.e. here we can access the arena as an array of `Elf64_Dyn` entries, or as an allocator managing a packed pool of discrete/singleton `Elf64_Dyn` entries.
We seem to want

```c
struct composite_member_rw_funcs
{
    /* We want get_read_validity_mask to return a bitmask in the common case.
     * The return value is ('address-boxing') either:
     * - if vas.h tells us it is not a valid user address:
     *   the caller should mask it by ((1ull << nmembers) - 1u),
     *   and the result is a bitmask conveying definedness of the
     *   first nmembers members;
     * - if vas.h tells us it is a valid user address:
     *   it points to a bit vector (le? be?) of nmemb entries,
     *   read-valid in whole words (i.e. rounded up to the word size).
     * Resource management? Use TLS? Yes, I think TLS is best:
     * the function that writes it can do `static __thread intptr_t mask;`
     * and return &mask (after writing the bits to it). */
    intptr_t (*get_read_validity_mask)(struct uniqtype *, void *base, mcontext_t *ctxt);
    void (*write_member)(struct uniqtype *, void *base, mcontext_t *ctxt,
        unsigned memb_idx, const void *src);
};
```
Maybe another way to think of the 'terminal layer' problem is that some allocators are "fixed-format". There may be many layers of fixed-format, but they always reside at the bottom of the tree.
Of course one can imagine exceptions to that, e.g. using a char buffer inside a struct as a malloc arena. I find that pretty wacky, so provisionally I have no problem with not being able to model it.
Idle thought: is there, or should there be, a link between `mincore()` and our sparseness story on read/write validity? For `mincore()` and sparse areas, maybe we need to capture the ideas that (1) access might be 'valid' but nevertheless expensive, and (2) a cheaper yet indirect way to read the value, e.g. "it's all zeroes because we've never written to it", might be available.
The issue of `make_precise()` came up again in the ELF allocator example, because some data types (NUL-terminated strings, x86 instructions, ...) have uncertain length but are self-delimiting. Can/should we make a uniqtype describing these, and if so, how can it expose the length information?
I guess the crux of the issue is that a flexible array need not be delimited by its containing allocation. We have been assuming (e.g. in libcrunch) that that's how it works. But it could instead be self-delimiting, e.g. something as simple as a null-terminated string. If the containing allocator cannot reasonably know the boundary, then the uniqtype itself has to know. We want to support both of these patterns. That means some kind of function for self-delimit logic seems inescapable.
Can we make self-delimitingness an exclusive property of a 'packed sequence' type, analogous to uniqtypes' `__ARR_`? Then the function could belong to the sequence type, not to the element type. Is that sane?
I am feeling the function actually needs to tell us two things: the distance to the end of this element, and the distance from there to the beginning of the next (default: zero). That's just in case there are alignment constraints or similar.
If we do this, then our subobject-traversal code will gain a new case (alongside struct, union and array).
Let's think about x86 some more. Ideally, a decode function would not only give us the length of the instruction, but also a precise type of the instruction (prefixes, opcodes, operands, SIB byte, immediates, etc.). Is this sane? These are all bitfields. I guess we can expand every distinct format of an x86 instruction, with one struct for each. So we are not really getting away from the idea that each distinct format needs its own precise struct type. Still, putting the decode function on the packed seq would make sense. It preserves the idea that only sequences (incl. arrays) are variable-length (er, and structs that embed one of these as their last field).
Another way to think of this: a packed seq has no element type. It only has a decode function, which enumerates the element types as it goes along.
What happens if a `get_type()` query comes in landing in the middle of a packed seq? The decode operations are potentially expensive. Maybe that's not our fault? Maybe we (configurably) cache the results? In a metavector, say? Indeed we can think of a data segment's type info as a packed seq. Elaborating it at run time is probably not a sane move in the case of data segments, but the conceptual uniformity is appealing: a starts bitmap and metavector are constructed to cache a packed sequence. We can construct them statically (eagerly) or dynamically (lazily) as we wish.
So it seems that packed sequences, together with some notions of read-validity and write-operation, are our replacement for `make_precise()`. Does this handle everything we need?
If we are to have `__uniqtype__PACKED_SEQ_xxx`, how do we name them? Really we want to identify them with their decode functions, which should be global symbols. How do we manage that namespace of global functions? It's almost like an `/etc/services`-style registry of names. But that means some kind of real-world authority would be necessary... not clear we want that.
Another wart is that packed sequences can't be written to without messing stuff up, in the case where an element's size is changed. So now I'm thinking it's a kind of allocator. Are all packed sequences big enough to be bigallocs? This also keeps things uniform w.r.t. the use of metavectors and bitmaps. And our allocator-level knowledge of whether we've issued internal pointers, whether it's possible to resize/move stuff, etc.
Radical thought: maybe we only need allocators, and this 'uniqtype' thing is one abstraction too many?
There is now a packed sequence allocator.
Thinking about x86 again, we could imagine expanding x86 instruction format into a series of struct types. But then there'd be an overarching union type. So really, each element of a packed sequence is (morally) an instance of a discriminated (non-simultaneous) union.