
make_precise should instead be allocator-level operations

Open stephenrkell opened this issue 3 years ago • 24 comments

Creating a 'precise' struct that ends with a flexible array member is a ballache. You have to create the array type, then create the struct type. The struct type's make_precise function should in theory do this. But if the flexible array member is many layers down, the number of created types and make_precise functions starts to add up. Currently we don't generate these functions, only the ones for the array type.

(Once no array type has a definite length (#34), structs that contain arrays will still have definite length, except for flexible array members, so that does not change things here.)

Given that we are rethinking arrays, we have a chance to rethink make_precise in general. We want it for:

  • stack frames, but we could instead rely on the stackframe allocator to mediate the uniqtypes (there is no reason why frames need to have an overall struct type)
  • unions? read/write is a mess at present... for read-validity we have not_simultaneous and may_be_invalid. But maybe we want read-validity as a separate concept orthogonal to uniqtype? i.e. at the allocator level. If so it could cover struct padding, inter-allocation dead space, genuinely read-trapping regions, sparseness that should not be disturbed (?), and so on.
  • 'temporally discriminated unions', e.g. inregs/outregs, hence why it takes an mcontext_t. But for these maybe we need to do the write-tracking thing, trap writes and keep last-written shadow state
  • explicitly discriminated unions (like uniqtype itself), variant records, etc... the idea is that we could compile a hypothetical DWARF expression, describing the discriminator semantics, into a make_precise function. It seems reasonable that a variant record or union would have such a piece of code (again, logically/hypothetically speaking). So if we get rid of make_precise, where should it live? Perhaps as a special 'related' entry... composites already have N+1 relateds, where the extra one is the member names. We could decree that not_simultaneous composites have a function that returns read-validity... and write-validity? Or maybe a "do write to member N" memcpy-like function that respects invariants (N must be the index of a may_be_invalid member).
  • Can we also capture "tagvecs" a.k.a. _DYNAMIC or auxv-style arrays this way? We can view them as composites where a given member may be present, absent or even present >1 time. It would be better to view these as an allocator. This may require us to relax our view of uniqtypes as being only at the 'terminal' layer of allocation, i.e. here we can access the arena as an array of Elf64_Dyn entries or as an allocator managing a packed pool of discrete / singleton Elf64_Dyn entries.

We seem to want

struct composite_member_rw_funcs
{
    // we want this to return a bitmask in the common case
    // it is either ('address-boxing')
    //      - if vas.h tells us it is not a valid user address,
    //           the caller should then mask it by ((1ull<<(nmembers))-1u)
    //           and the result is a bitmask conveying definedness of the
    //           first nmembers members
    //      - if vas.h tells us it is a valid user address,
    //           it points to a bit vector (le? be?) of nmembers entries,
    //               read-valid in whole words (i.e. rounded up to the word size)
    //           resource management? use TLS? yes I think TLS is best.
    //           the function that writes it can do  static __thread intptr_t mask;
    //           and return &mask (after writing the bits to it);
    intptr_t (*get_read_validity_mask)(struct uniqtype *, void *base, mcontext_t *ctxt);
    void     (*write_member)          (struct uniqtype *, void *base, mcontext_t *ctxt,
                                      unsigned memb_idx, const void *src);
};
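
A caller-side sketch of the 'address-boxing' convention might look like the following. This is only a sketch: looks_like_user_address() is a stand-in for whatever test vas.h actually provides, and member_is_read_valid() is a hypothetical helper, not anything we generate today.

#include <stdint.h>
#include <stddef.h>
#include <ucontext.h>   /* mcontext_t */

struct uniqtype;        /* opaque for the purposes of this sketch */

/* Stand-in for the vas.h check; the real predicate would come from vas.h. */
static _Bool looks_like_user_address(void *p)
{ return (uintptr_t) p >= 4096 && (uintptr_t) p < (1ull << 47); }

static _Bool member_is_read_valid(struct composite_member_rw_funcs *funcs,
        struct uniqtype *t, void *base, mcontext_t *ctxt,
        unsigned memb_idx, unsigned nmembers)
{
    intptr_t ret = funcs->get_read_validity_mask(t, base, ctxt);
    if (!looks_like_user_address((void *) ret))
    {
        /* inline-bitmask case: only the low nmembers bits are meaningful */
        intptr_t mask = ret & (intptr_t) ((1ull << nmembers) - 1u);
        return (mask >> memb_idx) & 1;
    }
    /* boxed case: ret points to a bit vector of nmembers bits */
    const unsigned long *vec = (const unsigned long *) ret;
    unsigned wordbits = 8 * sizeof (unsigned long);
    return (vec[memb_idx / wordbits] >> (memb_idx % wordbits)) & 1;
}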

stephenrkell avatar Oct 13 '21 17:10 stephenrkell

Maybe another way to think of the 'terminal layer' problem is that some allocators are "fixed-format". There may be many layers of fixed-format, but they always reside at the bottom of the tree.

Of course one can imagine exceptions to that, e.g. using a char buffer inside a struct as a malloc arena. I find that pretty wacky, so provisionally have no problem not being able to model it.

stephenrkell avatar Oct 26 '21 14:10 stephenrkell

Idle thought: is there, or should there be, a link between mincore() and our sparseness story on read/write validity?

stephenrkell avatar Oct 26 '21 16:10 stephenrkell

For mincore() and sparse areas, maybe we need to capture the ideas that (1) access might be 'valid' but nevertheless expensive, and (2) a cheaper yet indirect way to read the value, e.g. "it's all zeroes because we've never written to it", might be available.
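
For concreteness, a very rough, Linux-specific sketch of the mincore() side of idea (1), asking whether a read over a range would touch only resident pages; this is not liballocs API, just an illustration.

#define _DEFAULT_SOURCE
#include <sys/mman.h>   /* mincore */
#include <unistd.h>     /* sysconf */
#include <stdint.h>
#include <stdlib.h>

/* Returns 1 if every page backing [addr, addr+len) is resident, 0 if any is
 * not (i.e. a read would be valid but expensive), -1 on error. */
static int range_is_resident(void *addr, size_t len)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    uintptr_t begin = (uintptr_t) addr & ~((uintptr_t) pagesz - 1);
    size_t span = ((uintptr_t) addr + len) - begin;
    size_t npages = (span + pagesz - 1) / pagesz;
    unsigned char *vec = malloc(npages);
    if (!vec) return -1;
    if (mincore((void *) begin, span, vec) != 0) { free(vec); return -1; }
    int all_resident = 1;
    for (size_t i = 0; i < npages; ++i)
        if (!(vec[i] & 1)) { all_resident = 0; break; }
    free(vec);
    return all_resident;
}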

stephenrkell avatar Jan 25 '22 16:01 stephenrkell

The issue of make_precise() came up again in the ELF allocator example, because some data types (NUL-term'd strings, x86 instructions, ...) have uncertain length but are self-delimiting. Can/should we make a uniqtype describing these, and if so, how can it expose the length information?

stephenrkell avatar Jan 25 '22 16:01 stephenrkell

I guess the crux of the issue is that a flexible array need not be delimited by its containing allocation. We have been assuming (e.g. in libcrunch) that that's how it works. But it could instead be self-delimiting, e.g. something as simple as a null-terminated string. If the containing allocator cannot reasonably know the boundary, then the uniqtype itself has to know. We want to support both of these patterns. That means some kind of function for self-delimit logic seems inescapable.

stephenrkell avatar Jan 25 '22 16:01 stephenrkell

Can we make self-delimitingness an exclusive property of a 'packed sequence' type that is analogous to uniqtypes' __ARR_? Then the function could belong to the sequence type, not to the element type. Is that sane?

My feeling is that the function actually needs to tell us two things: the distance to the end of this element, and the distance from there to the beginning of the next (default: zero). The latter is just in case there are alignment constraints or similar.
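
Concretely, something like the following (all names purely hypothetical), with the NUL-terminated string as the degenerate case:

#include <stddef.h>
#include <string.h>

/* Hypothetical shape of a packed-sequence 'advance' function: given the start
 * of one element, report that element's extent and any gap (padding/alignment)
 * before the next element begins. */
struct seq_step { size_t this_size; size_t gap_to_next; /* usually 0 */ };

/* The NUL-terminated string case: the element delimits itself. A this_size of
 * zero means "no terminator before hard_limit", i.e. we ran off the end. */
static struct seq_step asciiz_delimit(const void *start, const void *hard_limit)
{
    size_t max = (const char *) hard_limit - (const char *) start;
    const char *nul = memchr(start, '\0', max);
    return (struct seq_step) {
        .this_size = nul ? (size_t) (nul - (const char *) start) + 1 : 0,
        .gap_to_next = 0
    };
}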

stephenrkell avatar Jan 25 '22 16:01 stephenrkell

If we do this, then our subobject-traversal code will gain a new case (alongside struct, union and array).

stephenrkell avatar Jan 25 '22 16:01 stephenrkell

Let's think about x86 some more. Ideally, a decode function would not only give us the length of the instruction, but also a precise type of the instruction (prefixes, opcodes, operands, SIB byte, immediates, etc.). Is this sane? These are all bitfields. I guess we can expand every distinct format of an x86 instruction, with one struct for each. So we are not really getting away from the idea of computing a precise type per element. Still, putting it on the packed seq would make sense. It preserves the idea that only sequences (incl. arrays) are variable-length (er, and structs that embed one of these as their last field).

stephenrkell avatar Jan 25 '22 16:01 stephenrkell

Another way to think of this: a packed seq has no element type. It only has a decode function, which enumerates the element types as it goes along.
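
In C terms, such a decode function might look something like this (again a hypothetical sketch, nothing we generate today): the sequence owns the decoder, and the decoder yields a uniqtype per element as it advances. An x86 decoder would return a different instruction-format type on each step; a plain array's decoder would return the same element type every time.

#include <stddef.h>

struct uniqtype;   /* opaque */

/* What one decode step yields: the element's precise type and extent. */
struct seq_element
{
    struct uniqtype *type;
    const void *start;
    size_t size;
};

/* Hypothetical decoder owned by the packed-seq type: starting at pos, decode
 * the next element and say where the one after it begins. Returns 0 at the
 * end of the sequence (or if decoding fails). */
typedef int packed_seq_decode_fn(const void *pos, const void *limit,
                                 struct seq_element *out_elem,
                                 const void **out_next);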

stephenrkell avatar Jan 25 '22 17:01 stephenrkell

What happens if a get_type() query comes in landing in the middle of a packed seq? The decode operations are potentially expensive. Maybe that's not our fault? Maybe we (configurably) cache the results? In a metavector, say? Indeed we can think of a data segment's type info as a packed seq. Elaborating it at run time is probably not a sane move in the case of data segments, but the conceptual uniformity is appealing: a starts bitmap and metavector are constructed to cache a packed sequence. We can construct them statically (eagerly) or dynamically (lazily) as we wish.
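
Sketching that caching step (purely illustrative, reusing the hypothetical decode signature above):

/* Elaborate a packed sequence into a metavector of (offset, type) pairs plus
 * a starts bitmap (one bit per byte of the sequence), so later queries needn't
 * re-run the decoder. Could be run eagerly (statically) or lazily on first
 * query. */
struct cached_elem { unsigned offset; struct uniqtype *type; };

static size_t cache_packed_seq(packed_seq_decode_fn *decode,
        const void *base, const void *limit,
        struct cached_elem *metavec, size_t metavec_len,
        unsigned long *starts_bitmap)
{
    size_t n = 0;
    const void *pos = base;
    struct seq_element el;
    const void *next;
    while (n < metavec_len && decode(pos, limit, &el, &next))
    {
        size_t off = (const char *) el.start - (const char *) base;
        metavec[n++] = (struct cached_elem) { (unsigned) off, el.type };
        starts_bitmap[off / (8 * sizeof (unsigned long))]
            |= 1ul << (off % (8 * sizeof (unsigned long)));
        pos = next;
    }
    return n;
}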

stephenrkell avatar Jan 25 '22 18:01 stephenrkell

So it seems that packed sequences, together with some notions of read-validity and write-operation, are our replacement for make_precise(). Does this handle everything we need?

If we are to have __uniqtype__PACKED_SEQ_xxx, how do we name them? Really we want to identify them with their decode functions, which should be global symbols. How do we manage that namespace of global functions? It's almost like an /etc/services-style registry of names. But that means some kind of real-world authority would be necessary... not clear we want that.

stephenrkell avatar Jan 25 '22 18:01 stephenrkell

Another wart is that packed sequences can't be written to without messing stuff up, in the case where an element's size is changed. So now I'm thinking it's a kind of allocator. Are all packed sequences big enough to be bigallocs? This also keeps things uniform w.r.t. the use of metavectors and bitmaps. And our allocator-level knowledge of whether we've issued internal pointers, whether it's possible to resize/move stuff, etc.

stephenrkell avatar Jan 25 '22 18:01 stephenrkell

Radical thought: maybe we only need allocators, and this 'uniqtype' thing is one abstraction too many?

stephenrkell avatar Jan 26 '22 09:01 stephenrkell

There is now a packed sequence allocator.

Thinking about x86 again, we could imagine expanding x86 instruction format into a series of struct types. But then there'd be an overarching union type. So really, each element of a packed sequence is (morally) an instance of a discriminated (non-simultaneous) union.
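
Schematically (the formats here are invented purely for illustration; real x86 has far more):

/* Simplified 'instruction format' structs and the overarching
 * non-simultaneous union a packed-seq element would (morally) instantiate;
 * the decode function acts as the discriminator. */
struct insn_fmt_opcode_only { unsigned char opcode; };
struct insn_fmt_modrm_imm8  { unsigned char opcode; unsigned char modrm;
                              unsigned char imm8; };
union x86_insn
{
    struct insn_fmt_opcode_only opcode_only;
    struct insn_fmt_modrm_imm8  modrm_imm8;
    /* ... one member per distinct instruction format ... */
};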

stephenrkell avatar Mar 30 '22 17:03 stephenrkell