liballocs icon indicating copy to clipboard operation
liballocs copied to clipboard

Uniqtype aliases don't preserve uniqueness in all cases

Open stephenrkell opened this issue 5 years ago • 3 comments

The current handling of typedefs, and other cases of aliased uniqtypes (base types, bitfield types) is not quite right.

Consider a DSO in which there is a type foo_t, being a typedef of int, and another DSO in which there is bar_t, also a typedef of int. In the eventual run-time metadata, all three should be aliases of a single unique int type.

But instead, as long as foo_t and bar_t don't appear in the other DSOs, one of the two int definitions will 'lose' and be overridden, while its alias (foo_t or bar_t) will not be overridden, so will remain, and will be non-equal to the 'winning' int type.

This is basically a result of aliases being bound too early, before the global symbol overriding is resolved. We can blame the inflexible semantics of ELF dynamic linking.

One way to fix it up might be to use our dlbind library, if we can preserve its place near the head of the link order. If so, we can intervene whenever we load new metadata, and insert ABS symbols pointing at the 'real' target of any aliased uniqtype. For example, when loading the DSO defining 'foo_t' and 'int' uniqtypes, supposing we already have an aliased 'bar_t' and 'int', we create a '__uniqtype__foo_t' ABS symbol whose value is the 'bar_t'/'int' uniqtype address. This is outwith both the meta-DSO being loaded and the dlbind DSO; and we rely on our dlbind overriding that meta-DSO's definitions.

stephenrkell avatar May 17 '19 08:05 stephenrkell

I think any solution solution has to arrange that the dlbind library is the first DSO in link order which is allowed to contain uniqtype definitions. This is already almost true, since we never put uniqtypes in the preload lib. However, we do put them in the executable sometimes (with libcrunch). That needs to be avoided or worked around somehow.

Then, on loading a meta-DSO, we can do a scan to see if we're introducing a new alias for an already-defined symbol. Then... what?

  • We could try to promote the whole uniqtype (including all its aliases) into the dlbind object if it is not there already. But that's no good -- pointers to the previously live uniqtype will already be in circulation, so we can't recall them.
  • We could do the ABS thing: create the alias in the dlbind object. Then if/when the alias-defining meta-DSO is unloaded, remove the alias. (This means keeping a refcount per dlbind-installed alias, somewhere, in case another meta-DSO defining the alias was also loaded in the meantime.) That also doesn't work, for the converse reason: what if we then remove the meta-DSO to which the ABS alias refers? Clients who want to refer to the alias will have dangling pointers into the address space of the now-removed meta-DSO.

We could do one of the following.

  • Dispense with meta-DSOs entirely, instead manually defining all uniqtypes in the dlbind DSO. Then we have to do bookkeeping on a per-uniqtype basis.
  • Use our hypothetical future "allocation relocation" facility to move still-used uniqtypes out of the meta-DSO that is being removed.
  • Decree that we never remove meta-DSOs.

The Right Solution is probably a mixture of the last two: we support removal of meta-DSOs, but any uniqtypes that are live thanks to ABS aliases are "liberated" i.e. moved, by allocation relocation, either into the dlbind DSO or (perhaps better) into a GC'd area from where they can be collected later.

Note also the overlap with use-after-free: if we're dlclosing a meta-DSO into which we have other (non-ABS-induced) referenced, that implies a dangling-pointer bug in the user's code.

The first option is basically doing GC for all uniqtypes, ignoring the basic discipline that user code ought to respect: if you refer to a uniqtype, don't keep the pointer after the corresponding meta-DSO is closed. But hmm... given that aliases exist, is there always a corresponding meta-DSO? E.g. I open lib1 and lib2, then close lib1; but lib2 shares some uniqtypes in lib1, and my pointers to these are now dangling. Or rather, they should be updated to point to the copies of the uniqtypes (previously overridden, but now not overridden) defined in lib2's meta-DSO. So either way, we need to "update the pointers", here so that the newly-exposed copies are now the live ones. This seems like a general use-after-free hazard inherent to symbol overriding. Does the refcounting of DSOs already prevent this? If i dlclose lib1, really it shouldn't go away until lib2 has also gone away. (Edit: after an experiment, at least glibc's ld.so does the right thing -- lib1's refcount is bumped on account of its overriding stuff in lib2. So we don't have to worry about that.)

I think I should assume that only my ABS hack is creating extra cases to worry about... just worry about liberating the ABS-induced references. One quick fix would be to bump the refcount of any ABS-referenced meta-DSO (once per ABS reference). This prevents it from being closed. If we later decide that those ABS references can go away (how would we decide this? hmm) we can decrement it the same number of times.

stephenrkell avatar Jun 10 '19 17:06 stephenrkell

To get around the "uniqtypes already in the executable" case, we can probably do a hacky pass at startup that re-sites them in the dlbind object, using relocation records (recovered or from ld -q). Then we need to clobber the dynstr (and maybe the hash tables?) so that dlsym won't find the old copies. That is really nasty.

stephenrkell avatar Jun 10 '19 17:06 stephenrkell

Actually I'm not sure we really have to resite them from the executable. The executable cannot be unloaded with dlclose(), even if this is arguably an anomaly that should be corrected! We can just define the alias in libdlbind's symtab, but pointing into the executable.

So the summary seems to be: whenever we do a dlopen that wants to introduce a fresh alias for a uniqtype that's already loaded, instead add it in the dlbind object's dynsym (as ABS) and bump the referenced shared object's refcount.

We also want a mechanism for scheduling the corresponding decrement of the refcount, for if/when the dlopened object is closed... since we already hook dlclose, a per-object list of handles is not too difficult to arrange. (Also I'm fairly sure our handling of dlclose is currently broken in other ways.)

stephenrkell avatar Jun 20 '19 15:06 stephenrkell