libdlbind
Generalisation?
The ability to map a loaded DSO MAP_SHARED seems powerful as a primitive. It's worth thinking about this more generally. It brings some image-like qualities to Unix systems. Does it help with live coding? Does it help with caching compilation results, in a way similar to what the R people (Jan Vitek et al.) have done? Can it help with security updates somehow? In a single-user system where permissions are not an issue, what are its benefits and drawbacks if used at wide scale? Can it allow linkage to be used to form interprocess bindings?
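To make the primitive concrete, here is a minimal sketch in plain C of the kernel-level behaviour being relied on. It is not libdlbind's API: the function name `demo_shared_vs_private` and the temp-file path are made up for illustration. It maps one file three times and checks that a write through one MAP_SHARED view shows up in the other, while a write through a MAP_PRIVATE view stays local.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical demo helper (not part of libdlbind).
 * Returns 0 on success, -1 if setup failed. On success:
 *   *shared_seen    is 1 if a write via one MAP_SHARED view was visible in the other;
 *   *private_leaked is 1 if a write via the MAP_PRIVATE view reached the shared view. */
int demo_shared_vs_private(int *shared_seen, int *private_leaked)
{
    assert(shared_seen && private_leaked);
    char path[] = "/tmp/dlbind-sketch-XXXXXX";   /* illustrative path only */
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                                /* file lives on until unmapped */

    long pagesz = sysconf(_SC_PAGESIZE);
    if (pagesz <= 0 || ftruncate(fd, pagesz) != 0) { close(fd); return -1; }

    char *a = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *b = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    char *p = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd);
    if (a == MAP_FAILED || b == MAP_FAILED || p == MAP_FAILED) return -1;

    strcpy(a, "shared");                         /* write through one shared view */
    *shared_seen = (strcmp(b, "shared") == 0);   /* ...and read it through the other */

    strcpy(p, "private");                        /* copy-on-write: diverges here */
    *private_leaked = (strcmp(a, "private") == 0);

    munmap(a, pagesz);
    munmap(b, pagesz);
    munmap(p, pagesz);
    return 0;
}
```

In a real use the two shared views would live in different processes; a single process is used here only to keep the sketch self-contained.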
Writing a paper about this system increasingly seems feasible, although some experiments comparing it to the gdb JIT protocol (and others if there are any?) seem essential.
One way to think of this might be that the canonical files on disk are "prototypes", which each process using them can specialise much as in prototype-based O-O languages. These specialised versions can then be re-exported to the world for direct use, or themselves used as prototypes.
It also relates to my longstanding "everything is a dlopen" concept... essentially it expands dlopen to something roughly like binding to a collection of segments defined by another process. It feels very Multics-y also.
Given an 'immutable' system library like libc.so, can we delay the moment of divergence, in a vaguely copy-on-write style? This would mean that only when we attempt to modify the library contents (or ask for a handle on which to do so) is the link to the system libc broken. This is probably possible as a sort of "upgrade" on an existing link map entry, to make it into a dlbind-able one. We could just rewrite the link map perhaps. At this stage we are getting into the territory of a custom dynamic linker, since it seems easy to break the internal invariants (e.g. about who frees the strings in the link_map... though probably liballocs can track that?).
Roughly what we want is to mmap over the MAP_PRIVATE segments with the same parts of a copy of the file. However, that's not quite right in the case of (1) data segments which may have been relocated+updated, or, rarely, (2) text segments that may have been relocated.
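As a first step towards finding those segments, a process can inspect its own mappings. The following is a rough sketch, assuming Linux and a readable /proc/self/maps; the helper name `count_file_mappings` is hypothetical. It counts file-backed mappings by the private/shared flag in the permissions column, which is where ordinarily loaded DSO segments show up as private.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: count file-backed mappings of this process,
 * split by the sharing flag ('p' or 's') in /proc/self/maps.
 * Returns 0 on success, -1 if /proc/self/maps cannot be opened. */
int count_file_mappings(int *n_private, int *n_shared)
{
    assert(n_private && n_shared);
    FILE *f = fopen("/proc/self/maps", "r");
    if (!f) return -1;

    *n_private = *n_shared = 0;
    char line[4096];
    while (fgets(line, sizeof line, f)) {
        unsigned long start, end;
        char perms[5];
        char path[4096] = "";
        /* Line format: start-end perms offset dev inode [pathname] */
        int n = sscanf(line, "%lx-%lx %4s %*x %*s %*u %4095s",
                       &start, &end, perms, path);
        if (n < 3 || path[0] != '/')
            continue;               /* skip anonymous, [heap], [stack], ... */
        if (perms[3] == 's')
            ++*n_shared;
        else
            ++*n_private;
    }
    fclose(f);
    return 0;
}
```

This only tells us which ranges are candidates. It says nothing about whether a given private page has already diverged from the file through relocation or updates, which is the hard part noted above.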
What happens if we mmap /proc/self/mem over the top of an existing mapping?
The short answer is that we get ENODEV. So it seems copying may be necessary. That is hairy to do from user space but could be done with some carefully crafted, self-contained code.
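The experiment is easy to repeat. Below is a small sketch, assuming Linux; the function name `try_mmap_proc_self_mem` is made up here. It creates a throwaway anonymous page, then tries to map the corresponding offset of /proc/self/mem over it with MAP_FIXED, returning whatever errno the kernel gives rather than assuming a particular value.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper. Returns:
 *   -1     if setup failed (no /proc/self/mem, no memory, ...);
 *    0     if the mmap unexpectedly succeeded;
 *    errno from mmap otherwise. */
int try_mmap_proc_self_mem(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);

    /* An existing mapping to use as the target. */
    void *target = mmap(NULL, pagesz, PROT_READ,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (target == MAP_FAILED) return -1;

    int fd = open("/proc/self/mem", O_RDONLY);
    if (fd < 0) { munmap(target, pagesz); return -1; }

    /* In /proc/self/mem, file offset == virtual address, so ask for the
     * page "at" target's own address, placed on top of target. */
    void *got = mmap(target, pagesz, PROT_READ, MAP_PRIVATE | MAP_FIXED,
                     fd, (off_t)(uintptr_t)target);
    int result = (got == MAP_FAILED) ? errno : 0;

    close(fd);
    munmap(target, pagesz);
    return result;
}
```

Comparing the return value against ENODEV checks the observation above on a given kernel.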
One way to frame this in a paper might be that the paper is titled something like "Shared objects as shared images: a compatible extension to POSIX dynamic loading interfaces", with improving on the gdb-JIT protocol as one case study with a clear practical value, and a couple of further case studies that get more wacky/speculative. Perhaps some kind of "pre-warmed JIT" approach being the second (read the paper by Jan et al for ideas), and maybe something live-patchy as a third (e.g. can I patch a certain library across all running processes, in a way that continues to share memory? patch in one and then rebind in the others).
Perhaps the core gymnastics is to turn a non-writable MAP_PRIVATE mapping into a fresh mapping (MAP_SHARED or MAP_PRIVATE) of a private file. It might be writable (new stuff can be added, becoming visible to all mappers) or not (just a one-shot live-upgraded version of the existing file). Of course this imports the whole DSU problem as a dependency; we could initially rule out the hard cases, i.e. don't edit or move any function that is active on any call stack.
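A sketch of that gymnastics using the copying approach, under heavy assumptions: the range is page-aligned and readable, nothing else is touching it during the swap, and an anonymous read-only page stands in for a real DSO segment. The names `rebind_as_shared` and `demo_rebind` are hypothetical, not libdlbind functions. The idea is to copy the bytes out to a fresh file and then mmap that file MAP_SHARED|MAP_FIXED over the original address.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper: replace the mapping at [addr, addr+len) with a
 * MAP_SHARED mapping of a fresh file holding a copy of the same bytes.
 * Returns the fd of the backing file, or -1 on failure. */
int rebind_as_shared(void *addr, size_t len, int prot)
{
    char path[] = "/tmp/dlbind-rebind-XXXXXX";   /* illustrative path only */
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);

    /* Copy the current contents of the range into the file. */
    const char *src = addr;
    size_t done = 0;
    while (done < len) {
        ssize_t n = write(fd, src + done, len - done);
        if (n <= 0) { close(fd); return -1; }
        done += (size_t)n;
    }

    /* Atomically swap the old mapping for a shared view of the copy. */
    void *got = mmap(addr, len, prot, MAP_SHARED | MAP_FIXED, fd, 0);
    if (got == MAP_FAILED) { close(fd); return -1; }
    return fd;
}

/* Returns 1 if the contents survived the swap and a later write through a
 * second mapping became visible at the original address; 0 if not;
 * -1 on setup failure. */
int demo_rebind(void)
{
    long pagesz = sysconf(_SC_PAGESIZE);
    assert(pagesz > 0);
    char *seg = mmap(NULL, pagesz, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (seg == MAP_FAILED) return -1;
    strcpy(seg, "original contents");
    mprotect(seg, pagesz, PROT_READ);   /* stand-in for a non-writable segment */

    int fd = rebind_as_shared(seg, pagesz, PROT_READ);
    if (fd < 0) return -1;

    /* A second, writable view of the same file plays "another mapper". */
    char *other = mmap(NULL, pagesz, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (other == MAP_FAILED) return -1;

    int same = strcmp(seg, "original contents") == 0;
    strcpy(other, "updated contents");
    int visible = strcmp(seg, "updated contents") == 0;

    munmap(other, pagesz);
    munmap(seg, pagesz);
    return same && visible;
}
```

This sidesteps everything that makes the real problem hard: code running from the segment being replaced, active call stacks, and keeping the dynamic linker's own bookkeeping consistent.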
Another would be an ELF extension that lets us create a program with a pre-existing shared segment, Multics-style. Is this useful? How is this different from linking to a library? Well, that it's not private and therefore "live" not "dead". "Linking to a shared memory segment", linking to a "live" file, etc... these are all new affordances. Many programs morally link to IPC channels, i.e. assume their availability, but this remains implicit ("I happen to pass this string to open()") rather than explicit ("I depend on an object named foo of type ..."). "Live segments" generalise to "live openable DSOs" which is my "everything is a dlopen" idea again. Also reminded of recent Mastodon discussion about why O-O's "everything is code" can be a bad thing. With live segments we do better: have both manifest state (a memory image) and "code" operations (symbols one can invoke) and can pick the one we want. Note that symbols apply to both: they mark a point/range in the image, and also they (if of STT_FUNC) denote a receiving point of messages (albeit not technically a 'receiver' in O-O terms).
Is there a unification to be done between PT_LOAD and DT_NEEDED here? For a DT_NEEDED we're saying that we want the segments of that file to be mapped, but we don't care at what address they live (or even how many they are), because we reference them only symbolically (late-bound not pre-bound). Don't forget the idea of DT_USEFUL which is like a weak symbol.
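Both kinds of record are visible from inside a running process, which makes the comparison easy to poke at. Here is a sketch using glibc's `dl_iterate_phdr` (a real call, declared in `<link.h>` under `_GNU_SOURCE`); the struct `dso_counts` and the function names are invented for this example. For every loaded object it counts PT_LOAD program headers (the pre-bound "map these bytes here" requests) and DT_NEEDED dynamic entries (the symbolic "map whatever that file says" requests).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <link.h>
#include <stddef.h>

/* Hypothetical tally type for this sketch. */
struct dso_counts {
    int objects;         /* loaded objects seen */
    int load_segments;   /* PT_LOAD headers across all objects */
    int needed_entries;  /* DT_NEEDED entries across all objects */
};

static int count_cb(struct dl_phdr_info *info, size_t size, void *data)
{
    struct dso_counts *c = data;
    assert(c != NULL);
    (void)size;

    c->objects++;
    for (int i = 0; i < info->dlpi_phnum; i++) {
        const ElfW(Phdr) *ph = &info->dlpi_phdr[i];
        if (ph->p_type == PT_LOAD) {
            c->load_segments++;
        } else if (ph->p_type == PT_DYNAMIC) {
            /* The dynamic section lives at load bias + p_vaddr. */
            const ElfW(Dyn) *dyn =
                (const ElfW(Dyn) *)(info->dlpi_addr + ph->p_vaddr);
            for (; dyn->d_tag != DT_NULL; dyn++)
                if (dyn->d_tag == DT_NEEDED)
                    c->needed_entries++;
        }
    }
    return 0;   /* non-zero would stop the iteration */
}

struct dso_counts count_loaded(void)
{
    struct dso_counts c = { 0, 0, 0 };
    dl_iterate_phdr(count_cb, &c);
    return c;
}
```

The asymmetry shows up in the data: a PT_LOAD carries offsets, sizes and addresses, whereas a DT_NEEDED carries only a name, with the segment count and placement left to the named file and the loader.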
As another possible case study, I'm reminded of my "self-remaking programs" idea. Perhaps the program would rebuild itself "in memory" as a dlbind image, which remains naturally cached in the filesystem. This is possibly a bit underwhelming / bolted-on -- need to think about how actual value comes from dlbind (since real toolchains, with which we'd presumably do the rebuilding, want to output files, not libdlbind-friendly allocations).
(That should say "most real toolchains" -- I think Zig can already link into memory somehow and ROC has mooted doing similar. Can we adapt [other] real toolchains easily in a way that makes sense here?)