
Weaken Onerous Serialization Restrictions

Open andrewthad opened this issue 5 years ago • 5 comments

Despite my excitement about compact regions ever since they came out, I’ve never actually been able to use them for serialization. The restriction that the server and client be running the exact same binary is huge. It means that you cannot use compact regions to serialize to something like a message bus, where the producer and consumer are almost never running the same code.

What would it take to weaken this requirement, and what would the weaker requirement be? I have pondered a different scheme in which dynamically linking all Haskell dependencies is mandatory. The requirement for reading a serialized compact region would be that the types used in the compacted value must come from a shared object that is the same on both systems. That is, the user would just have to make sure both applications are dynamically linked to the exact same my-shared-types library.

I suspect that this would be difficult to implement. But I just wanted to get the idea down. I’ve been reading a lot about Cap’n Proto and Flatbuffers lately, and I had started thinking about how great it would be if I could just use the ol’ faithful Haskell type system instead of a restricted subset of ADTs (and no GADTs at all).

andrewthad avatar Jun 06 '19 02:06 andrewthad

Pondering this a little more last night, I began wondering: what exactly is a value in a compact region allowed to point to outside of its compact region? At the least, it must include the info table. But is that all? What about a CAF? If it’s just the info tables, it might be possible to build a map on the sending side that associates info table pointers with a typerep. And then maybe you could clean up those pointers on the receiving end.

andrewthad avatar Jun 06 '19 11:06 andrewthad

Thinking about this even more, I wonder if it's possible to perform the additional translation step as a separate serialization by introducing a variant of compactFixupPointers#. The idea would be to additionally serialize something of type Map (Ptr StgInfoTable) Fingerprint before sending the actual compact region. Then, on the receiving end, library code would figure out how to resolve these fingerprints to the Ptr StgInfoTable on the receiving host (not totally sure how this mechanism would work). Some variant of compactFixupPointers# would additionally modify the info table pointers as it passed over the data.
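
To make the shape of that idea concrete, here is a rough sketch of just the table-resolution step, in plain Haskell with no RTS involvement. Everything in it is hypothetical: StgInfoTable is an empty stand-in type that is never dereferenced, stableName simply hashes a string I made up (how to get a name that is stable across binaries is the unsolved part), and fakeAddr only exists to fabricate addresses for experimenting. It does not touch compactFixupPointers# at all.

```haskell
import qualified Data.Map.Strict as Map
import Foreign.Ptr (Ptr, nullPtr, plusPtr)
import GHC.Fingerprint (Fingerprint, fingerprintString)

-- Hypothetical stand-in for the RTS info table type; never dereferenced here.
data StgInfoTable

-- What the sender would ship ahead of the compact region.
type SenderTable = Map.Map (Ptr StgInfoTable) Fingerprint

-- What the receiver would need to know about its own binary.
type ReceiverTable = Map.Map Fingerprint (Ptr StgInfoTable)

-- Sender address -> receiver address, as a fixup pass would consume it.
type Translation = Map.Map (Ptr StgInfoTable) (Ptr StgInfoTable)

-- Assumption: a constructor can be named by some string that is stable
-- across binaries. This just hashes that string.
stableName :: String -> Fingerprint
stableName = fingerprintString

-- Hypothetical helper: fabricate an address for experiments.
fakeAddr :: Int -> Ptr StgInfoTable
fakeAddr = plusPtr nullPtr

-- Resolve every sender fingerprint on the receiver. If any fingerprint is
-- unknown to the receiver, report all of the unknown ones instead.
translation :: SenderTable -> ReceiverTable -> Either [Fingerprint] Translation
translation sender receiver
  | null missing = Right (Map.mapMaybe (`Map.lookup` receiver) sender)
  | otherwise    = Left missing
  where
    missing = [fp | fp <- Map.elems sender, Map.notMember fp receiver]

-- Rewrite a single info pointer; Nothing means the pointer was not in
-- the table the sender shipped.
fixupInfoPtr :: Translation -> Ptr StgInfoTable -> Maybe (Ptr StgInfoTable)
fixupInfoPtr table ptr = Map.lookup ptr table
```

Loading this in GHCi and building two small tables with fakeAddr and stableName is enough to play with it. The Left case is where the receiver would have to reject the region, since it has no info table for one of the constructors.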

andrewthad avatar Jun 06 '19 12:06 andrewthad

In rts/sm/CNF.c, the following restrictions are listed:

Invariants

  1. A CNF is self-contained. The data within it does not have any external pointers. EXCEPT: pointers to static constructors that are guaranteed to never refer (directly or indirectly) to CAFs are allowed, because the garbage collector does not have to track or follow these.
  2. A CNF contains only immutable data: no THUNKS, FUNs, or mutable objects. This helps maintain invariant (1).
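
Invariant (2) can be observed from user code with the ghc-compact package. A small sketch, assuming GHC.Compact.compact raises CompactionFailed when it meets a function, as its documentation describes; tryCompact and compacts are names made up for this example.

```haskell
import Control.Exception (CompactionFailed, try)
import GHC.Compact (Compact, compact)

-- Attempt to compact a value, catching the exception the RTS raises for
-- closures that may not live in a CNF (functions, mutable objects).
tryCompact :: a -> IO (Either CompactionFailed (Compact a))
tryCompact = try . compact

-- True if the value could be copied into a compact region.
compacts :: a -> IO Bool
compacts a = either (const False) (const True) <$> tryCompact a
```

In GHCi, compacts applied to an ordinary list of Ints should succeed, while applying it to a lambda should not.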

I had forgotten about static data constructors. These would need to be cleaned up on the receiving end as well.

andrewthad avatar Jun 06 '19 13:06 andrewthad

I did not understand what the term "static constructor" actually meant. I was thinking it was a nullary data constructor. I suspect that the static constructor optimization that CNF allocation does would need to be disabled for what I am proposing to be able to work correctly.

andrewthad avatar Jun 06 '19 14:06 andrewthad

It might also be possible to fix up those pointers too. But the first problem you have to solve is coming up with a stable name for these things, that is preserved across different binaries...

ezyang avatar Jun 11 '19 18:06 ezyang