dex-lang icon indicating copy to clipboard operation
dex-lang copied to clipboard

Optimize sum-type storage

Open apaszke opened this issue 4 years ago • 1 comments

Right now, when we lower sum types they (roughly) get expanded into a flattened product of all fields appearing in all cases, with only the storage corresponding to the active case actually getting initialized and used at run-time.

But this approach is highly wasteful! In particular an ADT such as

data ColorfulFloat = 
  RedFloat Float
  BlueFloat Float
  GreenFloat Float

should only use 5 (1+4) bytes of storage, not 13 (1+4+4+4)!

To improve this situation, we need to modify the Imp codegen to reuse storage already allocated for other cases while emitting the destinations for sum types.

apaszke avatar Jan 04 '21 18:01 apaszke

With the recent rewrite of the Imp lowering to use type-determined representations (#1173), this is shouldn't be too hard.

The runtime representation of simplified types (base types, sums, products, dependent pairs) should be entirely handled in Imp.hs in three functions: typeToTree, valueToTree, and atomToRepVal. We probably violate the abstraction a little bit -- e.g. I notice that we use the Branch constructor in the Case case of translateExpr in a sum-type-specific way too -- but those are the main ones.

The runtime representation can be an arbitrary function of the simplified type. Currently, we represent a sum type, (t1 | t2 | ...) as a product of a tag and a product of the components of the sum: (t1 & t2 & ...). But we could change that to e.g. a map from each base type to a power of that type according to the max number of occurrences across each sum type component.

dougalm avatar Feb 15 '23 19:02 dougalm