souffle ADT Representation

ADT Representation

Open b-scholz opened this issue 2 years ago • 4 comments

A single ADT branch uses two records at the moment, i.e., we have following encoding.

.type B = b1 { fields1...} | b2 { fields2 ..} | ...

A branch is encoded as [<branch-id> [fields_i]] using two records. The logic behind this encoding is that no memory space is wasted. However, it comes not for free - we have a double-unpack for records (via their ordinal numbers) to access the field data.

An alternative way to represent ADT is to use a single record of the format: [<branch-id> e1 ... en] where <branch-id> is the identification of the branch followed by the fields. The number of elements is determined by the longest branch. If branches require less elements, the unused elements are just padded with zero values.

Using an alternative encoding may speed up some applications using ADTs.

Aug 31 '21 01:08 b-scholz

In any case it would then be nice to have a simple way to construct values of an ADT using the Souffle API. Currently we have to do a little bit of reverse engineering of the ADT representation encoding into records and ram-values in order to create values:

if the ADT is an enum:
  return the branch index as a ram value
else if the branch is an empty record
  return the record [branch-index, 1]
else if the branch is a single element record
  return the record [branch-index, element-value]
else if the branch has is a multiple elements record
  return the record [branch-index, [multiple elements record]]

Sep 13 '21 07:09 quentin

Indeed, I did not list the special cases for the zero and one arity case.

Sep 13 '21 07:09 b-scholz

Having a flat representation should be less memory consuming in the worst case when all the produced values for an ADT have the longest branch(es) form, because it saves one record indirection.

I am personally in favour of simpler, flat ADT encoding. Saving a pack/unpack operation should have a far more positive impact compared to the little negative impact of using more memory in average.

Sep 13 '21 07:09 quentin

The mapping to pack/unpack is encapsulated in the ast2ram translator. Thus, only a few lines in the code need to be changed. The question is whether we would like to keep both encodings or make everything flat.

Ideally, we do some experiments to decide which encoding is better in practice.

Sep 13 '21 07:09 b-scholz

souffle souffle copied to clipboard

ADT Representation

souffle
souffle copied to clipboard