dex-lang icon indicating copy to clipboard operation
dex-lang copied to clipboard

[WIP] Relational Algebra example.

Open duvenaud opened this issue 5 years ago • 2 comments

In principle Dex's record system could use tables of records to do the sorts of data-munging usually done by packages such as pandas.

In order to match the standard relational algebra, there are a couple of things that I don't think can be expressed in the current type system:

  1. Some functions need to flatten their records in order to match their standard definitions, but I don't know how to type this. I think something like this would suffice: def combine_records (a:{&...l}) (b:{&...r}): ({& ...l ...r}) = todo but I don't think the current syntax supports this. Any thoughts, @danieldjohnson ?
  2. To rename a field, one needs to manually create a renaming isomorphism. This isn't bad but it's a little boilerplate-y.
  3. We still need to write our own equality instances for records, as mentioned in #258.
  4. Equality instances for records wouldn't be a problem if there was some sort of isomorphism that let one access the data of an arbitrary record.

Another missing feature (as discussed with @apaszke and @dougalm is the ability to create custom index sets. Ideally, instead of tracking primary keys, one would be able to re-index a table of records by one of its columns. Something like:

def indexby (_: Eq b) ?=>
  (field : Iso ({&} & a) (b & c))
  (table:n=>a) : (elementsOfColumn table b)=>c = todo
  
elementsOfColumn table b (_: Eq b) ?=>
  (field : Iso ({&} & a) (b & c))
  (table:n=>a) : Type = todo

but I don't know how to write elementsOfColumn to create a new index set.

duvenaud avatar Dec 29 '20 20:12 duvenaud

Yeah, the current syntax doesn't support record flattening. I think it's actually a more fundamental issue than that, too, because the concatenation of two records can't be unified with a concrete record type in a unique way (since the fields could come from either l or r). That means that a type like {& ...l ...r} probably couldn't directly be used as a type in the core IR during type inference (under the current design of type inference), and would have to be handled with some other machinery (like typeclass dictionary synthesis, perhaps).

(I think @dougalm and I had a conversation a few months ago where we were wondering if there were any practical use cases for record concatenation. Looks like this may be one!)

My not-fully-thought-through hypothesis is that record concatenation and record equality are both special cases of making it possible to iterate over the fields of arbitrary records in user-space, which could be accomplished by somehow transforming records into a generic representation. But I'm still not sure how much work that would be; it's something I'm considering playing around with in the future but I don't know when I'll get around to it.

Similar to the manual renaming isomorphism, it should be possible right now to manually write out a concatenation for two tables by providing an isomorphism that lists all of the fields from l and moves them one by one into r. But admittedly that's a pretty annoying thing to have to do.

danieldjohnson avatar Dec 29 '20 22:12 danieldjohnson

Thanks for your thoughts, @danieldjohnson !

Yeah, the current syntax doesn't support record flattening. I think it's actually a more fundamental issue than that, too, because the concatenation of two records can't be unified with a concrete record type in a unique way (since the fields could come from either l or r). That means that a type like {& ...l ...r} probably couldn't directly be used as a type in the core IR during type inference (under the current design of type inference), and would have to be handled with some other machinery (like typeclass dictionary synthesis, perhaps).

I suspected as much, or you would have included an example of it in isomorphisms.dx :)

(I think @dougalm and I had a conversation a few months ago where we were wondering if there were any practical use cases for record concatenation. Looks like this may be one!)

Yes, although we should keep in mind that the main use case here, join, might often just be an awkward way of composing indexed tables. A long time ago @dougalm sketched out for me some examples where complicated lookups could be simply (and type-safely) expressed through nested indexing. I think the next step should be to come up with a few more concrete use cases we'd like to be able to support. It might turn out that the relational algebra can be usefully refactored!

My not-fully-thought-through hypothesis is that record concatenation and record equality are both special cases of making it possible to iterate over the fields of arbitrary records in user-space, which could be accomplished by somehow transforming records into a generic representation. But I'm still not sure how much work that would be; it's something I'm considering playing around with in the future but I don't know when I'll get around to it.

I also am getting the sense that moving things to a user-manipulable representation is going to be necessary for usability. However, I've been surprised by how much you've managed to be able to do with the current representation, so I'm not sure. Part of the motivation for this example was for me to see how far I could push this isomorphism framework, and in this case I was able to push it a little further than I thought, but it looks like I'm hitting some real limitations.

Similar to the manual renaming isomorphism, it should be possible right now to manually write out a concatenation for two tables by providing an isomorphism that lists all of the fields from l and moves them one by one into r. But admittedly that's a pretty annoying thing to have to do.

Right, I found that example in isomorphisms.dx. But yes, if anyone should be able to avoid boilerplate, it's languages with dependent types!

duvenaud avatar Dec 29 '20 23:12 duvenaud