row-types

Q: How does Data.Row.Records compare to other extensible record approaches?

Open Wizek opened this issue 5 years ago • 7 comments

For example, does compile time suffer exponentially, having ~ 1+ minute builds past 8-10 fields similar to https://github.com/turingjump/bookkeeper/issues/13 and https://github.com/agrafix/superrecord/issues/12? Or has row-types managed to avoid that particular issue?

If one of you is willing, I'd love to read a comprehensive comparison to some other approaches to find out the strengths and limitations of this implementation. The most efficient way I know to compare against most other packages at once is to fill in cells in this Google Sheet.

I've added row-types at row 33, under the section titled "Looking for reports:". Is one of you up for filling in some cells there? Maybe @strake, @atzeus, @dwincort, or someone else?

p.s.: I've heard of this package from this talk. Some more background can be gleaned from this reddit thread.

Wizek avatar Jul 25 '18 16:07 Wizek

Hi @Wizek. That's a really great spreadsheet you have going there with a lot of valuable information. I can definitely provide some data for the row-types row, and with just a bit of tweaking, I can update the benchmark suite to include the benchmarks you care about. For the purposes of write permissions on the sheet, my gmail address is dwincort.

I have one question though: How is supercast different from merge?

dwincort avatar Jul 25 '18 18:07 dwincort

Wonderful, thank you. Going to give you edit permissions right away.

Wizek avatar Jul 25 '18 20:07 Wizek

@strake Huh, I was just playing around with some benchmarks, and while I was right that making records is O(n) in compile time, metamorphing them seems to be O(n^2). In particular, the use of FoldStep has terrible compile-time performance repercussions, indicating that we may want to get rid of it (that's not actually hard to do, but it makes user-created metamorphs near impossible without some internal "unsafe" functions). A use of metamorph with a 64-field record takes nearly 10 minutes to compile on my machine (funnily enough, it doesn't seem to matter how many uses of metamorph are in the module), and removing FoldStep brings that down to 90 seconds. I mean, 90 seconds is still forever -- if I just declare row-types, construct values in the regular way, and do standard reads, writes, appends, etc., it compiles in just a few seconds, even if I have 64-field records. Runtime-wise, metamorph is fast (or, at least, in line with my expectations), but when we get some time, we should look into finding a way to make it compile quickly too.

Note: I think FoldStep is doing a type equality check on each partial row-type-list, which means it's doing a quadratic number of equality checks. Really, we only need to check the first element of the row-type-list. Maybe we can use some sort of entailment and under-the-hood coercion to trick GHC into giving the right behavior without making it do all the work.
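To put rough numbers on that note, here is a small back-of-the-envelope sketch. It is only a cost model under the assumption that a check on a partial list of length k costs k comparisons; the names `fullChecks` and `headChecks` are made up for illustration, and nothing here calls into row-types or measures GHC.

```haskell
-- Value-level cost model only; it does not touch row-types or GHC internals.

-- Comparisons if every partial list of length 1..n is checked in full
-- (assumption: a check on a list of length k costs k comparisons).
fullChecks :: Int -> Int
fullChecks n = sum [1 .. n]

-- Comparisons if only the first element of each partial list is checked.
headChecks :: Int -> Int
headChecks n = n

main :: IO ()
main = do
  print (fullChecks 64)  -- 2080
  print (headChecks 64)  -- 64
```

Under that assumption, a 64-field record goes from 2080 comparisons to 64, which is the n(n+1)/2 versus n gap the note describes.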

dwincort avatar Jul 25 '18 20:07 dwincort

Oh, and about your supercast vs merge question. We keep that distinction because, although the two are the same for most approaches, they are not the same for all of them. And the philosophy behind this sheet is that in that case we differentiate by splitting columns. Case in point: generic-lens has supercast, and last I checked it doesn't have a notion of merge. But if you/we find that it's the same for all rows, then of course we can merge the two columns.

Wizek avatar Jul 25 '18 21:07 Wizek

Furthermore, thank you again for filling in as many cells as you have so far. I'm quite impressed by how many fields are green. I might want to give this package a spin based on this! Although that 90..600 second compile time sounds more troublesome, I don't yet know enough of what metamorph and FoldStep are supposed to do in this context, so maybe I can live without them.

But if my experiments suggest that I can't, I am a bit worried that there may be something more fundamental at play here, because the superrecord, bookkeeper, rawr and even some Scala folks have all run into similar compile-time performance issues. We'll see.

So thanks, and please also feel free to fill in or correct cells in other rows as well; you may know much more about CTRex for example than any of our editors did so far.

Wizek avatar Jul 25 '18 21:07 Wizek

Feel free to play around with the Examples module and the benchmark suite if you'd like to see some of this in action. I'm going to try to update the benchmarks properly, but I have a lot on my plate right now and a vacation coming up soon that's acting as a pretty big deadline. Do let me know if there's any documentation you don't understand.

As for metamorph, it's the driving function behind operations over an entire row-type object at once. It drives converting a row-types type into a record, converting from native Haskell records and back, mapping, sequencing, showing, etc. Operations that target individual fields don't need it, but everything else is built on top of metamorph.

FoldStep is a constraint that operates on the underlying implementation of row-types. If you think of the row-type as a sorted list (which it is), then FoldStep says that the label you're adding must be smaller than anything else in the list. This means that if you are performing one "step" of a "fold" (which is part of metamorph), this constraint allows you to use the standard row-types append/inject/.+ operator. Without it, you need to use an unsafe version of said operator. In practice, the unsafe version is faster at runtime too, so this constraint is not actually used by any of the metamorph-using functions in the library. That said, if you wanted to write your own function that used metamorph, you would be very limited unless you had either FoldStep or the unsafe injection operator.
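For readers who want a concrete picture of the sorted-list idea, the following is a value-level analogy only. The real FoldStep is a type-level constraint; `Row`, `smallerThanAll`, `unsafeInject`, `extend` and `rebuild` are hypothetical names invented for this sketch and are not part of the row-types API.

```haskell
import Data.List (insertBy)
import Data.Ord (comparing)

-- Hypothetical value-level stand-in for a row: fields kept sorted by label.
type Row a = [(String, a)]

-- Analogue of the FoldStep condition: the new label precedes every label
-- already in the row. Because the row is sorted, checking the head suffices.
smallerThanAll :: String -> Row a -> Bool
smallerThanAll _ [] = True
smallerThanAll l ((l', _) : _) = l < l'

-- "Unsafe" injection: cons without checking anything. The result is only a
-- valid (sorted) row when smallerThanAll holds for the new label.
unsafeInject :: String -> a -> Row a -> Row a
unsafeInject l v r = (l, v) : r

-- General extension: sorted insert, valid wherever the label belongs.
extend :: String -> a -> Row a -> Row a
extend l v = insertBy (comparing fst) (l, v)

-- One fold over the sorted fields rebuilds the row. Each step adds a label
-- that is smaller than everything accumulated so far, so cons is enough.
rebuild :: Row a -> Row a
rebuild = foldr (\(l, v) acc -> unsafeInject l v acc) []

main :: IO ()
main = do
  let r = extend "y" 2 (extend "x" 1 (extend "z" 3 [])) :: Row Int
  print r
  print (smallerThanAll "a" r)
  print (smallerThanAll "y" r)
  print (rebuild r == r)
```

In this model, each fold step satisfies the "smaller than everything else" condition by construction, which is why a plain cons can stand in for the general sorted insert. That is the trade-off described above, just without the type-level machinery.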

Actually, we've run into issues in the past where FoldStep still didn't give us exactly what we wanted, and I've had my eye on fixing it to something better anyway. This compile-time performance issue is really just the impetus to get me to find a fix. ;)

dwincort avatar Jul 25 '18 21:07 dwincort

row-types started as a fork of CTRex, so I may be able to fill in some details on that line.

dwincort avatar Jul 25 '18 21:07 dwincort