cl-ana `cl-ana.reusable-table:reusable-table` can be misleading and lead to surprising bugs

I realize I'm hitting all kinds of edge cases because I'm currently only working with in-memory data sets.

I have introduced bugs into my code a couple times now because I'm forgetting how reusable-table works. To restate how it works here for benefit of readers: make-reusable-table takes in a creation-fn and optionally an opener-form which tell the table how to recreate the underlying table when it needs to. It then uses a needs-reloading slot variable to determine when it needs to call these two functions.

The bug I've introduced in my code is when I partially loop through a table, but logically I'm finished with that pass of the table. I think what I should be doing is calling (setf (needs-reloading my-table) t), but I am obviously expecting forms like do-table to do that for me once they exit.

I find myself wondering if a "cursor" concept wouldn't be clearer. E.g. you only ever have 1 representation of the table, but you might create multiple cursors on that table. And do-table could take in &key (cursor (make-cursor table)).

Thoughts?

May 15 '19 19:05 kat-co

Yes that is how you would reset the reusable-table; I had left this the way it is because I had also left open the possibility to allow the user to catch errors etc. and then simply make another call to do-table without having to iterate through the entire table again. I can see that there are pros and cons to each way of handling it, and I can see definite benefits of adding the cursor concept.

May 25 '19 03:05 ghollisjr

The trouble I see with cursors in cl-ana is they will need to be implemented per table type. I think they would be very simple for a table that exists strictly in memory and has random-access for rows, but for something like an HDF5 table where rows are read in chunks, I think it would essentially amount to copying all the handles to the underlying HDF5 objects and then emulating the behavior of the original hdf-table object without actually changing it. With reusable-table, I went the route of using a function to create and close table objects rather than creating all these related object types.

May 25 '19 03:05 ghollisjr

Yes that is how you would reset the reusable-table; I had left this the way it is because I had also left open the possibility to allow the user to catch errors etc. and then simply make another call to do-table without having to iterate through the entire table again. I can see that there are pros and cons to each way of handling it, and I can see definite benefits of adding the cursor concept.

I think the "proper" way to do this would be with the condition system and restarts. This should give users the "expected" behavior of do-table and the like resetting the table when they exit, but allow users to re-enter at a position if needed.

The trouble I see with cursors in cl-ana is they will need to be implemented per table type.

I was assuming that the concept of a cursor is ubiquitous enough that this would not be an issue. As you state later, in-memory is no issue, and any type of single-file based table could just use file pointers.

I think they would be very simple for a table that exists strictly in memory and has random-access for rows, but for something like an HDF5 table where rows are read in chunks, I think it would essentially amount to copying all the handles to the underlying HDF5 objects and then emulating the behavior of the original hdf-table object without actually changing it. With reusable-table, I went the route of using a function to create and close table objects rather than creating all these related object types.

I'm not intimately familiar with how HDF5 data is structured, but this does sound like a pain. Does it not expose any kind of cursor concept for consumers? Is this an edge case that proves the rule?

Jun 07 '19 15:06 kat-co

cl-ana cl-ana copied to clipboard

`cl-ana.reusable-table:reusable-table` can be misleading and lead to surprising bugs

cl-ana
cl-ana copied to clipboard