cl-ana
cl-ana copied to clipboard
`cl-ana.reusable-table:reusable-table` can be misleading and lead to surprising bugs
I realize I'm hitting all kinds of edge cases because I'm currently only working with in-memory data sets.
I have introduced bugs into my code a couple times now because I'm forgetting how reusable-table
works. To restate how it works here for benefit of readers: make-reusable-table
takes in a creation-fn
and optionally an opener-form
which tell the table how to recreate the underlying table when it needs to. It then uses a needs-reloading
slot variable to determine when it needs to call these two functions.
The bug I've introduced in my code is when I partially loop through a table, but logically I'm finished with that pass of the table. I think what I should be doing is calling (setf (needs-reloading my-table) t)
, but I am obviously expecting forms like do-table
to do that for me once they exit.
I find myself wondering if a "cursor" concept wouldn't be clearer. E.g. you only ever have 1 representation of the table, but you might create multiple cursors on that table. And do-table
could take in &key (cursor (make-cursor table))
.
Thoughts?
Yes that is how you would reset the reusable-table; I had left this the way it is because I had also left open the possibility to allow the user to catch errors etc. and then simply make another call to do-table
without having to iterate through the entire table again. I can see that there are pros and cons to each way of handling it, and I can see definite benefits of adding the cursor concept.
The trouble I see with cursors in cl-ana is they will need to be implemented per table type. I think they would be very simple for a table that exists strictly in memory and has random-access for rows, but for something like an HDF5 table where rows are read in chunks, I think it would essentially amount to copying all the handles to the underlying HDF5 objects and then emulating the behavior of the original hdf-table object without actually changing it. With reusable-table, I went the route of using a function to create and close table objects rather than creating all these related object types.
Yes that is how you would reset the reusable-table; I had left this the way it is because I had also left open the possibility to allow the user to catch errors etc. and then simply make another call to do-table without having to iterate through the entire table again. I can see that there are pros and cons to each way of handling it, and I can see definite benefits of adding the cursor concept.
I think the "proper" way to do this would be with the condition system and restarts. This should give users the "expected" behavior of do-table
and the like resetting the table when they exit, but allow users to re-enter at a position if needed.
The trouble I see with cursors in cl-ana is they will need to be implemented per table type.
I was assuming that the concept of a cursor is ubiquitous enough that this would not be an issue. As you state later, in-memory is no issue, and any type of single-file based table could just use file pointers.
I think they would be very simple for a table that exists strictly in memory and has random-access for rows, but for something like an HDF5 table where rows are read in chunks, I think it would essentially amount to copying all the handles to the underlying HDF5 objects and then emulating the behavior of the original hdf-table object without actually changing it. With reusable-table, I went the route of using a function to create and close table objects rather than creating all these related object types.
I'm not intimately familiar with how HDF5 data is structured, but this does sound like a pain. Does it not expose any kind of cursor concept for consumers? Is this an edge case that proves the rule?