DataToolkit.jl
Multi-file loader
As recently discussed on Zulip, it would be nice to have a loader that can load multiple files sharing the same schema, something that e.g. CSV.jl and Arrow.jl already support. So I thought I'd make an issue to track this :)
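To make the request concrete, here is a rough stdlib-only sketch of what "multiple files with the same schema" could mean. It does not use CSV.jl, Arrow.jl, or any DataToolkit API. `readtable` and `loadmany` are hypothetical names, and the parsing is deliberately naive (comma-separated, no quoting).

```julia
# Read one comma-separated file into (header, rows).
# Naive on purpose: no quoting or escaping support.
function readtable(path::AbstractString)
    lines = filter(!isempty, readlines(path))
    header = split(first(lines), ',')
    rows = [split(line, ',') for line in lines[2:end]]
    return header, rows
end

# Load several files, requiring each to have the same header as the first,
# and concatenate their rows in the order the paths were given.
function loadmany(paths::AbstractVector{<:AbstractString})
    header, rows = readtable(first(paths))
    for path in paths[2:end]
        h, r = readtable(path)
        h == header || error("schema mismatch in $path: $h != $header")
        append!(rows, r)
    end
    return header, rows
end
```

The point is only the shape of the operation: one schema check per file, then a vertical concatenation.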
Thanks for the issue, it will probably take a while for me to get to this properly, but for the record this is rolling around in the back of my mind.
I want to handle this, but also handle it properly (using a cached Merkle-tree hash for starters, though more thought is needed).
I've been thinking more about this, and specifically about working with a directory. I'm wondering if introducing a DirPath as a counterpart to FilePath could be a good way of handling this.
That sounds sensible. Would you then chain a directory loader and a specific file loader? Or would you just pass the directory to a loading function which is then free to process its contents in any way?
We now have DirPath! 🥳
This is a big step, and it's been done properly: Merkle-tree hashing for integrity, with caching so the hashing work isn't repeated on every access or check.
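For anyone following along, here is a minimal sketch of the general idea. It is not the actual DataToolkit implementation. It assumes SHA-256 from the `SHA` stdlib and a cache invalidated by file mtime and size. `HASH_CACHE`, `filehash`, and `merklehash` are hypothetical names.

```julia
using SHA

# Cache of file digests, keyed by path. Each entry stores the mtime and size
# the digest was computed at (an assumed invalidation rule).
const HASH_CACHE = Dict{String, Tuple{Float64, Int64, Vector{UInt8}}}()

# Hash a single file, reusing the cached digest while mtime and size match.
function filehash(path::AbstractString)
    st = stat(path)
    cached = get(HASH_CACHE, path, nothing)
    if cached !== nothing && cached[1] == st.mtime && cached[2] == st.size
        return cached[3]
    end
    digest = open(sha256, path)
    HASH_CACHE[path] = (st.mtime, Int64(st.size), digest)
    return digest
end

# Hash a directory as a Merkle node: the digest of its sorted entries,
# each contributing its name, a NUL separator, and its child digest.
function merklehash(path::AbstractString)
    isdir(path) || return filehash(path)
    ctx = SHA256_CTX()
    for name in sort(readdir(path))
        update!(ctx, Vector{UInt8}(name))
        update!(ctx, UInt8[0x00])
        update!(ctx, merklehash(joinpath(path, name)))
    end
    return digest!(ctx)
end
```

In this sketch only file digests are cached, so a repeated check still walks the tree but avoids re-reading unchanged file contents. Any change to a file, or to the set of names in a directory, changes the root hash.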
Now that we have an easy way to arrive at a collection of items, we can start thinking about the next step: how to handle them in bulk...