DataSets.jl
Remote drivers
I love everything about this package and how it treats data. But it is really hard to incorporate for non-local data. Is there a sanctioned way to deal with data hosted on, say, AWS S3 or similar?
Thanks. Basically we'll need storage drivers for each type of remote data API. So far I haven't had time to make a solid implementation of this, but there was some discussion in #26 related to a Downloads-based backend.
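For illustration only, a Downloads-based approach might look roughly like the sketch below: fetch a URL once into a local cache directory and return a local path that existing file-based code can open. The name `cached_download` and the cache layout are hypothetical and not part of DataSets.jl; only `Downloads.download` and the `SHA` stdlib are real APIs here.

```julia
using Downloads
using SHA

# Hypothetical helper, not a DataSets.jl API: download `url` into `cache_dir`
# the first time it is requested and reuse the local copy afterwards.
function cached_download(url::AbstractString, cache_dir::AbstractString)
    mkpath(cache_dir)
    # Name the cache entry after a hash of the URL so any URL maps to a valid file name.
    path = joinpath(cache_dir, bytes2hex(sha256(String(url))))
    if !isfile(path)
        # Download to a temporary file first so a failed transfer
        # never leaves a partial file in the cache.
        tmp = Downloads.download(url)
        mv(tmp, path; force=true)
    end
    return path
end
```

This sketch never invalidates the cache, so it only suits data that does not change at a given URL.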
One fairly easy option for cloud storage is to use rclone mount (via Rclone_jll) to mount an S3 prefix. I've got a prototype of that kind of thing lying around separately.
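A rough sketch of the rclone idea, not the prototype mentioned above: build an `rclone mount` command for a remote prefix and run it in the background. It assumes an `rclone` binary on `PATH` and an already configured remote; the function name, the remote `s3remote:bucket/prefix`, and the mount point are all hypothetical. Swapping in the binary from Rclone_jll should be possible but is not shown here.

```julia
# Hypothetical helper: construct (but do not run) a read-only `rclone mount` command.
# `remote` is an rclone remote path such as "s3remote:bucket/prefix" (assumed to be
# configured already), and `mountpoint` is an existing empty local directory.
function rclone_mount_cmd(remote::AbstractString, mountpoint::AbstractString)
    return `rclone mount $remote $mountpoint --read-only`
end

# Usage (assumes rclone and FUSE are installed):
#   proc = run(rclone_mount_cmd("s3remote:bucket/prefix", "/tmp/mnt"); wait=false)
#   ... read files under /tmp/mnt like ordinary local data ...
#   kill(proc)   # stop rclone, which unmounts the directory
```

Once mounted, the prefix appears as ordinary local files, so no remote-specific storage driver is needed on the Julia side.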
Btw, on master Downloads.jl is now thread-safe. I know that was a blocker in various use-cases.
Yes, that's great news! I've been following along with some of the Downloads changes, but I got a little lost in the back and forth. It seemed like a difficult fix.
It was painful. Hopefully it's actually thread-safe now.
Any new and exciting development on this front? Some of the online sources would be AWS, FigShare, Zenodo, and growing...