DataSets.jl

Remote drivers

Open yakir12 opened this issue 4 years ago • 5 comments

I love everything about this package and how it treats data, but it is really hard to use with non-local data. Is there a sanctioned way to deal with data hosted on, say, AWS S3 or similar?

yakir12 avatar Oct 27 '21 11:10 yakir12

Thanks. Basically we'll need storage drivers for each type of remote data API. So far I haven't had time to make a solid implementation of this, but there was some discussion in #26 related to a Downloads-based backend.
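For readers wondering what a Downloads-based approach might look like in the meantime, here is a minimal sketch. It is not part of DataSets.jl and is not the design discussed in #26: `fetch_cached` is a hypothetical helper name, and it naively assumes the last path component of the URL is a usable, unique file name (no query strings). It only uses `Downloads.download(url, output)` from the standard library.

```julia
using Downloads

# Hypothetical helper (not a DataSets.jl API): download `url` into
# `cache_dir` once, and return the local path on every call.
function fetch_cached(url::AbstractString, cache_dir::AbstractString)
    mkpath(cache_dir)
    # Assumption: the final URL path component is a usable file name.
    local_path = joinpath(cache_dir, basename(url))
    if !isfile(local_path)
        # Only touch the network when the file is not cached yet.
        Downloads.download(url, local_path)
    end
    return local_path
end
```

The returned path could then be referenced from a local filesystem dataset, so the rest of the code never needs to know the data was remote.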

One fairly easy option for cloud storage is to use rclone mount (via Rclone_jll) to mount an S3 prefix. I've got a prototype of that kind of thing lying around separately.

c42f avatar Nov 17 '21 05:11 c42f

Btw, on master Downloads.jl is now thread-safe. I know that was a blocker in various use-cases.

StefanKarpinski avatar Nov 17 '21 15:11 StefanKarpinski

Yes, that's great news! I've been following along with some of the Downloads changes, but I got a little lost in the back-and-forth. It seemed like a difficult fix.

c42f avatar Nov 18 '21 03:11 c42f

It was painful. Hopefully it's actually thread-safe now.

StefanKarpinski avatar Nov 18 '21 16:11 StefanKarpinski

Any new and exciting developments on this front? Some of the online sources would be AWS, FigShare, Zenodo, and the list is growing...

yakir12 avatar May 31 '22 11:05 yakir12