Read CSV directly from URL
This was mentioned in #3, but that was ~4 years ago. Not sure it's worth taking a dependency on HTTP.jl, but being able to do CSV.read(url) would be really great.
There has been interest generally to move MbedTLS and HTTP.jl as stdlibs, where it would be much easier to support this w/o having to take on extra dependencies.
I've added an example to the CSV.File/CSV.Rows docs on how to do this very simply (i.e. CSV.File(HTTP.get(url).body)), and if HTTP.jl ever becomes a stdlib, we can for sure support it natively.
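For reference, the pattern from the docs looks roughly like this (a sketch assuming HTTP.jl is installed alongside CSV.jl; the URL is a placeholder):

```julia
using CSV, HTTP

# `HTTP.get` returns a response whose `body` is a Vector{UInt8};
# CSV.File can parse that byte buffer directly, no temp file needed.
url = "https://example.com/data.csv"  # placeholder URL
file = CSV.File(HTTP.get(url).body)
```

The same `CSV.File(...)` result can then be piped into a `DataFrame` or iterated row by row as usual.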
That seems like a good compromise 👍
Hi, I come from Python; pandas has a built-in read_csv() function which makes it really convenient to read a CSV file directly from a URL.
From a Julia perspective, writing CSV.File(HTTP.get(url).body) every time is redundant, verbose, and not beginner-friendly. More than that, when people create a notebook to read a CSV and do their work, the CSV usually only needs to be read once or twice in that session, so the real problem is taking on an HTTP.jl dependency that users rarely need. I believe CSV.jl could have its own small feature for this and keep the dependency footprint small. It would be great if this issue could be reopened to hear what people think.
@hungpham3112 For packages like this, keeping really lean is important. I don't know that building bespoke URL handling is worth the added development/maintenance burden.
That said, I wonder if a package-extension for people that also load HTTP.jl might be possible.
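To make the idea concrete, a weak-dependency extension could look roughly like this. Everything here is hypothetical: CSV.jl does not currently ship such an extension, and the module and function names are illustrative only.

```julia
# In CSV.jl's Project.toml, HTTP would be declared as a weak dependency
# (hypothetical; CSV.jl does not currently do this):
#
#   [weakdeps]
#   HTTP = "cd3eb016-35fb-5094-929b-558a96fad6f3"
#
#   [extensions]
#   CSVHTTPExt = "HTTP"

# ext/CSVHTTPExt.jl — Julia loads this module automatically once a user
# has imported both CSV and HTTP in the same session.
module CSVHTTPExt

using CSV, HTTP

# Hypothetical convenience function the extension could provide
# (not an existing CSV.jl API): fetch `url` and parse the response
# body in memory, forwarding any CSV.File keyword arguments.
function readurl(url::AbstractString; kwargs...)
    resp = HTTP.get(url)
    return CSV.File(resp.body; kwargs...)
end

end # module
```

The appeal of this route is that CSV.jl itself stays dependency-free: users who never load HTTP.jl pay nothing, while users who already have it loaded get the convenience method for free.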
> There has been interest generally to move MbedTLS and HTTP.jl as stdlibs, where it would be much easier to support this w/o having to take on extra dependencies.
Given this was 4 years ago, and the general trend lately is rather taking things out of stdlib, it might be worth revisiting
I think it's worth reopening to see what people think.
I think if someone was up for it, the best path forward would be to use the Downloads stdlib. This would involve modifying the getbytebuffer function in utils.jl to do a regex match against the source to see if it's a URL, then using Downloads to download the URL into memory (while respecting the buffer_in_memory keyword argument) or to disk, then letting the rest of the read process continue normally.

It's a bit tricky to add tests that rely on networking, but Base has set up a Go port of httpbin.org (https://httpbingo.julialang.org/) that they've said we can use in the HTTP.jl package for tests. So you could write a test that hits one of those endpoints, has it return some CSV data, and then reads the file from that. It'd also be good to test that gzipped CSV data from a URL is handled correctly.
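A rough sketch of that change, assuming getbytebuffer keeps its current role of normalizing the source before parsing (the function signature and regex here are illustrative, not CSV.jl's actual code):

```julia
using Downloads

# Illustrative URL check; a real implementation might accept more schemes.
const URL_REGEX = r"^(https?|ftp)://"

# Hypothetical shape of the getbytebuffer change suggested above:
# URL sources get downloaded up front, everything else passes through
# to the existing file/IO handling.
function getbytebuffer(source, buffer_in_memory::Bool)
    if source isa AbstractString && occursin(URL_REGEX, source)
        if buffer_in_memory
            # Download straight into an in-memory byte buffer ...
            io = IOBuffer()
            Downloads.download(source, io)
            return take!(io)
        else
            # ... or to a temporary file on disk, returning its path so the
            # rest of the read process treats it as a normal file.
            return Downloads.download(source, tempname())
        end
    end
    # Not a URL: fall through to the existing handling unchanged.
    return source
end
```

Since Downloads is a stdlib, this would add no third-party dependency, and Downloads.download already accepts either an IO or a file path as the output argument, which maps neatly onto the buffer_in_memory switch.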
Anyone want to take a stab at it?
Just chiming in to say that this would be a very nice feature to have, and would make Julia more accessible.
This would be a place that FilePathsBase.jl integration could work nicely, if there were more support for web paths.