Downloads.jl icon indicating copy to clipboard operation
Downloads.jl copied to clipboard

Using `easy_hook` for per-request parameters

Open c42f opened this issue 3 years ago • 7 comments

Currently the easy_hook function gets passed an info parameter with URL, method and headers:

https://github.com/JuliaLang/Downloads.jl/blob/e22219f6cf8e474ace8624534188f23858e56779/src/Downloads.jl#L341-L342

But I can't see how to use this to set options on a per-request basis as HTTP.jl currently allows for things like the connection timeout (https://curl.se/libcurl/c/CURLOPT_CONNECTTIMEOUT.html). Ideally we'd add connection timeouts and some other obviously useful parameters (will post a separate issue about those), but to cover more unforeseen settings it would be helpful if there was some way to get per-request items into the info struct.

I'm not sure what's best for the API. A couple of possible ideas:

  • Add an easy_hook_opts::Any=nothing keyword to request, and just pass this through in the info tuple.
  • Add an easy_hook::Union{Function,Nothing}=nothing keyword to request and invoke this on the easy handle (maybe in addition to the easy hook which is contained in the Downloader).

I kind of favor the second of these ideas. Perhaps not as a public part of the API, but as an escape hatch it sure would be handy.

c42f avatar Aug 21 '21 06:08 c42f

Another example of this: AWS.jl has a new Downloads-based backend (https://github.com/JuliaCloud/AWS.jl/pull/396) but they needed to set Curl.CURLOPT_FOLLOWLOCATION to false. Currently they're accepting a potential race condition in one part of the code to set the easy_hook on the downloader: https://github.com/JuliaCloud/AWS.jl/pull/396#discussion_r688658942

c42f avatar Aug 24 '21 11:08 c42f

For the redirects case we ran into in AWS.jl, I think it could make sense to expose a redirect keyword like HTTP.jl does, and then set the hook accordingly, to expose a slightly nicer API over the lower-level Curl options.

But for the general case, I agree it would be good to be able to configure this on a per-request basis. My understanding is you often do want a shared downloader object in order to re-use connections and things like that, which is why the possible solution of "just make a new downloader globally configured how you want" isn't ideal in general. So I think it would be good to be able to configure this on a per-request level.

ericphanson avatar Aug 24 '21 12:08 ericphanson

For per-request modification of the easy handle, it doesn't really make sense to pass in any info since your callback is specific to one download. It also doesn't make sense to try to squeeze info through keywords. It would make more sense to provide a callback that can arbitrarily modify the easy handle. E.g. something like this:

download(url) do easy
    Curl.setopt(easy, Curl.CURLOPT_WHATEVER, value)
end

I don't know if it's worth using the do block syntax for this, but alternatively one could do this:

download(url, easy_hook = easy->Curl.setopt(easy, Curl.CURLOPT_WHATEVER, value))

This raises the question: is this callback called in addition to the easy hook that's registered with the Downloader object or instead of it? In general, the easy hook business doesn't compose in the sense that you can't add these options from multiple places without them clobbering each other. The composable version would be to have a stack of easy hooks and calling them all in this order:

  1. Create the easy handle
  2. Apply global easy hook callbacks in order
  3. Apply downloader easy hook callbacks in order
  4. Apply the request easy hook callback
  5. Do the request.

That's a change to how the easy hooks work; maybe we'd want to make it an official API.

StefanKarpinski avatar Aug 26 '21 16:08 StefanKarpinski

Apply downloader easy hook callbacks in order

I think this can already be done in a round about way — if the user has a Downloader, the can install an easy_hook closure which internally uses a stack of options or functions to apply.

c42f avatar Sep 21 '21 07:09 c42f

Yes, but the point is that if multiple independent parties have set easy hooks, currently it will break. Instead, we would like for that to work.

StefanKarpinski avatar Sep 21 '21 15:09 StefanKarpinski

Yes, but the point is that if multiple independent parties have set easy hooks, currently it will break. Instead, we would like for that to work.

Can't independent parties just use their own Downloader instances to get a well-defined set of curl options? Some subsets of options are interdependent so it doesn't seem very composable to have multiple parties setting them.

For the global Downloader instance I feel like that can only be safely modified at the whole-application level — if several libraries modify it they can easily clash in non-composable ways. So it may be adequate to have a single easy_hook for use by the application.

I could be persuaded otherwise. Do you have a particular use case in mind where it would be useful to compose global options without a central authority (ie, the application) knowing the complete list?

c42f avatar Sep 22 '21 03:09 c42f

Right, but it's fairly common for someone to want to set libcurl options that will affect the global downloader because they're needed to make download operations done somewhere else work. Suppose, for example, someone wants to inject some auth header when making requests to certain hosts. Someone else wants to set some other option. That could be made to work. Obviously, if you're in a position to inject a custom downloader into the download operation, that's better, but that's not always the case. Being able to set options on a single download is sufficient for anything if you control the download code but that's not always the case.

StefanKarpinski avatar Sep 23 '21 14:09 StefanKarpinski