Downloads.jl
Downloads.jl copied to clipboard
Using `easy_hook` for per-request parameters
Currently the easy_hook
function gets passed an info
parameter with URL, method and headers:
https://github.com/JuliaLang/Downloads.jl/blob/e22219f6cf8e474ace8624534188f23858e56779/src/Downloads.jl#L341-L342
But I can't see how to use this to set options on a per-request basis as HTTP.jl currently allows for things like the connection timeout (https://curl.se/libcurl/c/CURLOPT_CONNECTTIMEOUT.html). Ideally we'd add connection timeouts and some other obviously useful parameters (will post a separate issue about those), but to cover more unforeseen settings it would be helpful if there was some way to get per-request items into the info
struct.
I'm not sure what's best for the API. A couple of possible ideas:
- Add an
easy_hook_opts::Any=nothing
keyword torequest
, and just pass this through in theinfo
tuple. - Add an
easy_hook::Union{Function,Nothing}=nothing
keyword torequest
and invoke this on the easy handle (maybe in addition to the easy hook which is contained in theDownloader
).
I kind of favor the second of these ideas. Perhaps not as a public part of the API, but as an escape hatch it sure would be handy.
Another example of this: AWS.jl has a new Downloads-based backend (https://github.com/JuliaCloud/AWS.jl/pull/396) but they needed to set Curl.CURLOPT_FOLLOWLOCATION
to false. Currently they're accepting a potential race condition in one part of the code to set the easy_hook
on the downloader: https://github.com/JuliaCloud/AWS.jl/pull/396#discussion_r688658942
For the redirects case we ran into in AWS.jl, I think it could make sense to expose a redirect
keyword like HTTP.jl does, and then set the hook accordingly, to expose a slightly nicer API over the lower-level Curl options.
But for the general case, I agree it would be good to be able to configure this on a per-request basis. My understanding is you often do want a shared downloader object in order to re-use connections and things like that, which is why the possible solution of "just make a new downloader globally configured how you want" isn't ideal in general. So I think it would be good to be able to configure this on a per-request level.
For per-request modification of the easy
handle, it doesn't really make sense to pass in any info since your callback is specific to one download. It also doesn't make sense to try to squeeze info through keywords. It would make more sense to provide a callback that can arbitrarily modify the easy
handle. E.g. something like this:
download(url) do easy
Curl.setopt(easy, Curl.CURLOPT_WHATEVER, value)
end
I don't know if it's worth using the do block syntax for this, but alternatively one could do this:
download(url, easy_hook = easy->Curl.setopt(easy, Curl.CURLOPT_WHATEVER, value))
This raises the question: is this callback called in addition to the easy hook that's registered with the Downloader
object or instead of it? In general, the easy hook business doesn't compose in the sense that you can't add these options from multiple places without them clobbering each other. The composable version would be to have a stack of easy hooks and calling them all in this order:
- Create the easy handle
- Apply global easy hook callbacks in order
- Apply downloader easy hook callbacks in order
- Apply the request easy hook callback
- Do the request.
That's a change to how the easy hooks work; maybe we'd want to make it an official API.
Apply downloader easy hook callbacks in order
I think this can already be done in a round about way — if the user has a Downloader
, the can install an easy_hook
closure which internally uses a stack of options or functions to apply.
Yes, but the point is that if multiple independent parties have set easy hooks, currently it will break. Instead, we would like for that to work.
Yes, but the point is that if multiple independent parties have set easy hooks, currently it will break. Instead, we would like for that to work.
Can't independent parties just use their own Downloader
instances to get a well-defined set of curl options? Some subsets of options are interdependent so it doesn't seem very composable to have multiple parties setting them.
For the global Downloader
instance I feel like that can only be safely modified at the whole-application level — if several libraries modify it they can easily clash in non-composable ways. So it may be adequate to have a single easy_hook
for use by the application.
I could be persuaded otherwise. Do you have a particular use case in mind where it would be useful to compose global options without a central authority (ie, the application) knowing the complete list?
Right, but it's fairly common for someone to want to set libcurl options that will affect the global downloader because they're needed to make download operations done somewhere else work. Suppose, for example, someone wants to inject some auth header when making requests to certain hosts. Someone else wants to set some other option. That could be made to work. Obviously, if you're in a position to inject a custom downloader into the download operation, that's better, but that's not always the case. Being able to set options on a single download is sufficient for anything if you control the download code but that's not always the case.