Processing URLs in batches
Basic summary
As a beginner in Rust, I would like to add to this thread with our real-life experience. We are currently facing issues which make me relate to this story (and which are preventing us from switching to Rust):
We are trying to rewrite some of our services from Python to Rust and are looking to achieve the following:
- Read a bunch of URLs (size varies, but about 1000 per batch)
- Do an HTTP GET request for each URL asynchronously
- Log the failures and process the results
What we have not succeeded in doing so far:
- Send the requests in batches. If we send all 1000 requests at the same time, our server closes the connection and the process panics. Ideally we could buffer them to send at most 50 at a time. We could split the batches manually, but we hoped the HTTP client or the `FuturesUnordered` container would handle that for us.
- Handle errors. Failures should be logged and should not crash the process. We plan on using tracing-rs for the logging as it is part of the tokio stack.
- Implement a Fibonacci or exponential retry mechanism on failure.
For reference, here is the Stack Overflow question where I was looking for help.
Originally posted by @rgreinho in https://github.com/rust-lang/wg-async-foundations/issues/95#issuecomment-811397783
Can you say a bit more, @rgreinho? What caused you not to succeed, for example?
We're still investigating, so hopefully we will find solutions soon :)
But here are our two big blockers so far:
- Not sure how to limit the number of requests sent at the same time asynchronously. In Python, the aiohttp web client can be configured with a smaller connection pool size and handles it for us. We hoped that reqwest would do the same. Note that we're not specifically attached to reqwest and could use surf or hreq, but unfortunately we found out that they behave the same way. Our second hope was that the `FuturesUnordered` container would allow us to manage this. A comment in the SO question pointed us to this question, where they create a `Stream` and apply the `buffer_unordered()` method on it. That will be our next attempt. Coming from a Python background, we hoped we could simply use the same pattern, where the `asyncio.gather()` function from the stdlib executes the coroutines and collects the results; that's why we went with the `FuturesUnordered` container first.
- We do not know how to retry failed requests. We did find the backoff crate and the tokio-retry crate, but they don't seem to work well with `FuturesUnordered`, or at least we did not manage to get them to work together. In Python we use tenacity to decorate our functions, and if an exception is caught, it re-runs them for us.
We are also having problems with the error handling. We could not get the `map_err()` and `or_else()` methods to work as expected. But this is probably simply due to the fact that we're new to this and did not use them properly. I'm sure we will figure it out soon. Same thing with the logging; the tracing-rs library looks fantastic.
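On the `map_err()` front, the usual pattern for "log and continue" looks roughly like the sketch below. It uses only the standard library; `parse` stands in for any fallible step, and the `eprintln!` would be a `tracing::warn!` in the real service.

```rust
// Sketch of logging failures without crashing, using only std.
// `map_err` enriches the error with context; `unwrap_or_else` supplies
// a fallback value instead of panicking.
fn main() {
    let inputs = ["12", "oops", "7"];
    let total: i32 = inputs
        .iter()
        .map(|s| {
            s.parse::<i32>()
                .map_err(|e| format!("could not parse {s:?}: {e}")) // add context
                .unwrap_or_else(|msg| {
                    eprintln!("{msg}"); // tracing::warn! in a real service
                    0 // fall back instead of crashing the process
                })
        })
        .sum();
    assert_eq!(total, 19); // 12 + 0 (failed parse) + 7
    println!("total = {total}");
}
```

The key point is that `map_err` only transforms the error variant; it is `unwrap_or_else` (or a plain `match`) that decides whether a failure panics or is absorbed.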
@rgreinho any chance you want to join a "vision doc writing session" and talk about this? What time zone are you in? :)
This week's writing sessions -- I expect we'll schedule more for next week.
Sure thing! Not sure how that works, or what exactly is expected from participants, but I'll be glad to help.
I am in the CDT time zone.
The basic format is that the host asks you (and others) a bunch of questions about your experiences and then we try to collectively write a story about one of the characters. It's fun. :)