rust-esplora-client icon indicating copy to clipboard operation
rust-esplora-client copied to clipboard

fix(esplora-client): implement retryable calls for blocking and async clients

Open e1a0a0ea opened this issue 1 year ago • 14 comments

Abstract

This pull request addresses the 429 HTTP error that can occurs when syncing big/multiple wallets to a remote a node with tight rate limit. It aims at slowing down requests made, either enforcing retry-after returned through header or exponentially increasing a backoff timer.

Changes Made

  • Create 2 traits: AsyncRetryable and SyncRetryable
  • impl AsyncRetryable for AsyncClient
  • impl SyncRetryable for SyncClient
  • use pool_max_idle_per_host to fix hyperium error when sending several parallel requests

Testing

  • Build tested on wasm target
  • Manual test on a CLI tool + web UI

Related Issues

Fixes bdk/issues/1120

Dev notes

  • Addition of async-std external crate: should not break target builds
  • Arguments in both traits (retryable_error_codes & max_retries) can be removed since using hardcoded constants might make the code slightly more readable, even though less flexible

e1a0a0ea avatar Jan 11 '24 09:01 e1a0a0ea

Thanks for this work. I think this will be an important feature to have since we have been running into 429 very often with the bdk_esplora crate.

A few initial thoughts/question:

  1. Is async_std the right dependency to add? What are the alternative choices?
  2. Would there be a situation when a user would not want this dependency/feature? Should we have a feature flag to turn it off?

evanlinjin avatar Jan 11 '24 12:01 evanlinjin

This doesn't seem to build for me as you're trying to use async traits:

error[E0562]: `impl Trait` only allowed in function and inherent method return types, not in trait method return types
  --> src/retryable.rs:31:10
   |
31 |     ) -> impl Future<Output = Result<R, E>>;
   |          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   |
   = note: see issue #91611 <https://github.com/rust-lang/rust/issues/91611> for more information

error[E0706]: functions in traits cannot be declared `async`
  --> src/retryable.rs:82:5
   |
82 |       async fn exec_with_retry(
   |       ^----
   |       |
   |  _____`async` because of this
   | |
83 | |         self,
84 | |         retryable_error_codes: Vec<u16>,
85 | |         max_retries: Option<u32>,
86 | |     ) -> Result<ReqwestResponse, crate::Error> {
   | |______________________________________________^
   |
   = note: `async` trait functions are not currently supported
   = note: consider using the `async-trait` crate: https://crates.io/crates/async-trait
   = note: see issue #91611 <https://github.com/rust-lang/rust/issues/91611> for more information

Note that while async traits have just been stabilized in rustc 1.75, a) they shouldn't be considered mature and b) conflict with the MSRV of 1.63 used by this crate.

Thank you for mentioning this @tnull , sorry I didn't see the conflict with the MSRV before. I don't see a clear solution to this issue except trying a build a wrapper around ureq and reqwest, which i think would look very patchy

Well, as mentioned above you should be able to use the #[async_trait] macro?

tnull avatar Jan 15 '24 15:01 tnull

Thanks @e1a0a0ea.

I'm a little unsure that this the place to address the problem. The problem exists when syncing and may be addressable in the syncing logic by handing the 429 error there and reducing the number of threads we're using to make requests. This has the advantage of not introducing more dependencies and complexity to the client. Also, in general I don't think we want to hide 429s from the developer -- if they are using more resources than is available then they should probably consider carefully what they are doing. Here I think we do want to reconsider our request parameters and respond to the 429.

I could be wrong here but I think we should at least try that first?

LLFourn avatar Jan 29 '24 06:01 LLFourn

Hmm just noting that @evanlinjin did try what I suggested anyway: https://github.com/bitcoindevkit/bdk/issues/1120#issuecomment-1834676960

LLFourn avatar Jan 29 '24 08:01 LLFourn

To avoid the dependency situation, I'm thinking we can introduce a "middleware" concept to the library. Somehow expose the http header information to the caller and the caller can do delays with the "middleware".

evanlinjin avatar Feb 04 '24 13:02 evanlinjin

Any update on this? It's unfortunate to have to patch bdk-esplora to avoid DOS'ing Esplora instances.

darosior avatar May 06 '24 10:05 darosior

Overall I like the approach, although I wonder if parsing a Retry-After header adds much value compared to just doubling a backoff. If we hit max_retries, we should give up anyway regardless of the date in the header, so I'm not convinced of the need to bring in instant (an apparently unmaintained crate) just to get the current time.

But I'm interested in testing this out on a more recent version of this crate, in other words can you rebase? and if not, are you willing to hand it off to someone who might have time to work on it?

ValuedMammal avatar May 30 '24 22:05 ValuedMammal

Overall I like the approach, although I wonder if parsing a Retry-After header adds much value compared to just doubling a backoff.

I still think from a principled approach that the Retry-After from the HTTP header is the way to go.

storopoli avatar May 31 '24 04:05 storopoli

bump @e1a0a0ea. It would be really nice to have this in the near future. Do you need extra help? Please let us know.

storopoli avatar Jun 18 '24 17:06 storopoli

Checkout this commen on a simple backoff algo that works for LWK: https://github.com/bitcoindevkit/bdk/issues/1120#issuecomment-2331794726

notmandatory avatar Sep 05 '24 15:09 notmandatory