provider stops/hangs for no reason

Open rollue opened this issue 2 years ago • 8 comments

Ethers Version

5.6.1

Search Terms

hangs,provider,block

Describe the Problem

I use the following code with an AlchemyWebSocketProvider.

provider.on('block', async (blockNumber: number) => {
  const logs = await provider.getLogs(someFilter);
  // process logic
});

I run this in an ECR container on AWS. Ideally, it should run continuously and listen to every block. The problem is that after a random number of hours it hangs and stops listening.

It doesn't log any errors, and it doesn't seem to be related to container memory or similar limits. Interestingly, the exposed health check endpoint keeps returning 200 OK even after the provider stops listening, which means the Docker container is still running.

It normally happens after 5-6 hours but in general, the timing is more or less random.
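Since the health check keeps returning 200 while the listener is silent, one option is to tie the health status to block activity instead of process liveness. Below is a minimal sketch of that idea, not anything ethers provides: `BlockLiveness`, `markBlock`, `statusCode`, and the silence threshold are all hypothetical names chosen for illustration.

```typescript
// Hypothetical helper (not part of ethers): records when the last block
// event arrived so a health endpoint can report a stalled listener.
class BlockLiveness {
  private lastBlockMs: number;

  constructor(
    private readonly maxSilenceMs: number, // assumed threshold, tune per chain
    startMs: number = Date.now(),
  ) {
    this.lastBlockMs = startMs;
  }

  // Call this from inside the 'block' handler.
  markBlock(nowMs: number = Date.now()): void {
    this.lastBlockMs = nowMs;
  }

  // 200 while blocks keep arriving, 503 once the listener has been silent
  // for longer than maxSilenceMs.
  statusCode(nowMs: number = Date.now()): 200 | 503 {
    return nowMs - this.lastBlockMs <= this.maxSilenceMs ? 200 : 503;
  }
}
```

The `provider.on('block', ...)` handler would call `markBlock()`, and the health route would respond with `statusCode()`. Once the endpoint starts returning 503, the container orchestrator can restart the task instead of leaving a silent listener running.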

Code Snippet

No response

Contract ABI

No response

Errors

No response

Environment

node.js (older than v12)

Environment (Other)

docker, docker compose, AWS ECR

rollue avatar Mar 29 '22 00:03 rollue

Update: I've contacted Alchemy support, and apparently they rolled out a reliability improvement for websockets 30 minutes ago. Hopefully this fixes the issue.

rollue avatar Mar 30 '22 23:03 rollue

@ricmoo we're running into this issue as well when using alchemy. the proximate cause is that their server fails for some reason, which i'm investigating with them.

but the underlying problem is that there is something in the getLogs call to a websocket provider that causes the process to hang and increase memory consumption constantly.

varunsrin avatar Jun 07 '22 16:06 varunsrin

If you can figure out a root cause, I’d love to improve the WS support. :)

ricmoo avatar Jun 07 '22 22:06 ricmoo

We've managed to resolve our issue by moving all our requests to HTTP connections. Unfortunately, debugging the issue is not straightforward since it depends on a specific provider configuration and is non-deterministic. Sharing what we know so far in case it helps:

  1. this issue happens when making a request over a websocket (e.g. getLogs)
  2. this issue requires something specific to change on the server
  3. this seems to only occur after the connection has been held for several hours (usually 4+)

I've also sent this along to the Alchemy team in case they have any insights into what the server issues might be.
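For anyone trying the same HTTP workaround: without a websocket subscription, the listener has to poll. A rough sketch of one way to do that is to remember the last processed block and request logs for the next block range. `nextLogRange`, `pollOnce`, `LogSource`, and `maxSpan` are hypothetical names for illustration. `LogSource` is only assumed to mirror the `getBlockNumber()` and `getLogs()` calls of an ethers v5 provider, and a real filter would also carry `address` and `topics`.

```typescript
// Inclusive block range, intended to be merged into a getLogs filter.
interface BlockRange {
  fromBlock: number;
  toBlock: number;
}

// Hypothetical helper: next range to query, or null when nothing is new.
// maxSpan caps the range because many RPC endpoints limit getLogs spans.
function nextLogRange(
  lastProcessed: number,
  latest: number,
  maxSpan: number,
): BlockRange | null {
  if (maxSpan < 1) throw new RangeError("maxSpan must be at least 1");
  if (latest <= lastProcessed) return null;
  const fromBlock = lastProcessed + 1;
  const toBlock = Math.min(latest, fromBlock + maxSpan - 1);
  return { fromBlock, toBlock };
}

// Assumed minimal shape of the provider calls used below.
interface LogSource<L> {
  getBlockNumber(): Promise<number>;
  getLogs(filter: BlockRange): Promise<L[]>;
}

// One polling step: fetch logs for the next range and return the new
// "last processed" block number (unchanged if there was nothing to do).
async function pollOnce<L>(
  source: LogSource<L>,
  lastProcessed: number,
  maxSpan: number,
  handle: (logs: L[]) => void,
): Promise<number> {
  const latest = await source.getBlockNumber();
  const range = nextLogRange(lastProcessed, latest, maxSpan);
  if (range === null) return lastProcessed;
  handle(await source.getLogs(range));
  return range.toBlock;
}
```

Calling `pollOnce` on a timer and storing its return value means a failed or slow request only delays the next range rather than silently dropping blocks.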

varunsrin avatar Jun 08 '22 18:06 varunsrin

Is it possible to implement some kind of reconnect logic upon timeout if the socket hangs?
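To make the question concrete, here is a rough sketch of what such reconnect logic could look like in application code. It is not an ethers feature: `BlockSource` and `ReconnectingBlockListener` are hypothetical names, and the idea is simply to drop and recreate the connection when no block has arrived within `maxSilenceMs`.

```typescript
// Hypothetical abstraction over "something that emits block numbers".
interface BlockSource {
  onBlock(handler: (blockNumber: number) => void): void;
  close(): void;
}

// Recreates the underlying source when it has been silent for too long.
class ReconnectingBlockListener {
  private source: BlockSource;
  private lastBlockMs: number;
  reconnects = 0;

  constructor(
    private readonly connect: () => BlockSource,
    private readonly handler: (blockNumber: number) => void,
    private readonly maxSilenceMs: number,
    private readonly now: () => number = () => Date.now(),
  ) {
    this.lastBlockMs = this.now();
    this.source = this.attach();
  }

  // Open a fresh source and route its blocks to the user handler.
  private attach(): BlockSource {
    const source = this.connect();
    source.onBlock((blockNumber) => {
      this.lastBlockMs = this.now();
      this.handler(blockNumber);
    });
    return source;
  }

  // Call periodically (e.g. from setInterval). Returns true if it reconnected.
  checkOnce(): boolean {
    if (this.now() - this.lastBlockMs <= this.maxSilenceMs) return false;
    this.source.close();
    this.source = this.attach();
    this.lastBlockMs = this.now(); // give the new connection a full window
    this.reconnects++;
    return true;
  }
}
```

With ethers v5, `connect` could create a new `WebSocketProvider`, `onBlock` could wrap `provider.on('block', handler)`, and `close` could call `provider.destroy()`. That wiring is an assumption about how one might use it, and it only covers a stalled subscription, not an individual request such as `getLogs` that never returns.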

nabioz avatar Jul 07 '22 13:07 nabioz

Was there a solution for this? I'm getting the same issue when running a listener on ECS: it stops listening after a random number of hours. When this happens, the health checks still pass.

kellemar avatar Jan 03 '24 09:01 kellemar

Here's a repro:

  • have ganache running in :8547
$ npm i ethers@5
$ node
Welcome to Node.js v18.19.0.
Type ".help" for more information.
> e=require("ethers"); p=new e.providers.JsonRpcProvider("http://localhost:8547"); p.getNetwork().then(console.log)
Promise {
  <pending>,
  [Symbol(async_id_symbol)]: 49,
  [Symbol(trigger_async_id_symbol)]: 45
}
>
$ uname -a
Linux 5.15.0-91-generic #101-Ubuntu SMP Tue Nov 14 13:30:08 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
  • notice the promise never resolves, getNetwork gets stuck

jtakalai avatar Jan 25 '24 15:01 jtakalai