ethers.js icon indicating copy to clipboard operation
ethers.js copied to clipboard

WebSocketProvider handle ws close and reconnect

Open PierreJeanjacquot opened this issue 5 years ago • 79 comments

Hi @ricmoo,

I'm using WebSocketProvider server-side to listen to blockchain events and performing calls to smart contracts. Sometimes the websocket pipe got broken and I need to reconnect it.

I use this code to detect ws close and reconnect but it would be nice to not have to rely on _websocket to do it:

let wsProvider;

init = async () => {
  wsProvider = new ethers.providers.WebSocketProvider(wsHost);
  wsProvider._websocket.on('close', async (code) => {
    console.log('ws closed', code);
    wsProvider._websocket.terminate();
    await sleep(3000); // wait before reconnect
    init();
  });
  wsProvider.on('block', doStuff);
};

I also noticed when the websocket is broken Promise call don't reject wich is not super intuitive.

PierreJeanjacquot avatar Sep 18 '20 14:09 PierreJeanjacquot

This is a very large feature... When I first (begrudgingly) added WebSocketProvider mentioned this would be something I would eventually get to, but that it won't be high priority any time soon. :)

But I want to! :)

It is still on the backlog, and I'll use this issue to track it, but there are other things I need to work on first.

Keep in mind when you reconnect, you may have been disconnected for a long time, in which case you should find and trigger events that were missed; you may have also been down fo a short period of time, in which case you must dedup events you've already emitted. Also, earlier events should be emitted before later ones. Unless there was a re-org, exactly-once semantics should be adhered to. All subscriptions will need some custom logic, depending on the type of subscription to handle this.

Also ethers providers guarantee consistent read-after-events. So, if a block number X has been emitted, a call to getBlock(X) must return a block. In many cases, due to the distributed nature of the Blockchain, especially with a FallbackProvider, one backend may have seen a block before others, so calling getBlock might occur on a node before it has actually seen the block, so the call must stall and (with exponential back-off) poll for the block and resolve it when it comes in. Similarly, this is true for events which include the transactionHash; a call to getTransaction must succeed, stalling until the data becomes available.

Also keep special note of block, debug, poll and network events which need themselves some coordination and may recall some changes in their super class to handle properly...

Basically, it's a feature I really want too, but I know it's going to take considerable time to complete and properly test. I just wanted to give some background on the complexity.

ricmoo avatar Oct 23 '20 00:10 ricmoo

I think this is probably the best solution:

const EXPECTED_PONG_BACK = 15000
const KEEP_ALIVE_CHECK_INTERVAL = 7500

export const startConnection = () => {
  provider = new ethers.providers.WebSocketProvider(config.ETH_NODE_WSS)

  let pingTimeout = null
  let keepAliveInterval = null

  provider._websocket.on('open', () => {
    keepAliveInterval = setInterval(() => {
      logger.debug('Checking if the connection is alive, sending a ping')

      provider._websocket.ping()

      // Use `WebSocket#terminate()`, which immediately destroys the connection,
      // instead of `WebSocket#close()`, which waits for the close timer.
      // Delay should be equal to the interval at which your server
      // sends out pings plus a conservative assumption of the latency.
      pingTimeout = setTimeout(() => {
        provider._websocket.terminate()
      }, EXPECTED_PONG_BACK)
    }, KEEP_ALIVE_CHECK_INTERVAL)

    // TODO: handle contract listeners setup + indexing
  })

  provider._websocket.on('close', () => {
    logger.error('The websocket connection was closed')
    clearInterval(keepAliveInterval)
    clearTimeout(pingTimeout)
    startConnection()
  })

  provider._websocket.on('pong', () => {
    logger.debug('Received pong, so connection is alive, clearing the timeout')
    clearInterval(pingTimeout)
  })
}

This send a ping every 15 seconds, when it sends a ping, it expects a pong back within 7.5 seconds otherwise it closes the connection and calls the main startConnection function to start everything over.

Where it says // TODO: handle contract listeners setup + indexing that's where you should do any indexing or listening for contract events etc.

Fine tune these timing vars to taste, depending on who your Node provider is, this are the settings I use for QuikNode

const EXPECTED_PONG_BACK = 15000
const KEEP_ALIVE_CHECK_INTERVAL = 7500

mikevercoelen avatar Mar 27 '21 13:03 mikevercoelen

To elaborate on @mikevercoelen's answer, I extracted the logic to a function

type KeepAliveParams = {
  provider: ethers.providers.WebSocketProvider;
  onDisconnect: (err: any) => void;
  expectedPongBack?: number;
  checkInterval?: number;
};

const keepAlive = ({
  provider,
  onDisconnect,
  expectedPongBack = 15000,
  checkInterval = 7500,
}: KeepAliveParams) => {
  let pingTimeout: NodeJS.Timeout | null = null;
  let keepAliveInterval: NodeJS.Timeout | null = null;

  provider._websocket.on('open', () => {
    keepAliveInterval = setInterval(() => {
      provider._websocket.ping();

      // Use `WebSocket#terminate()`, which immediately destroys the connection,
      // instead of `WebSocket#close()`, which waits for the close timer.
      // Delay should be equal to the interval at which your server
      // sends out pings plus a conservative assumption of the latency.
      pingTimeout = setTimeout(() => {
        provider._websocket.terminate();
      }, expectedPongBack);
    }, checkInterval);
  });

  // eslint-disable-next-line @typescript-eslint/no-explicit-any
  provider._websocket.on('close', (err: any) => {
    if (keepAliveInterval) clearInterval(keepAliveInterval);
    if (pingTimeout) clearTimeout(pingTimeout);
    onDisconnect(err);
  });

  provider._websocket.on('pong', () => {
    if (pingTimeout) clearInterval(pingTimeout);
  });
};

Then in my code, i get:

const startBot = () => {
  const provider = new ethers.providers.WebSocketProvider(wsUrl);
  keepAlive({
      provider,
      onDisconnect: (err) => {
        startBot();
        console.error('The ws connection was closed', JSON.stringify(err, null, 2));
      },
    });
};

gwendall avatar May 17 '21 14:05 gwendall

We're two months in and the code mentioned before, has been running steadily on our node :) 0 downtime.

mikevercoelen avatar May 17 '21 14:05 mikevercoelen

Really cool ! Thanks again for sharing :)

gwendall avatar May 17 '21 14:05 gwendall

@mikevercoelen I'm using ethers 5.0.32 and the websocket provider doesn't have the 'on' method which really hampers implementing your solution ;). What version of ethers are you using?

sentilesdal avatar May 19 '21 16:05 sentilesdal

There should definitely be an .on method. There is no version of WebSocketProvider that didn’t have it, since it inherits from JsonRpcProvider.

ricmoo avatar May 19 '21 16:05 ricmoo

Ok well I'm not sure what's going on. Its definitely not there, I'm seeing an interface for provider._websocket that looks just like a regular websocket interface: https://developer.mozilla.org/en-US/docs/Web/API/WebSocket/onopen.

Is there typo in the code above? Perhaps instead of provider._websocket.on('open', () => {}) I should be calling these directly on the provider? I tried this too but the provider doesn't recognize the 'open', 'close', and 'pong' event types. websocket-provider.ts from ethers only shows these event types: 'block', 'pending', 'filter', 'tx', 'debug', 'poll', 'willPoll', 'didPoll', 'error'.

sentilesdal avatar May 20 '21 20:05 sentilesdal

Oh! Sorry, yes. In general you should use provider.on. The _websocket is a semi-private member and should not generally be touched , unless direct access too it is needed. But only ethers-supported events are supported by provider.on.

It depends on your environment what your ._webprovider is. Some platforms may use .addEventListener instead of .on, maybe?

If your goal is to enable automatic reconnect, this is not something that is simple to do in a safe way, so make sure you test it thoroughly. :)

ricmoo avatar May 20 '21 21:05 ricmoo

We are actually using alchemy so was able to just use their web3 websocket provider and plugged it into our ethers ecosystem with ethers.provider.Web3Provider. they handle all the reconnects and even dropped calls very gracefully.

sentilesdal avatar May 20 '21 21:05 sentilesdal

One question @ricmoo , @gwendall when trying to use the code snippet above I get that the websocket object doesn't have on method.

I am using the latest ethers 5.3 from the dapp

rrecuero avatar Jun 09 '21 16:06 rrecuero

@rrecuero I ran into the same problem and I'm still not sure how that code above works :P

sentilesdal avatar Jun 09 '21 21:06 sentilesdal

The solution of the @mikevercoelen didn't worked on me maybe because I'm using the browser version of WebSocket so for me the workaround was writing a custom class that reconnect's everytime the connection closes.

const ethers = require("ethers");

class ReconnectableEthers {
    /** 
     * Constructs the class
    */
    constructor() {
        this.provider = undefined;
        this.wallet   = undefined;
        this.account  = undefined;
        this.config   = undefined;
        
        this.KEEP_ALIVE_CHECK_INTERVAL = 1000;

        this.keepAliveInterval = undefined;
        this.pingTimeout       = undefined;
    }

    /**
     * Load assets.
     * @param {Object} config Config object.
     */
    load(config) {
        this.config = config;
        this.provider = new ethers.providers.WebSocketProvider(this.config["BSC_PROVIDER_ADDRESS"])
        this.wallet   = new ethers.Wallet(this.config["PRIVATE_KEY"]);
        this.account  = this.wallet.connect(this.provider);
        
        this.defWsOpen    = this.provider._websocket.onopen;
        this.defWsClose   = this.provider._websocket.onclose;

        this.provider._websocket.onopen    = (event) => this.onWsOpen(event);
        this.provider._websocket.onclose   = (event) => this.onWsClose(event);
    }

    /**
     * Check class is loaded.
     * @returns Bool
     */
    isLoaded() {
        if (!this.provider) return false;
        return true;
    }

    /**
     * Triggered when provider's websocket is open.
     */
    onWsOpen(event) {
        console.log("Connected to the WS!");
        this.keepAliveInterval = setInterval(() => { 
            if (
                this.provider._websocket.readyState === WebSocket.OPEN ||
                this.provider._websocket.readyState === WebSocket.OPENING
            ) return;

            this.provider._websocket.close();
        }, this.KEEP_ALIVE_CHECK_INTERVAL)

        if (this.defWsOpen) this.defWsOpen(event);
    }

    /**
     * Triggered on websocket termination.
     * Tries to reconnect again.
     */
    onWsClose(event) {
        console.log("WS connection lost! Reconnecting...");
        clearInterval(this.keepAliveInterval)
        this.load(this.config);

        if (this.defWsClose) this.defWsClose(event);
    }
}

module.exports = ReconnectableEthers;

tarik0 avatar Jun 14 '21 23:06 tarik0

@tarik0 i'm running WebSocket on browser too, that's exactly what I need, thank you

mhughdo avatar Jul 02 '21 11:07 mhughdo

I implemented @mikevercoelen's method and it works perfectly. Thank you!

Question, though: Where is the ping() function in provider._websocket.ping() defined? I'm not seeing it in the ethers source and confused as to where/how it's doing the sending.

EvilJordan avatar Sep 29 '21 23:09 EvilJordan

@Dylan-Kerler thanks, it is definitely cleaner. I agree with your point for most use cases. But if you need certain consistency guarantees, have a read of @ricmoo's concerns on https://github.com/ethers-io/ethers.js/issues/1053#issuecomment-714834698. Which the web3 client won't handle.

pulasthibandara avatar Oct 17 '21 22:10 pulasthibandara

Hello all. Is there plans to implement his on a near future? Our project is using ethers and at first we considered building this function on our code, but if there is any plans for implementing this on ethers, it would save us some time.

Thanks!

phbrgnomo avatar Oct 26 '21 18:10 phbrgnomo

This is harder to implement then it looks on the surface if you want to include:

  • re-establish subscriptions after reconnecting
  • queue and re-send queued requests after the connection is re-established (like it's done before the connection is established for the first time)
  • differentiate between the different reasons for a disconnect (allow client code to define which action to take based on disconnect status code, e.g. if the server is going down, reconnecting is pointless... this must be configurable because it depends on the server implementation)
  • retry & exponential back-off logic
  • keep-alive

None of the solutions (including the web3-provider, which also as a lot of other downsides) posted in here cover this properly so far.

There are resilient, generic websocket implementations out there already (not json-rpc specific) that this could be built around. The only thing that would have to be custom-built in ethers would be tracking of active subscriptions to re-establish those on reconnect.

There's https://github.com/pladaria/reconnecting-websocket which could use some cleanup and modernization but is otherwise rather robust. Could make sense to fork / integrate that into ethers.js with the json-rpc specifics simply built around it .

fubhy avatar Nov 04 '21 08:11 fubhy

How can this be the accepted solution? It's like 50 lines of custom code that people have to copy and paste for what should be a boilerplate problem.

@sentilesdal solution seems much better - wrapping the input provider in web3Ws provider:

import Web3WsProvider from 'web3-providers-ws'

this.provider = new ethers.providers.Web3Provider(
  new Web3WsProvider(process.env.PROVIDER_URL, {
    clientConfig: {
        keepalive: true,
        keepaliveInterval: 60000, // ms
     }
     // Enable auto reconnection
     reconnect: {
        auto: true,
        delay: 5000, // ms
        maxAttempts: 5,
        onTimeout: false
     }
  }),
)

I don't see why this is so hard to add to ethersjs though. If web3Ws provider can do it, why can't ethersjs do it?

Thanks a lot, def. the answer i was looking for ! works like a charm.

cryptofomo1 avatar Nov 28 '21 17:11 cryptofomo1

How can this be the accepted solution? It's like 50 lines of custom code that people have to copy and paste for what should be a boilerplate problem. @sentilesdal solution seems much better - wrapping the input provider in web3Ws provider:

import Web3WsProvider from 'web3-providers-ws'

this.provider = new ethers.providers.Web3Provider(
  new Web3WsProvider(process.env.PROVIDER_URL, {
    clientConfig: {
        keepalive: true,
        keepaliveInterval: 60000, // ms
     }
     // Enable auto reconnection
     reconnect: {
        auto: true,
        delay: 5000, // ms
        maxAttempts: 5,
        onTimeout: false
     }
  }),
)

I don't see why this is so hard to add to ethersjs though. If web3Ws provider can do it, why can't ethersjs do it?

Thanks a lot, def. the answer i was looking for ! works like a charm.

So I've tried this solution to mitigate connection hang using websocket provider. It indeed have worked out, BUT after switching to this provider implementation, my request count has increased 10x

Im listening on 'block' method, and then calling eth_getBlockByNumber using other HTTP provider.

Over 24 hours using InfuraWebsocket provider that may hang, my daily request count was 10k. What a surprise I had when my daily limit of 100k has been reached, after using this provider implementation.

Havent dig why that happened yet

Vladkryvoruchko avatar Dec 12 '21 23:12 Vladkryvoruchko

How can this be the accepted solution? It's like 50 lines of custom code that people have to copy and paste for what should be a boilerplate problem.

@sentilesdal solution seems much better - wrapping the input provider in web3Ws provider:

import Web3WsProvider from 'web3-providers-ws'

this.provider = new ethers.providers.Web3Provider(
  new Web3WsProvider(process.env.PROVIDER_URL, {
    clientConfig: {
        keepalive: true,
        keepaliveInterval: 60000, // ms
     }
     // Enable auto reconnection
     reconnect: {
        auto: true,
        delay: 5000, // ms
        maxAttempts: 5,
        onTimeout: false
     }
  }),
)

I don't see why this is so hard to add to ethersjs though. If web3Ws provider can do it, why can't ethersjs do it?

Hey, did you manage to get this to work in node.js? With integrating your code I get the following message: Web3WsProvider is not a constructor

Maybe this is because you used or code in React with TypeScript or something like that?

bennyvenassi avatar Jan 11 '22 18:01 bennyvenassi

I made a workaround like this: created a custom WebSocketProvider that uses ReconnectingWebSocket('url') instead of ordinary Websocket('url') that you use in ethers' WebSocketProvider. For me, it works fine, without any issue for now. It could be nice if we could initialize ethers' WebSocketProvider with a custom WebSocket object rather than just a WebSocket URL. What are your thoughts?

mightymatth avatar Jan 20 '22 16:01 mightymatth

How can this be the accepted solution? It's like 50 lines of custom code that people have to copy and paste for what should be a boilerplate problem.

@sentilesdal solution seems much better - wrapping the input provider in web3Ws provider:

import Web3WsProvider from 'web3-providers-ws'

this.provider = new ethers.providers.Web3Provider(
  new Web3WsProvider(process.env.PROVIDER_URL, {
    clientConfig: {
        keepalive: true,
        keepaliveInterval: 60000, // ms
     }
     // Enable auto reconnection
     reconnect: {
        auto: true,
        delay: 5000, // ms
        maxAttempts: 5,
        onTimeout: false
     }
  }),
)

I don't see why this is so hard to add to ethersjs though. If web3Ws provider can do it, why can't ethersjs do it?

This looks ideal but I can't it to work

import { WebsocketProvider } from 'web3-providers-ws'
import { Web3Provider } from '@ethersproject/providers'

const provider = new Web3Provider(new WebsocketProvider('ws://whatever', {
    timeout: 30000,
    clientConfig: {
        keepalive: true,
        keepaliveInterval: 60000,
    },
    reconnect: {
        auto: true,
        delay: 5000,
    }
}))

Results in this error....

error TS2345: Argument of type 'WebsocketProvider' is not assignable to parameter of type 'ExternalProvider | JsonRpcFetchFunc'.
  Type 'WebsocketProviderBase' is not assignable to type 'ExternalProvider'.
    Types of property 'send' are incompatible.
      Type '(payload: JsonRpcPayload, callback: (error: Error | null, result?: JsonRpcResponse | undefined) => void) => void' is not assignable to type '(request: { method: string; params?: any[] | undefined; }, callback: (error: any, response: any) => void) => void'.
        Types of parameters 'payload' and 'request' are incompatible.
          Property 'jsonrpc' is missing in type '{ method: string; params?: any[] | undefined; }' but required in type 'JsonRpcPayload'.

kebian avatar Jan 30 '22 22:01 kebian

@kebian same problem here using Typescript

johanneskares avatar Mar 07 '22 12:03 johanneskares

@johanneskares @kebian

In TypeScript:

import { ethers } from 'ethers'
import Web3WsProvider from 'web3-providers-ws'

const provider = new ethers.providers.Web3Provider(
  new (Web3WsProvider as any)('ws://whatever', {
    clientConfig: {
      keepalive: true,
      keepaliveInterval: 60000,
    },
    reconnect: {
      auto: true,
      delay: 1000,
      maxAttempts: 5,
      onTimeout: false
    }
  }),
)

Annoyingly hacky, but it works.

jophish avatar Mar 15 '22 16:03 jophish

I am surprised that works.

A Web3Provider is designed to wrap an object with the interface of ExternalProvider. Did Web3 change their object shape for their provider object?

ricmoo avatar Mar 15 '22 16:03 ricmoo

@jophish

Annoyingly hacky, but it works.

How would it look in JS ?

Thanks in advance!

freaker2k7 avatar Mar 15 '22 17:03 freaker2k7

Just leaving my hacky solution here as well in case someone else (like me) really prefers for the functionality to be encapsulated in a class. Essentially I extracted @mikevercoelen's method into a class that wraps a WebSocketProvider using an ES6 Proxy. Proxies are a bit like magic, so there's no guarantee that this works in all cases, but so far it's worked for me and my use-cases, so I figured I'd share.

import { providers } from 'ethers';

const EXPECTED_PONG_BACK = 15000;
const KEEP_ALIVE_CHECK_INTERVAL = 7500;

// Used to "trick" TypeScript into treating a Proxy as the intended proxied class
export const fakeBaseClass = <T>() : new() => Pick<T, keyof T> => (class {} as any);

export class ReconnectingWebSocketProvider extends fakeBaseClass<providers.WebSocketProvider>() {
  private underlyingProvider: providers.WebSocketProvider;

  // Define a handler that forwards all "get" access to the underlying provider
  private handler = {
    get(target: ReconnectingWebSocketProvider, prop: string, receiver: any) {
      return Reflect.get(target.underlyingProvider, prop, receiver);
    },
  };

  constructor(private url: string) {
    super();
    this.connect();
    return new Proxy(this, this.handler);
  }

  private connect() {
    // Copy old events
    const events = this.underlyingProvider?._events ?? [];

    // Instantiate new provider with same url
    this.underlyingProvider = new providers.WebSocketProvider(this.url);

    let pingTimeout: NodeJS.Timeout;
    let keepAliveInterval: NodeJS.Timer;

    this.underlyingProvider._websocket.on('open', () => {
      // Send ping messages on an interval
      keepAliveInterval = setInterval(() => {
        this.underlyingProvider._websocket.ping();

        // Terminate if a pong message is not received timely
        pingTimeout = setTimeout(() => this.underlyingProvider._websocket.terminate(), EXPECTED_PONG_BACK);
      }, KEEP_ALIVE_CHECK_INTERVAL);

      // Add old events to new provider
      events.forEach((event) => {
        this.underlyingProvider._events.push(event);
        this.underlyingProvider._startEvent(event);
      });
    });

    // Clear timers and reconnect on close
    this.underlyingProvider._websocket.on('close', () => {
      clearInterval(keepAliveInterval);
      clearTimeout(pingTimeout);
      this.connect();
    });

    // Clear ping timer when pong is received
    this.underlyingProvider._websocket.on('pong', () => {
      clearInterval(pingTimeout);
    });
  }
}

rkalis avatar Mar 28 '22 16:03 rkalis

I think this is probably the best solution:

const EXPECTED_PONG_BACK = 15000
const KEEP_ALIVE_CHECK_INTERVAL = 7500

export const startConnection = () => {
  provider = new ethers.providers.WebSocketProvider(config.ETH_NODE_WSS)

  let pingTimeout = null
  let keepAliveInterval = null

  provider._websocket.on('open', () => {
    keepAliveInterval = setInterval(() => {
      logger.debug('Checking if the connection is alive, sending a ping')

      provider._websocket.ping()

      // Use `WebSocket#terminate()`, which immediately destroys the connection,
      // instead of `WebSocket#close()`, which waits for the close timer.
      // Delay should be equal to the interval at which your server
      // sends out pings plus a conservative assumption of the latency.
      pingTimeout = setTimeout(() => {
        provider._websocket.terminate()
      }, EXPECTED_PONG_BACK)
    }, KEEP_ALIVE_CHECK_INTERVAL)

    // TODO: handle contract listeners setup + indexing
  })

  provider._websocket.on('close', () => {
    logger.error('The websocket connection was closed')
    clearInterval(keepAliveInterval)
    clearTimeout(pingTimeout)
    startConnection()
  })

  provider._websocket.on('pong', () => {
    logger.debug('Received pong, so connection is alive, clearing the timeout')
    clearInterval(pingTimeout)
  })
}

This send a ping every 15 seconds, when it sends a ping, it expects a pong back within 7.5 seconds otherwise it closes the connection and calls the main startConnection function to start everything over.

Where it says // TODO: handle contract listeners setup + indexing that's where you should do any indexing or listening for contract events etc.

Fine tune these timing vars to taste, depending on who your Node provider is, this are the settings I use for QuikNode

const EXPECTED_PONG_BACK = 15000
const KEEP_ALIVE_CHECK_INTERVAL = 7500

provider._websocket.ping() doesn't seem to be called, "pong" event is never received. I'm using Alchemy node, any ideas?

gluonfield avatar Apr 01 '22 10:04 gluonfield

Why not just use https://github.com/dphilipson/sturdy-websocket ?

sambacha avatar Apr 10 '22 09:04 sambacha