near-api-js

RPC Redundancy/Failover Configuration

Open TrevorJTClarke opened this issue 2 years ago • 13 comments

Is your feature request related to a problem? Please describe. Current RPC providers can have downtime, temporary connectivity issues, or rate limits that make clients' transactions fail. Over the past year we have observed several windows of RPC failure, which could have been mitigated if near-api-js had configurations for multiple RPC providers.

Describe the solution you'd like Similar to issues #703 and #717, the core JSON RPC provider should be refactored to not only have retries, but allow for retries against multiple Providers. The default provider list can configure both near foundation and openshards RPC services. Provider list can be a single node string for backward compatibility OR an array of node strings. Up for discussion but needed: Create a failover threshold of retries for each provider, and a threshold for provider failures before defaulting to a different priority.

Configuration Example:

RPC_MAINNET_PROVIDERS="https://rpc.mainnet.near.org,https://mainnet-rpc.openshards.io"

RPC_GUILDNET_PROVIDERS="https://guildnet-rpc.openshards.io,https://rpc.guildnet.near.org"
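
A minimal sketch of how such a value could be normalized, assuming the setting may arrive as a single URL, a comma-separated string, or an array. `parseProviderList` is a hypothetical helper for illustration, not an existing near-api-js function.

```typescript
// Hypothetical helper: not part of near-api-js.
// Accepts a single URL, a comma-separated string, or an array of URLs,
// and returns a prioritized list (first entry = highest priority).
function parseProviderList(input: string | string[]): string[] {
  const parts = Array.isArray(input) ? input : input.split(",");
  const urls = parts.map((p) => p.trim()).filter((p) => p.length > 0);
  if (urls.length === 0) {
    throw new Error("At least one RPC provider URL is required");
  }
  return urls;
}

const mainnetProviders = parseProviderList(
  "https://rpc.mainnet.near.org,https://mainnet-rpc.openshards.io"
);
console.log(mainnetProviders);
```

A single node string still yields a one-element list, which keeps the backward-compatible case working.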

NOTE: Because near-api-js is used in many dapps and repos, this functionality is key to giving clients the easiest way to decentralize their RPC access. This is critical for mitigating community attacks against public resources.

Describe alternatives you've considered Users must create multiple instances of the Near module with different providers configured and detect TXN failures. Not ideal at all.

Additional context There is an ongoing effort to create a decentralized RPC for mainnet & guildnet using many of the openshards.io nodes with a redundant load-balancer.

TrevorJTClarke avatar Nov 22 '21 23:11 TrevorJTClarke

I believe our strategy for the decentralization of RPC Servers is a bit different, but some of these ideas can be implemented. At the near-api-js level we can provide support for multiple servers and fallback logic, but it will add an additional level of complexity and a source of potential bugs. @frol @MaximusHaximus , any thoughts?

volovyks avatar Nov 24 '21 11:11 volovyks

Similar suggestion from @artob: https://github.com/near/near-api-js/issues/735

volovyks avatar Nov 24 '21 21:11 volovyks

@volovyk-s What is on your mind in terms of a different strategy? I feel near-api-js is the right abstraction layer to deal with the pool of RPC servers to enable true decentralization.

frol avatar Nov 25 '21 17:11 frol

@frol there are two separate problems. The first one is the stability of a single RPC Server. The second one is the ability to switch to another server when the first one is down (decentralization). As far as I know, our current strategy was to work on stability first (API Keys). For the second one, I agree, usage of multiple RPC Servers with fallbacks on the client side is the best option. But we will need to design it carefully, because a simple fallback on each call can be slow. And we will need to support API Keys for each such server. I will prioritize this issue.

volovyks avatar Nov 29 '21 16:11 volovyks

@volovyk-s Ah, well, those are indeed two completely different efforts, but they came somewhat together, and we need both solutions: (1) extended RPC connection configuration, (2) failover configuration. This issue is about the second point.

frol avatar Nov 29 '21 18:11 frol

> @volovyk-s Ah, well, those are indeed two completely different efforts, but they came somewhat together, and we need both solutions: (1) extended RPC connection configuration, (2) failover configuration. This issue is about the second point.

(and @volovyk-s)

I apologize for the side discussion on decentralization here. It was just to mention the additional context and reasons.

The goal of this issue is to add support for multiple RPC configurations, allowing retries against a prioritized list of RPC nodes. This at least mitigates single RPC provider failures/outages. The decentralization should be handled by a very different setup than the SDK. :)
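
One way to picture retries against a prioritized list is the sketch below. It assumes a caller-supplied `call` function that performs the actual request; `callWithFailover`, `RpcCall`, and `retriesPerProvider` are hypothetical names, not near-api-js APIs.

```typescript
// Hypothetical sketch: names and defaults are assumptions, not near-api-js APIs.
type RpcCall<T> = (url: string) => Promise<T>;

interface FailoverOptions {
  retriesPerProvider: number; // attempts against one URL before moving to the next
}

async function callWithFailover<T>(
  urls: string[],
  call: RpcCall<T>,
  options: FailoverOptions = { retriesPerProvider: 2 }
): Promise<T> {
  let lastError: unknown = new Error("No RPC providers configured");
  // Walk the list in priority order; only move on after the threshold is hit.
  for (const url of urls) {
    for (let attempt = 0; attempt < options.retriesPerProvider; attempt++) {
      try {
        return await call(url);
      } catch (err) {
        lastError = err;
      }
    }
  }
  // Every provider exhausted its retries: surface the most recent failure.
  throw lastError;
}
```

This version retries immediately; a real implementation would likely add a delay or backoff between attempts.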

TrevorJTClarke avatar Nov 29 '21 20:11 TrevorJTClarke

@frol @volovyk-s any movement on this? Another downtime/major latency issue on mainnet, with many apps unusable because of the dependency on a single RPC provider.

TrevorJTClarke avatar Dec 15 '21 05:12 TrevorJTClarke

Seems like the ideal solution here will be the creation of FailoverJsonRpcProvider, which will make several simultaneous calls to all provided RPC URLs and return the result if, let's say, 50+% return the same value. Or the first successful result if we want it to be snappy.
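
A rough sketch of both variants, assuming a caller-supplied `call` function per URL. `callWithQuorum` and `firstSuccess` are hypothetical names, the default quorum of a strict majority is an assumption, and results are compared by their JSON serialization, which is sensitive to key order.

```typescript
// Hypothetical sketch: not the near-api-js API.
type RpcCall<T> = (url: string) => Promise<T>;

// Query every URL at once and return a value once `quorum` nodes agree on it.
async function callWithQuorum<T>(
  urls: string[],
  call: RpcCall<T>,
  quorum: number = Math.floor(urls.length / 2) + 1 // assumed: strict majority
): Promise<T> {
  // Map failures to a marker object so one bad node cannot reject the batch.
  const settled = await Promise.all(
    urls.map((url) =>
      call(url).then(
        (value) => ({ ok: true as const, value }),
        () => ({ ok: false as const })
      )
    )
  );
  const counts = new Map<string, { value: T; count: number }>();
  for (const r of settled) {
    if (!r.ok) continue;
    const key = JSON.stringify(r.value); // equality by serialized form
    const entry = counts.get(key) ?? { value: r.value, count: 0 };
    entry.count += 1;
    counts.set(key, entry);
    if (entry.count >= quorum) return entry.value;
  }
  throw new Error(`No value reached quorum of ${quorum}`);
}

// The "snappy" variant: resolve with whichever node answers successfully first.
function firstSuccess<T>(urls: string[], call: RpcCall<T>): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    if (urls.length === 0) {
      reject(new Error("No RPC providers configured"));
      return;
    }
    let failures = 0;
    for (const url of urls) {
      call(url).then(resolve, (err) => {
        failures += 1;
        if (failures === urls.length) reject(err); // all nodes failed
      });
    }
  });
}
```

Both variants send one request per configured node for every call.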

The problem here is the increased load on RPC Servers, something that we are trying to avoid. Also, the code of near-api-js is heavily coupled and relies on JsonRpcProvider instead of Provider interface. Usage of the new Provider will lead to a ton of breaking changes. @MaximusHaximus I think we should move the provider to a separate library in the future. People should be able to create their own implementations and use them in near-api-js-x.

In our case, we will need to refactor the existing JsonRpcProvider. And probably these calls and checks will be sequential. It will increase response time when the main RPC Server is down.

Also, we can refactor utils/web.ts to achieve the same result (with similar downsides).

volovyks avatar Dec 21 '21 15:12 volovyks

Enabling configuration of multiple RPC nodes would also help resolve the feature request for switching RPC URLs in NEAR wallet: https://github.com/near/near-wallet-roadmap/issues/36, if the wallet could set fallback RPC URLs by default.

think-in-universe avatar Jan 25 '22 08:01 think-in-universe

I tried to add a new property node_urls to Near.ts. I tried polling different Connections and found that none of the Connection functions could return the status of the node. Invoking the status() function still retries the request 12 times rather than returning a failure status.

Modifying only the implementation of Near.ts is the wrong approach; modifying json-rpc-provider.ts and utils/web.ts is appropriate.

SteveYuOWO avatar Jan 25 '22 14:01 SteveYuOWO

When executing fetchJson, try to poll through the list of all RPCs. Find a connectable RPC and treat it as the reliable default. This avoids exponentialBackoff against nodes that can never connect.
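
The idea could look roughly like the sketch below, which remembers the last node that answered and starts from it on the next request. `StickyProviderPool` and its members are hypothetical names, and the request itself is delegated to a caller-supplied function rather than the real fetchJson.

```typescript
// Hypothetical sketch: not the actual fetchJson or near-api-js code.
type RpcCall<T> = (url: string) => Promise<T>;

class StickyProviderPool {
  private current = 0; // index of the node currently treated as reliable

  constructor(private readonly urls: string[]) {
    if (urls.length === 0) {
      throw new Error("At least one RPC URL is required");
    }
  }

  get activeUrl(): string {
    return this.urls[this.current];
  }

  async request<T>(call: RpcCall<T>): Promise<T> {
    let lastError: unknown;
    // Start at the last known-good node and rotate through the rest once.
    for (let i = 0; i < this.urls.length; i++) {
      const index = (this.current + i) % this.urls.length;
      try {
        const result = await call(this.urls[index]);
        this.current = index; // remember the node that worked
        return result;
      } catch (err) {
        lastError = err;
      }
    }
    throw lastError;
  }
}
```

Once a node fails, later requests skip straight to the node that last succeeded instead of retrying the dead one first.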

SteveYuOWO avatar Jan 25 '22 15:01 SteveYuOWO

@frol @volovyk-s Yet another downtime/major latency issue, primarily on testnet, because of the dependency on a single RPC provider.

TrevorJTClarke avatar Feb 22 '22 22:02 TrevorJTClarke

Dear Team,

We have a library named fallback-falooda which helps us get a reliable node from a list of nodes in our Cosmos environment. We also use it in our node selector for our NEAR-specific use cases.

I have made a few changes in the code to make it work with fallback-falooda, here:

https://github.com/near/near-api-js/compare/master...leapsamvel:near-api-js:master

We can either:

  1. Take the approach of using fallback-falooda in the library, or
  2. Provide a getter method param and allow the user to provide the URL dynamically when the RPC call is made.
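
The second option might look something like this sketch, where the provider accepts either a fixed URL or a getter that is evaluated on every call. `UrlSource`, `resolveUrl`, and `sendRpc` are hypothetical names for illustration and are not taken from the linked branch or from near-api-js.

```typescript
// Hypothetical sketch: not taken from the linked branch or near-api-js.
type UrlSource = string | (() => string);

function resolveUrl(source: UrlSource): string {
  return typeof source === "function" ? source() : source;
}

async function sendRpc<T>(
  source: UrlSource,
  transport: (url: string) => Promise<T>
): Promise<T> {
  // The URL is resolved on every call, so the caller can swap nodes at runtime.
  return transport(resolveUrl(source));
}
```

With a getter, node selection logic stays outside the library and can be changed without touching the provider.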

Could you let me know, and I will raise the PR accordingly with the test cases and documentation?

TIA.

leapsamvel avatar Feb 20 '23 04:02 leapsamvel

Resolved by #1334

vikinatora avatar Apr 10 '24 12:04 vikinatora