solana-web3.js icon indicating copy to clipboard operation
solana-web3.js copied to clipboard

_handleServerNotification on Connection.ts has an unhandled error.

Open jpablogn opened this issue 1 year ago • 4 comments

My app uses connect.onAccountChange() and connect.removeAccountChangeListener() and is keeps open 100-200 subscritions.

Main libraries:

  • @solana/web3.js ^1.91.7
  • rpc-websockets ^7.5.1

The app crashes when it tries to use subscribe/unsubscribe using onAccountChange()/removeAccountChangeListener() and the Websocket is not with readyState===1.

Fail when trying to subscribe (it goes in an infinite loop with all the accounts handled at that moment):

accountSubscribe error for argument [
  'F3TuARNBKV6U7uA4DvLZUUgmcAAEtpDeMyC36Zb1CPnR',
  { encoding: 'base64', commitment: 'confirmed' }
] Tried to call a JSON-RPC method `accountSubscribe` but the socket was not `CONNECTING` or `OPEN` (`readyState` was 3)

Fail when trying to unsubscribe (it goes in an infinite loop with all the accounts handled at that moment):

accountUnsubscribe error: Tried to call a JSON-RPC method `accountUnsubscribe` but the socket was not `CONNECTING` or `OPEN` (`readyState` was 2)

And finally crashes:

<--- Last few GCs --->

[1996:0000021E09A48EE0] 19434569 ms: Mark-Compact 4047.4 (4134.3) -> 4032.6 (4135.3) MB, 1414.30 / 0.00 ms  (average mu = 0.253, current mu = 0.245) allocation failure; scavenge might not succeed
[1996:0000021E09A48EE0] 19436455 ms: Mark-Compact 4048.4 (4135.3) -> 4033.8 (4136.6) MB, 1735.41 / 0.00 ms  (average mu = 0.169, current mu = 0.080) allocation failure; scavenge might not succeed


<--- JS stacktrace --->

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory
----- Native stack trace -----

 1: 00007FF6430BC81B node::SetCppgcReference+17979
 2: 00007FF643026674 DSA_meth_get_flags+89316
 3: 00007FF643AA4871 v8::Isolate::ReportExternalAllocationLimitReached+65
 4: 00007FF643A8DFC8 v8::Function::Experimental_IsNopFunction+1336
 5: 00007FF6438EFA70 v8::Platform::SystemClockTimeMillis+659328
 6: 00007FF6438FBCF3 v8::Platform::SystemClockTimeMillis+709123
 7: 00007FF6438F9654 v8::Platform::SystemClockTimeMillis+699236
 8: 00007FF6438EC790 v8::Platform::SystemClockTimeMillis+646304
 9: 00007FF643901E0A v8::Platform::SystemClockTimeMillis+733978
10: 00007FF643902687 v8::Platform::SystemClockTimeMillis+736151
11: 00007FF643910F7F v8::Platform::SystemClockTimeMillis+795791
12: 00007FF6435D1CA5 v8::CodeEvent::GetFunctionName+116773

As far as WebSocket.readyState is not visible, I have tried to catch the error in different ways but I can't catch it.

Option 1: try-catch

const removeListener = async (id:number) =>{

    let ok:boolean = false;
    let n_times:number = 1;

    do{

        try{
            await connection.removeAccountChangeListener(id);
            ok = true;
        }catch(error){
            ok = false;
        }            

        if (!ok){
            await delay(1000); // custom function to delay 1s
            console.log(`Try: ${n_times}, Nop: ${id}`);
            n_times++;
        }else{
            console.log(`Try: ${n_times}, Disconnected: ${id}`);
        }

    }while(!ok);

}

Option 2: catch chained

const removeListener = async (id:number) =>{

    let ok:boolean = false;
    let n_times:number = 1;

    do{
        ok = true;
        await connection.removeAccountChangeListener(id).catch(error){
            ok = false;
        }            

        if (!ok){
            await delay(1000); // custom function to delay 1s
            console.log(`Try: ${n_times}, Nop: ${id}`);
            n_times++;
        }else{
            console.log(`Try: ${n_times}, Disconnected: ${id}`);
        }

    }while(!ok);

}

These errors are in file "rpc-websocket.ts" call and notify reject funcions:

      new Error(
        'Tried to call a JSON-RPC method `' +
          args[0] +
          '` but the socket was not `CONNECTING` or `OPEN` (`readyState` was ' +
          readyState +
          ')',
      ),

But I think the problem could came from 'connection.ts' which has this piece of code with a no controlled catch, because I can't catch the error:

  /**
   * @internal
   */
  private _handleServerNotification<
    TCallback extends SubscriptionConfig['callback'],
  >(
    serverSubscriptionId: ServerSubscriptionId,
    callbackArgs: Parameters<TCallback>,
  ): void {
    const callbacks =
      this._subscriptionCallbacksByServerSubscriptionId[serverSubscriptionId];
    if (callbacks === undefined) {
      return;
    }
    callbacks.forEach(cb => {
      try {
        cb(
          // I failed to find a way to convince TypeScript that `cb` is of type
          // `TCallback` which is certainly compatible with `Parameters<TCallback>`.
          // See https://github.com/microsoft/TypeScript/issues/47615
          // @ts-ignore
          ...callbackArgs,
        );
      } catch (e) {
        console.error(e);
      }
    });
  }

Few questions open en Solana Stack Exchange: https://solana.stackexchange.com/questions/15643/accountunsubscribe-and-accountsubscribe-error-tried-to-call-a-json-rpc-method-w https://solana.stackexchange.com/questions/15698/solana-web3-js-it-is-not-possible-to-detect-websocket-error

Could it be a bug? How can I handle or catch this error? How can I check websocket.readyStatus to avoid subscribe/unsubscribe when it is not in OPEN or CONNECTING?

jpablogn avatar Aug 02 '24 22:08 jpablogn

Subscribe errors are caught here: https://github.com/solana-labs/solana-web3.js/blob/maintenance/v1.x/src/connection.ts#L6255-L6270

Unsubscribe errors are caught here: https://github.com/solana-labs/solana-web3.js/blob/maintenance/v1.x/src/connection.ts#L6310-L6322

In neither place are these errors surfaced to your application.

It's possible that a check for readyState could be added to isCurrentConnectionStillActive(). Could you try that, and send a PR if it works?

All things considered, this subscriptions API and implementation are terminally flawed. We've completely rewritten it in 2.0, and I encourage you to migrate your app now. https://x.com/anza_xyz/status/1819093864394764674

steveluscher avatar Aug 07 '24 06:08 steveluscher

Yes, in fact that is the problem, but I don't know how to solve it.

If the program arrives to the line where is written "// TODO: Maybe add an 'errored' state or a retry limit?" (lines: 6.316 or 6.265) it goes to an infinite loop that crash the javascript stack.

I have modified rpc-websocket.ts to have visibility of readyState like this:

readyState(){
    // websocket CONNECTING (0) or OPEN (1)
    return ((this.underlyingSocket?.readyState === 1) || (this.underlyingSocket?.readyState === 0));
}

I have tried three different things without success: Case 1: Include this piece of code in line 6.193:

if(!this._rpcWebSocket.readyState()){
      this._rpcWebSocket.connect();
      return;
}

Case 2. Add it to isCurrentConnectionStillActive() like this:

const activeWebSocketGeneration = this._rpcWebSocketGeneration;
    const isCurrentConnectionStillActive = () => {
      return ((activeWebSocketGeneration === this._rpcWebSocketGeneration) && (this._rpcWebSocket.readyState()));
 };

Case 3. Add this piece of code in lines 6.316 and 6.265:

if (!this._rpcWebSocket.readyState()){
   return;
}

Running the test npm run test:test-with-live-validator gives following results:

  1. Source code (without changes):
  490 passing (1m)
  3 pending
  7 failing
  1. Case 1:
  310 passing (1m)
  3 pending
  62 failing
  1. Case 2:
  476 passing (1m)
  3 pending
  21 failing
  1. Case 3:
  474 passing (1m)
  3 pending
  23 failing

Running npm run test:unit:node gives following results:

  1. Source code (without changes):
  471 passing (1s)
  1. Case 1:
  Tests not finished
  1. Case 2:
 Test not finished
  1. Case 3:
 Test not finished

Adding a return just in line 6311 passes all test successfully but I don't know the real effect (as far as the original code also passes al of them but fails in produccion when doing a websocket call when readyState != 1):

if (e instanceof Error) {
   console.error(`${unsubscribeMethod} error:`, e.message);
   return;
}

I will test it over a real server because I don't have the knowledge to do the test.

By the way I think lines 6.297-6.299 are repeated and can be deleted: https://github.com/solana-labs/solana-web3.js/blob/maintenance/v1.x/src/connection.ts#L6297-L6304

Using a totaly refactored library means investing lots of hours in learning how it works (take into account that there are not many examples published to use as reference), for a big project may have sense, for an small one may be it is faster to do your own small library instead of trying to learn how such a complex one works (talking from my ignorant point of view).

Thank you.

jpablogn avatar Aug 07 '24 18:08 jpablogn

This in not working fine in my code, it just gives back undefined.... :)

readyState(){
    // websocket CONNECTING (0) or OPEN (1)
    return this.underlyingSocket?.readyState;
}

But even in that case, this code:

} catch (e) {
      if (e instanceof Error) {
        console.error(`${unsubscribeMethod} error:`, e.message);
      }

      // TODO: Maybe add an 'errored' state or a retry limit?
      this._setSubscription(hash, {...subscription, state: 'subscribed'});    

      console.log(`${`-----> hash: ${aux}, ws.connected: ${this._rpcWebSocketConnected}, ` +
        `ws.readyState: ${this._rpcWebSocket.readyState()}, ` +
        `isActive: ${isCurrentConnectionStillActive()}`);    

      await this._updateSubscriptions();
}

prints: true, undefined, true

jpablogn avatar Aug 09 '24 16:08 jpablogn

Hello,

I solved the problem months ago but I haven't had time to do it in the right way (passing the test and pushing the code). You are already involved with v2, and this was for version v1.95.5, so it is not important any more.

There were two main problems from my point of view:

  1. Websocket status gives problems, some errors in _rpcWebSocket.Call() drives the lib to an infinite loop (see code below). So I exported 'readyState' as you suggested (rpc-websocket.ts attached) to handle webSocket status in another way in "connection.ts".

// TODO: Maybe add an 'errored' state or a retry limit? this._setSubscription(hash, { ...subscription, state: 'subscribed', });

  1. The lib has a big reentrance loop for subscriptions on "connection.ts", that drives to a stack or memory heap overflow. If you have many subscriptions (i.e.: follow 50 tokens price/status means having 200 subscriptions), every time something changes the app checks 200 subscriptions, this becomes impossible to handle by the heap and it finishes by crashing.

// it reviews all subscriptions every time each subscription changes

await Promise.all(Object.keys(this._subscriptionsByHash).map(async hash => {...all subscriptions reviewed in all status changes..})));

Instead of that I did it like this (one status machine for each subscription):

// Review all subscriptions

await Promise.all( Object.keys(this._subscriptionsByHash).map(async hash => { return this.estatus_machine(hash); }), );

// Review all changes of each subscription isolated (it doesn't review changes in one subscription because of a status change in another subscription)

private async estatus_machine(hash:string){..}

Regards. J. Pablo.

El mié, 7 ago 2024 a las 8:11, Steven Luscher @.***>) escribió:

Subscribe errors are caught here: https://github.com/solana-labs/solana-web3.js/blob/maintenance/v1.x/src/connection.ts#L6255-L6270

Unsubscribe errors are caught here: https://github.com/solana-labs/solana-web3.js/blob/maintenance/v1.x/src/connection.ts#L6310-L6322

In neither place are these errors surfaced to your application.

It's possible that a check for readyState could be added to isCurrentConnectionStillActive(). Could you try that, and send a PR if it works?

All things considered, this subscriptions API and implementation are terminally flawed. We've completely rewritten it in 2.0, and I encourage you to migrate your app now. https://x.com/anza_xyz/status/1819093864394764674

— Reply to this email directly, view it on GitHub https://github.com/solana-labs/solana-web3.js/issues/3053#issuecomment-2272691891, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAX7LQPYXCW3DWK5YQYYZXDZQG27TAVCNFSM6AAAAABL5KIX3WVHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDENZSGY4TCOBZGE . You are receiving this because you authored the thread.Message ID: @.***>

jpablogn avatar Nov 22 '24 16:11 jpablogn