js-libp2p

[switch] Maximum call stack size exceeded

Open alanshaw opened this issue 6 years ago • 17 comments

/Users/alan/Code/protocol-labs/js-ipfs/node_modules/libp2p-switch/src/observer.js:41
    peerInfo.then((_peerInfo) => {
             ^

RangeError: Maximum call stack size exceeded
    at Promise.then (<anonymous>)
    at willObserve (/Users/alan/Code/protocol-labs/js-ipfs/node_modules/libp2p-switch/src/observer.js:41:14)
    at pull.map (/Users/alan/Code/protocol-labs/js-ipfs/node_modules/libp2p-switch/src/observer.js:34:9)
    at /Users/alan/Code/protocol-labs/js-ipfs/node_modules/pull-stream/throughs/map.js:13:23
    at /Users/alan/Code/protocol-labs/js-ipfs/node_modules/pull-reader/index.js:114:13
    at /Users/alan/Code/protocol-labs/js-ipfs/node_modules/pull-reader/index.js:114:13
    at onNext (/Users/alan/Code/protocol-labs/js-ipfs/node_modules/pull-catch/index.js:17:17)
    at drain (/Users/alan/Code/protocol-labs/js-ipfs/node_modules/stream-to-pull-stream/index.js:141:18)
    at Channel.<anonymous> (/Users/alan/Code/protocol-labs/js-ipfs/node_modules/stream-to-pull-stream/index.js:150:5)
    at Channel.emit (events.js:182:13)

alanshaw avatar Dec 10 '18 14:12 alanshaw

@alanshaw any other info on the state of the node when this happened? Testing/idle node/etc.

jacobheun avatar Dec 10 '18 15:12 jacobheun

Sorry, should have left more context - just wanted to get this logged before I forgot!

It's a JS IPFS daemon running on this branch https://github.com/ipfs/js-ipfs/pull/856

I'm using companion and I opened the WebUI, looked around a bit, and clicked on the peers page to see the hundreds of nodes (very exciting btw); shortly after, my daemon crashed with that error.

alanshaw avatar Dec 10 '18 15:12 alanshaw

Biggest stack I have ever seen https://ci.ipfs.team/blue/rest/organizations/jenkins/pipelines/IPFS/pipelines/js-ipfs/branches/PR-856/runs/80/nodes/257/steps/386/log/?start=0

alanshaw avatar Dec 11 '18 15:12 alanshaw

This appears to be an issue with the pinning code spamming DHT requests in ipfs. It looks like, when checking for descendants in a pin set, it calls out to the network before fully checking locally. Since the code executes dag.get in parallel, it ends up flooding peers with DHT queries on large pin sets. Calling that in series until the underlying pinning problem is fixed should mitigate this (a sketch of that change follows below). We'll get the necessary updates added in conjunction with ipfs/js-ipfs#856.

cc: @vasco-santos

jacobheun avatar Dec 14 '18 15:12 jacobheun
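
As a rough illustration of the parallel-to-series change described above (this is not the actual js-ipfs pin-set code; `checkDescendants`, `cids` and the callback-style `dag.get` usage are assumptions for the sketch), using the `async` utility library:

    const mapSeries = require('async/mapSeries')

    // Before: resolving every child CID at once (e.g. with async.map) meant
    // each block that wasn't available locally triggered its own DHT lookup,
    // so large pin sets flooded peers with queries.
    //
    // After: walking the descendants one at a time keeps at most one dag.get
    // (and therefore at most one network lookup) in flight.
    function checkDescendants (dag, cids, callback) {
      mapSeries(cids, (cid, cb) => {
        dag.get(cid, (err, result) => {
          if (err) return cb(err)
          cb(null, result.value)
        })
      }, callback)
    }

Serialising the walk trades throughput for not overwhelming the DHT, which is why it's framed as a mitigation rather than a fix for the underlying pinning behaviour.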

Wow! I've found that as well in a lot of bubbleprof/flame diagrams. I think that usage of async should be refactored.

mcollina avatar Dec 14 '18 15:12 mcollina

Seeing this in Jenkins again today:

2018-12-19T11:47:31.635Z ipfsd-ctl:daemon:stderr /tmp/jenkins/workspace/PFS_interop_test-with-debug-NUL3VSMMUZYNRCKUQJKWVW6DEXXEDFCXGEMGCBOBYZCZKZUXXGHA@2/node_modules/libp2p-switch/src/observer.js:41
    peerInfo.then((_peerInfo) => {
             ^

RangeError: Maximum call stack size exceeded
    at Promise.then (<anonymous>)
    at willObserve (/tmp/jenkins/workspace/PFS_interop_test-with-debug-NUL3VSMMUZYNRCKUQJKWVW6DEXXEDFCXGEMGCBOBYZCZKZUXXGHA@2/node_modules/libp2p-switch/src/observer.js:41:14)
    at pull.map (/tmp/jenkins/workspace/PFS_interop_test-with-debug-NUL3VSMMUZYNRCKUQJKWVW6DEXXEDFCXGEMGCBOBYZCZKZUXXGHA@2/node_modules/libp2p-switch/src/observer.js:34:9)
    at /tmp/jenkins/workspace/PFS_interop_test-with-debug-NUL3VSMMUZYNRCKUQJKWVW6DEXXEDFCXGEMGCBOBYZCZKZUXXGHA@2/node_modules/pull-stream/throughs/map.js:13:23
    at /tmp/jenkins/workspace/PFS_interop_test-with-debug-NUL3VSMMUZYNRCKUQJKWVW6DEXXEDFCXGEMGCBOBYZCZKZUXXGHA@2/node_modules/pull-reader/index.js:114:13
    at /tmp/jenkins/workspace/PFS_interop_test-with-debug-NUL3VSMMUZYNRCKUQJKWVW6DEXXEDFCXGEMGCBOBYZCZKZUXXGHA@2/node_modules/pull-reader/index.js:114:13
    at /tmp/jenkins/workspace/PFS_interop_test-with-debug-NUL3VSMMUZYNRCKUQJKWVW6DEXXEDFCXGEMGCBOBYZCZKZUXXGHA@2/node_modules/pull-reader/index.js:114:13
    at /tmp/jenkins/workspace/PFS_interop_test-with-debug-NUL3VSMMUZYNRCKUQJKWVW6DEXXEDFCXGEMGCBOBYZCZKZUXXGHA@2/node_modules/pull-stream/throughs/async-map.js:39:20
    at decipher.decrypt (/tmp/jenkins/workspace/PFS_interop_test-with-debug-NUL3VSMMUZYNRCKUQJKWVW6DEXXEDFCXGEMGCBOBYZCZKZUXXGHA@2/node_modules/libp2p-secio/src/etm.js:64:11)
    at Object.decrypt (/tmp/jenkins/workspace/PFS_interop_test-with-debug-NUL3VSMMUZYNRCKUQJKWVW6DEXXEDFCXGEMGCBOBYZCZKZUXXGHA@2/node_modules/libp2p-crypto/src/aes/index.js:25:7)

The DHT code is not present in this build.

achingbrain avatar Dec 19 '18 16:12 achingbrain

@achingbrain can you link the build?

jacobheun avatar Dec 19 '18 16:12 jacobheun

This is likely my fault. I removed a setImmediate where the event was emitted. That should either a) be reintroduced or b) be replaced with a process.nextTick (see the sketch below).

mcollina avatar Dec 19 '18 16:12 mcollina
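
For context, a minimal sketch of the pattern being discussed, not the actual libp2p-switch observer code (the function name and parameters are illustrative): emitting the event synchronously keeps every pull-stream frame on the stack, while deferring the emit lets the current call chain unwind first.

    const EventEmitter = require('events')
    const observer = new EventEmitter()

    function willObserve (peerInfo, message) {
      // Synchronous emit: listeners run on top of whatever called us, so a
      // long chain of pull-stream callbacks never gets a chance to unwind.
      // observer.emit('message', peerInfo, message)

      // Deferred emit (setImmediate or process.nextTick): the current stack
      // returns before the listeners are invoked.
      setImmediate(() => {
        observer.emit('message', peerInfo, message)
      })
    }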

@jacobheun https://ci.ipfs.team/blue/rest/organizations/jenkins/pipelines/IPFS/pipelines/interop/branches/test-with-debug/runs/2/nodes/254/log/?start=0

achingbrain avatar Dec 19 '18 16:12 achingbrain

This was the PR: https://github.com/libp2p/js-libp2p-switch/pull/289.

mcollina avatar Dec 19 '18 16:12 mcollina

@mcollina this one? https://github.com/libp2p/js-libp2p-switch/pull/289/files#diff-08f3e9ab539253c87ea111cc61c69e29

achingbrain avatar Dec 19 '18 16:12 achingbrain

Aha, yeah.

achingbrain avatar Dec 19 '18 16:12 achingbrain

yes.

mcollina avatar Dec 19 '18 16:12 mcollina

We were seeing this call stack error prior to that switch update being released. It might be contributing, but I don't think it's the core issue.

jacobheun avatar Dec 19 '18 16:12 jacobheun

I would recommend adding an Error.stackTraceLimit = 1024 somewhere to get a bit more than 10 lines (example below).

mcollina avatar Dec 19 '18 18:12 mcollina
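
For reference, Error.stackTraceLimit is a standard V8/Node.js property and its default keeps only the top 10 frames; a sketch of where such a line might go:

    // As early as possible in the process entry point, before the code under
    // investigation runs, so any later errors capture deeper stacks.
    Error.stackTraceLimit = 1024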

Okay, I've got a branch of the interop tests passing based off the latest ipfs release candidate, https://ci.ipfs.team/blue/organizations/jenkins/IPFS%2Finterop/detail/test%2Ffix-async/4/pipeline/254.

The underlying issue seems to be that a lot of asynchronous code is calling its callbacks synchronously, which bloats the stack (a minimal repro is sketched below). I'll have a few PRs out in a bit for the code I updated to help alleviate the issue.

jacobheun avatar Dec 19 '18 22:12 jacobheun
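
To make that failure mode concrete, here is a small, self-contained illustration (not code from js-ipfs): an API that looks asynchronous but invokes its callback on the same stack, driven in a loop, grows the stack on every iteration, whereas a single deferral lets it unwind.

    // An API that looks async but calls its callback synchronously.
    function fakeAsyncRead (cb) {
      cb(null, 'chunk')
    }

    function drain (remaining, cb) {
      if (remaining === 0) return cb()
      fakeAsyncRead((err) => {
        if (err) return cb(err)
        // Each "iteration" adds frames: drain -> fakeAsyncRead -> callback -> drain ...
        drain(remaining - 1, cb)
        // Deferring instead lets the stack unwind between iterations:
        // setImmediate(() => drain(remaining - 1, cb))
      })
    }

    // With the synchronous call this throws
    // "RangeError: Maximum call stack size exceeded" long before finishing.
    drain(1e6, () => console.log('done'))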

I'm seeing this too, but don't have any helpful info except that it happens for me after lots of bitswap... stack traces aren't helpful.

tobowers avatar Sep 20 '19 11:09 tobowers

The code has been reimplemented since this issue was opened, so closing as stale.

wemeetagain avatar Sep 28 '23 02:09 wemeetagain