Failure to establish circuit relay reservation from browser peer

Open jjhbk opened this issue 1 year ago • 14 comments

Screenshot 2024-12-01 055134

jjhbk avatar Dec 01 '24 00:12 jjhbk

Thanks for opening this issue.

This is basically due to the browser failing to get a circuit relay reservation. This happens because we only connect to a single app-specific bootstrap node (12D3KooWFhXabKDwALpzqMbto94sB7rvmZ6M28hs9Y9xSopDKwQr), which is running the go-peer code.

We could fix this a number of ways:

  • Connect to DHT from the browser and do a random walk to find a random circuit relay. This will add a lot of noise, but reduce the risk of not finding a circuit relay reservation.
  • Increase the limits on the Go peer. This will likely just kick the can down the road. Provisioned resources tend to get quickly exhausted.

I recently increased the limits (https://github.com/libp2p/universal-connectivity/blob/d28e52c0e20cd7fa9629a70a0934b4c1b9660d79/go-peer/main.go#L190-L195), but since the go-peer joins the DHT, circuit relay reservations get taken by random peers.

2color avatar Jan 07 '25 16:01 2color

I had the same issue, but using cargo run. How do I fix this on Rust?

nopedawn avatar Jan 15 '25 08:01 nopedawn

I had the same issue, but using cargo run. How do I fix this on Rust?

Please be more specific. Which node are you connecting to?

2color avatar Jan 21 '25 16:01 2color

It didn't work in the Brave browser on Linux (I tried refreshing 3-4 times in a row). Console output:

lockdown-install.js:1 Removing unpermitted intrinsics
delegated-ipfs.dev/routing/v1/peers/bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p:1 Failed to load resource: the server responded with a status of 404 ()
ctx.tsx:48 failed to start libp2p Error: Bootstrap requires a list of peer addresses
    at new hf (index.js:54:1)
    at cu.peerDiscovery (index.js:142:1)
    at libp2p.js:121:77
    at Array.forEach (<anonymous>)
    at new cc (libp2p.js:120:1)

Then it did work, but I can see this in the console:

lockdown-install.js:1 Removing unpermitted intrinsics
index.js:22 Uncaught (in promise) AbortError: The operation was aborted
    at i (index.js:22:1)
    at uz.closeWrite (abstract-stream.js:228:19)
    at uz.close (abstract-stream.js:187:1)
    at muxer.js:248:1
    at Array.map (<anonymous>)
    at uH.close (muxer.js:248:1)
    at co.close [as _close] (upgrader.js:427:1)
    at co.close (index.js:118:1)
    at pg (initiate-connection.js:148:1)
    at async pw.dial (transport.js:92:63)
i @ index.js:22
closeWrite @ abstract-stream.js:228
close @ abstract-stream.js:187
(anonymous) @ muxer.js:248
close @ muxer.js:248
close @ upgrader.js:427
close @ index.js:118
pg @ initiate-connection.js:148
index.js:22 Uncaught (in promise) AbortError: The operation was aborted
    at i (index.js:22:1)
    at dM.closeWrite (abstract-stream.js:228:19)
    at dM.close (abstract-stream.js:187:1)
    at t.close (stream-to-ma-conn.js:15:1)
    at co.close [as _close] (upgrader.js:429:1)
    at async co.close (index.js:118:1)
    at async pg (initiate-connection.js:148:1)
    at async pw.dial (transport.js:92:63)
    at async queue.add.peerId [as fn] (dial-queue.js:169:1)
    at async i (index.js:28:1)
i @ index.js:22
closeWrite @ abstract-stream.js:228
close @ abstract-stream.js:187
t.close @ stream-to-ma-conn.js:15
close @ upgrader.js:429
await in close
close @ index.js:118
pg @ initiate-connection.js:148
index.js:22 Uncaught (in promise) AbortError: The operation was aborted
    at i (index.js:22:1)
    at uz.closeWrite (abstract-stream.js:228:19)
    at uz.close (abstract-stream.js:187:1)
    at muxer.js:248:1
    at Array.map (<anonymous>)
    at uH.close (muxer.js:248:1)
    at co.close [as _close] (upgrader.js:427:1)
    at co.close (index.js:118:1)
    at pg (initiate-connection.js:148:1)
    at async pw.dial (transport.js:92:63)
i @ index.js:22
closeWrite @ abstract-stream.js:228
close @ abstract-stream.js:187
(anonymous) @ muxer.js:248
close @ muxer.js:248
close @ upgrader.js:427
close @ index.js:118
pg @ initiate-connection.js:148
await in pg
dial @ transport.js:92
await in dial
dial @ transport-manager.js:86
queue.add.peerId @ dial-queue.js:169
await in queue.add.peerId
run @ job.js:55
tryToStartAnother @ index.js:66
add @ index.js:100
dial @ dial-queue.js:137
openConnection @ index.js:286
dial @ libp2p.js:223
mD @ libp2p.ts:138
(2) index.js:22 Uncaught (in promise) AbortError: The operation was aborted
    at i (index.js:22:1)
    at dM.closeWrite (abstract-stream.js:228:19)
    at dM.close (abstract-stream.js:187:1)
    at t.close (stream-to-ma-conn.js:15:1)
    at co.close [as _close] (upgrader.js:429:1)
    at async co.close (index.js:118:1)
    at async pg (initiate-connection.js:148:1)
    at async pw.dial (transport.js:92:63)
    at async queue.add.peerId [as fn] (dial-queue.js:169:1)
    at async i (index.js:28:1)

After refreshing, I get another error:

lockdown-install.js:1 Removing unpermitted intrinsics
Failed to establish a connection to https://147.28.186.157:9095/.well-known/libp2p-webtransport?type=noise: net::ERR_QUIC_PROTOCOL_ERROR.
ctx.tsx:48 failed to start libp2p NoValidAddressesError: Transport (@libp2p/circuit-relay-v2-transport) could not listen on any available address
    at l5.listen (transport-manager.js:193:1)
    at async l5.afterStart (transport-manager.js:50:1)
    at async components.js:24:1
    at async Promise.all (/index 0)
    at async Proxy._invokeStartableMethod (components.js:21:1)
    at async Proxy.afterStart (components.js:35:1)
    at async cc.start (libp2p.js:180:1)
    at async cu (index.js:53:1)
    at async mT (libp2p.ts:41:12)
    at async ctx.tsx:37:24
(anonymous) @ ctx.tsx:48
await in (anonymous)
(anonymous) @ ctx.tsx:51
uI @ react-dom.production.min.js:243
oU @ react-dom.production.min.js:285
o @ react-dom.production.min.js:281
x @ scheduler.production.min.js:13
T @ scheduler.production.min.js:14

acul71 avatar Jan 21 '25 16:01 acul71

Thanks @acul71. That's helpful.

Failure to find addresses for app bootstrapper (go-peer)

delegated-ipfs.dev/routing/v1/peers/bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p:1 Failed to load resource: the server responded with a status of 404 ()

This is somewhat surprising. What's happening here is that the delegated routing endpoint (powered by someguy) fails to resolve the peer. I have been able to reproduce this, but I'm still investigating why it happens. Tracking issue: https://github.com/ipfs/someguy/issues/99

3 known failure modes

There seem to be 3 failure modes that are relevant for this issue:

  • The browser failing to get a circuit relay reservation on the dedicated app bootstrapper (peer id bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p).
  • The browser failing to resolve the multiaddrs for the bootstrapper bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p, because of the delegated routing.
  • Failure to connect to the bootstrapper, even before a circuit reservation. In this case, you would still see a circuit relay error in the UI: NoValidAddressesError: Transport (@libp2p/circuit-relay-v2-transport) could not listen on any available address
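
To make the three failure modes above easier to tell apart when reading console output, here is a minimal TypeScript sketch that maps a single log line to a likely mode. The function name, the mode labels, and the string matching are all assumptions for illustration; they are based only on the messages pasted in this thread, not on any libp2p API. Note that NoValidAddressesError is treated as ambiguous, since it shows up both when the reservation fails and when the connection itself fails.

```typescript
// Labels are hypothetical; they mirror the failure modes listed in this thread.
type FailureMode =
  | 'address-resolution'        // delegated routing could not resolve the bootstrapper
  | 'connection'                // the browser could not connect to the bootstrapper
  | 'reservation-or-connection' // relay transport could not listen; cause not visible from the message
  | 'unknown'

// Classify one console line using substrings seen in the logs above.
function classifyFailure(line: string): FailureMode {
  // 404 from the delegated routing endpoint when looking up the bootstrapper.
  if (line.includes('/routing/v1/peers/') && line.includes('404')) {
    return 'address-resolution'
  }
  // Assumption: this error follows from the failed lookup leaving no addresses.
  if (line.includes('Bootstrap requires a list of peer addresses')) {
    return 'address-resolution'
  }
  // Transport-level connection error (WebTransport over QUIC in the log above).
  if (line.includes('ERR_QUIC_PROTOCOL_ERROR')) {
    return 'connection'
  }
  // Seen both when the reservation is refused and when no connection exists.
  if (line.includes('NoValidAddressesError')) {
    return 'reservation-or-connection'
  }
  return 'unknown'
}
```

Running each pasted console line through classifyFailure gives a quick first guess at which mode a report belongs to; anything it returns as 'reservation-or-connection' still needs the surrounding lines to disambiguate.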

Other things to investigate (out of scope for this issue)

  • It seems that we sometimes get more than one webrtc-direct multiaddr with different certhashes, when only one is valid
  • We attempt to create a reservation on all of the returned multiaddrs for a given peer. This might result in more than one reservation per peer (not confirmed; needs to be checked)

Next steps

  • Given all of this, it would probably be best to join the Amino DHT and find a random relay.
  • We will still need to connect to the app bootstrapper (go-peer) for peer discovery pubsub events. But this should put less stress on the app bootstrapper and reduce the chance of resource exhaustion there.
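
The "find a random relay" step could look roughly like the TypeScript sketch below: shuffle the candidate relay addresses and try them one at a time until one grants a reservation. This is only a sketch under stated assumptions. The reserve callback is hypothetical and stands in for whatever the app uses to request a reservation, and how the candidates are discovered (for example via the DHT) is out of scope here.

```typescript
// Hypothetical callback: resolves to true if the relay granted a reservation.
type Reserve = (relayAddr: string) => Promise<boolean>

// Return a shuffled copy by repeatedly drawing a random remaining item.
// `random` is injectable so the order can be made deterministic in tests.
function shuffled<T>(items: readonly T[], random: () => number = Math.random): T[] {
  const pool = items.slice()
  const out: T[] = []
  while (pool.length > 0) {
    const j = Math.floor(random() * pool.length)
    out.push(...pool.splice(j, 1))
  }
  return out
}

// Try candidates in random order; return the first relay that accepts,
// or undefined if every candidate refused or errored.
async function reserveOnAnyRelay(
  candidates: readonly string[],
  reserve: Reserve,
  random: () => number = Math.random
): Promise<string | undefined> {
  for (const addr of shuffled(candidates, random)) {
    try {
      if (await reserve(addr)) return addr
    } catch {
      // Treat an error like a busy relay and move on to the next candidate.
    }
  }
  return undefined
}
```

Randomizing the order spreads reservations across relays instead of having every browser hit the same one first, which is the point of moving away from the single app bootstrapper.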

2color avatar Jan 22 '25 10:01 2color

@2color, regarding the possible failure modes you mentioned:

The browser failing to get a circuit relay reservation on the dedicated app bootstrapper (peer id bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p).

The browser failing to resolve the multiaddrs for the bootstrapper bafzaajaiaejcav3fwj35j27gor72ap5aqhiz44qmje4gcxvo5wogjmczwhk4xp7p, because of the delegated routing

  • Chrome browser [Windows 11], js-libp2p

Image

  • However, after a few refreshes and switching browsers, it works sporadically.

  • [ ] Would it be viable to register it as a bootstrapper node as a fallback? Or maybe add a mechanism that refreshes the bootstrapper multiaddrs list at intervals, to counter the browser's failure to get/resolve multiaddrs?

addresses: {
    listen: [
      '/ip4/0.0.0.0/tcp/64484',
      '/ip4/0.0.0.0/tcp/64485/ws'
    ]
  },
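
A minimal TypeScript sketch of the fallback idea asked about above: try to resolve the bootstrapper's multiaddrs, and fall back to a hard-coded list if the lookup fails or returns nothing, so the bootstrap list is never empty. Everything here is an assumption for illustration. The lookup callback is hypothetical (it stands in for the delegated routing query), and the fallback address is a placeholder, not the real bootstrapper address.

```typescript
// Hypothetical callback standing in for the delegated routing lookup.
type AddrLookup = (peerId: string) => Promise<string[]>

// Placeholder only; a real deployment would list the bootstrapper's actual multiaddrs.
const FALLBACK_ADDRS: readonly string[] = [
  '/dns4/bootstrapper.example/tcp/443/wss/p2p/12D3KooWFhXabKDwALpzqMbto94sB7rvmZ6M28hs9Y9xSopDKwQr'
]

// Resolve addresses, falling back to the hard-coded list on error or empty result.
async function resolveBootstrapAddrs(
  peerId: string,
  lookup: AddrLookup,
  fallbackAddrs: readonly string[] = FALLBACK_ADDRS
): Promise<string[]> {
  try {
    const resolved = await lookup(peerId)
    if (resolved.length > 0) return resolved
  } catch {
    // A failed lookup (such as the 404 reported earlier) falls through to the fallback.
  }
  return fallbackAddrs.slice()
}
```

Calling resolveBootstrapAddrs again on a timer would cover the "refresh at intervals" part of the question; whether stale fallback addresses are acceptable is a separate trade-off.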

Image

Nkovaturient avatar Jan 29 '25 18:01 Nkovaturient

So is this just an "all circuits are busy, try again later" type of issue? If so, then I could probably stand up an instance or two. We've got gigabit fiber coming in the next couple of days and I have some spare hardware and a static IP that would likely serve.

devlux76 avatar Feb 13 '25 19:02 devlux76

So is this just an "all circuits are busy, try again later" type of issue?

Pretty much.

I looked into this today and observed in the logs that the limits for incoming connections on the go-peer bootstrapper that I run (12D3KooWFhXabKDwALpzqMbto94sB7rvmZ6M28hs9Y9xSopDKwQr) get exhausted quickly.

I'll need to add some instrumentation in place in order to figure out why.

--

Using the DHT in the js-peer (web) to find random circuit relays

I tried what I suggested in https://github.com/libp2p/universal-connectivity/issues/207#issuecomment-2575787205 and connected to the DHT to find a circuit relay.

It was a bit hard to test, because we still rely on the deployed app bootstrapper to relay gossipsub peer discovery messages (for browsers to discover each other's multiaddrs), and if you can't connect to the app bootstrapper, you won't be able to find any other app peers. What's even more annoying is that you have all these DHT peers, but no chat peers.

So how do we fix this?

Browser peers are completely dependent on the hard-coded app bootstrapper (running the go-peer) to find other peers, due to how I set up pubsub peer discovery.

Until we have a better peer discovery mechanism (rendezvous with DHT, ambient peer discovery, or you name it), I think we need to adjust the resource manager settings in the go-peer. Additionally, we need to add a Prometheus metrics endpoint so we can get better operational insight under the hood. It shouldn't be that resource intensive to allow hundreds of incoming connections.

2color avatar Feb 14 '25 13:02 2color

I updated the app bootstrapper to use https://github.com/libp2p/universal-connectivity/pull/218 and I'm tracking the limits. I'll let it run and inspect the logs to see how it develops over the weekend.

2color avatar Feb 14 '25 14:02 2color

Hey @2color I can see you're working on a lot of things and I just want to say thanks.

Before deciding to use this library, I explored piggybacking the bootstrap of my own stack by connecting to bitcoin/litecoin/dogecoin seed nodes via a SOCKS proxy and looking for wallet strings.

This actually worked pretty well. Technically it's abusing the seed nodes a little bit, but I believe it solves the issue here since those seeds are all stable.

Ethereum uses libp2p natively, so what if we bootstrapped from there in cases where this "all circuits are busy" stuff is going on?

devlux76 avatar Feb 14 '25 17:02 devlux76

@devlux76 Thanks. To avoid derailing the discussion in this issue, I suggest you open another issue/discussion.

Note that this repo is not a library, but rather a working example showing how different implementations of libp2p work together using shared transports.

2color avatar Feb 20 '25 12:02 2color

Quick update:

  • I merged https://github.com/libp2p/universal-connectivity/pull/222, which increased the limits of the go-peer. It's configured in a way that can exhaust resources quickly, but we'll run it for a while and see how it fares. By default, the go-libp2p resource manager allocates 1/8th of the available resources, which is why the server this was running on was heavily underutilized.
  • Another PR fixes bugs in how the frontend js-peer connects to the bootstrapper: https://github.com/libp2p/universal-connectivity/pull/225
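
To put the 1/8th default in perspective, here is a small worked example in TypeScript. It only illustrates the arithmetic. The function name is hypothetical, and applying the fraction to memory specifically is an assumption, since the comment above says "available resources" without naming which ones.

```typescript
// Fraction mentioned in this thread as the go-libp2p resource manager default.
const DEFAULT_FRACTION = 1 / 8

const GiB = 1024 * 1024 * 1024

// Hypothetical helper: how much of a resource is usable at a given fraction.
function resourceBudget(total: number, fraction: number = DEFAULT_FRACTION): number {
  return total * fraction
}

// Assumed example: a server with 16 GiB of memory.
const budgetGiB = resourceBudget(16 * GiB) / GiB
console.log(budgetGiB) // 2
```

So on a hypothetical 16 GiB server, only about 2 GiB would be in play under the default, leaving the remaining seven eighths unused, which matches the "heavily underutilized" observation.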

Together, these should help solve this issue.

2color avatar Feb 20 '25 13:02 2color

Earlier it didn't work in the Brave browser on Linux (https://github.com/libp2p/universal-connectivity/issues/207#issuecomment-2605234015). Now it's working 👍

acul71 avatar Feb 25 '25 15:02 acul71

@jjhbk Can you confirm that this is no longer an issue?

2color avatar Feb 26 '25 11:02 2color