
Multiple peers rebroadcast messages simultaneously during Pubsub Flood

gvelez17 opened this issue 1 year ago • 2 comments

Checklist

Installation method

ipfs-update or dist.ipfs.tech

Version

0.18.1 on most nodes, including the nodes involved here. Some nodes on the network may be running earlier versions, since they are not all under our control.

Config

Note: this config is for one node only, but it is the node reflected in the CPU graph below. Other nodes may have different configs.

{
  "API": {
    "HTTPHeaders": {}
  },
  "Addresses": {
    "API": "/ip4/0.0.0.0/tcp/5011",
    "Announce": [],
    "AppendAnnounce": [],
    "Gateway": "/ip4/0.0.0.0/tcp/9011",
    "NoAnnounce": [],
    "Swarm": [
      "/ip4/0.0.0.0/tcp/4010",
      "/ip4/0.0.0.0/tcp/4011/ws"
    ]
  },
  "AutoNAT": {},
  "Bootstrap": [
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmNnooDu7bfjPFoTZYxMNLWUQJyrVwtbZg5gBMjTezGAJN",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmQCU2EcMqAqQPR2i9bChDtGNJchTbq5TbXJJ16u19uLTa",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmbLHAnMoJPWSCR5Zhtx6BHJX9KiKNN6tpvbUcqanj75Nb",
    "/dnsaddr/bootstrap.libp2p.io/p2p/QmcZf59bWwK5XFi76CZX8cbJ4BhTzzA3gU1ZjYZcYW3dwt",
    "/ip4/104.131.131.82/tcp/4001/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ",
    "/ip4/104.131.131.82/udp/4001/quic/p2p/QmaCpDMGvV2BGHeYERUEnRQAwe3N8SzbUtfsmvsqQLuvuJ"
  ],
  "DNS": {
    "Resolvers": {}
  },
  "Datastore": {
    "BloomFilterSize": 0,
    "GCPeriod": "1h",
    "HashOnRead": false,
    "Spec": {
      "mounts": [
        {
          "child": {
            "accessKey": "<redacted>",
            "bucket": "ceramic-prod-cas-cpc-node",
            "keyTransform": "next-to-last/2",
            "region": "us-east-2",
            "rootDirectory": "ipfs/blocks",
            "secretKey": "<redacted>",
            "type": "s3ds"
          },
          "mountpoint": "/blocks",
          "prefix": "s3.datastore",
          "type": "measure"
        },
        {
          "child": {
            "compression": "none",
            "path": "datastore",
            "type": "levelds"
          },
          "mountpoint": "/",
          "prefix": "leveldb.datastore",
          "type": "measure"
        }
      ],
      "type": "mount"
    },
    "StorageGCWatermark": 90,
    "StorageMax": "10GB"
  },
  "Discovery": {
    "MDNS": {
      "Enabled": true,
      "Interval": 10
    }
  },
  "Experimental": {
    "AcceleratedDHTClient": false,
    "FilestoreEnabled": false,
    "GraphsyncEnabled": false,
    "Libp2pStreamMounting": false,
    "P2pHttpProxy": false,
    "StrategicProviding": false,
    "UrlstoreEnabled": false
  },
  "Gateway": {
    "APICommands": [],
    "HTTPHeaders": {
      "Access-Control-Allow-Headers": [
        "X-Requested-With",
        "Range",
        "User-Agent"
      ],
      "Access-Control-Allow-Methods": [
        "GET"
      ],
      "Access-Control-Allow-Origin": [
        "*"
      ]
    },
    "NoDNSLink": false,
    "NoFetch": false,
    "PathPrefixes": [],
    "PublicGateways": null,
    "RootRedirect": "",
    "Writable": false
  },
  "Identity": {
    "PeerID": "QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
  },
  "Internal": {},
  "Ipns": {
    "RecordLifetime": "",
    "RepublishPeriod": "",
    "ResolveCacheSize": 128
  },
  "Migration": {
    "DownloadSources": [],
    "Keep": ""
  },
  "Mounts": {
    "FuseAllowOther": false,
    "IPFS": "/ipfs",
    "IPNS": "/ipns"
  },
  "Peering": {
    "Peers": [
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-mainnet-external.3boxlabs.com/tcp/4011/ws/p2p/QmXALVsXZwPWTUbsT8G6VVzzgTJaAWRUD7FWL5f7d5ubAL"
        ],
        "ID": "QmXALVsXZwPWTUbsT8G6VVzzgTJaAWRUD7FWL5f7d5ubAL"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-cas-mainnet-external.3boxlabs.com/tcp/4011/ws/p2p/QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
        ],
        "ID": "QmUvEKXuorR7YksrVgA7yKGbfjWHuCRisw2cH9iqRVM9P8"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-elp-1-1-external.3boxlabs.com/tcp/4011/ws/p2p/QmUiF8Au7wjhAF9BYYMNQRW5KhY7o8fq4RUozzkWvHXQrZ"
        ],
        "ID": "QmUiF8Au7wjhAF9BYYMNQRW5KhY7o8fq4RUozzkWvHXQrZ"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-elp-1-2-external.3boxlabs.com/tcp/4011/ws/p2p/QmRNw9ZimjSwujzS3euqSYxDW9EHDU5LB3NbLQ5vJ13hwJ"
        ],
        "ID": "QmRNw9ZimjSwujzS3euqSYxDW9EHDU5LB3NbLQ5vJ13hwJ"
      },
      {
        "Addrs": [
          "/dns4/go-ipfs-ceramic-private-cas-clay-external.3boxlabs.com/tcp/4011/ws/p2p/QmbeBTzSccH8xYottaYeyVX8QsKyox1ExfRx7T1iBqRyCd"
        ],
        "ID": "QmbeBTzSccH8xYottaYeyVX8QsKyox1ExfRx7T1iBqRyCd"
      }
    ]
  },
  "Pinning": {
    "RemoteServices": {}
  },
  "Plugins": {
    "Plugins": null
  },
  "Provider": {
    "Strategy": ""
  },
  "Pubsub": {
    "DisableSigning": false,
    "Enabled": true,
    "Router": "",
    "SeenMessagesTTL": "10m"
  },
  "Reprovider": {},
  "Routing": {},
  "Swarm": {
    "AddrFilters": null,
    "ConnMgr": {},
    "DisableBandwidthMetrics": false,
    "DisableNatPortMap": false,
    "RelayClient": {
      "Enabled": false
    },
    "RelayService": {},
    "Transports": {
      "Multiplexers": {},
      "Network": {},
      "Security": {}
    }
  },
  "algorithm": "rsa"
}

Description

Related to the pubsub flood issue https://github.com/ipfs/kubo/issues/9665, this issue specifically notes that when the flood begins, multiple peers become involved simultaneously, though with different seqnos, different messages, and different originating peers.

The result of the flood is near-maximal CPU usage on our critical IPFS node for the Ceramic Anchor Service.

[image: CPU usage graph]

Is there some network condition that would simultaneously trigger upwards of 20 different nodes to engage in rebroadcasting of different messages? Is there a setting that would help tune this back?
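One knob already present in the config above is Pubsub.SeenMessagesTTL (currently "10m"): it controls how long a message ID is remembered for deduplication, so a longer TTL widens the window in which rebroadcasts are dropped as duplicates. As an experiment (not a confirmed fix, and the 30m value below is arbitrary), it can be changed with the standard ipfs config command:

```shell
# Widen the pubsub dedup window from the 10m shown in the config above.
# 30m is an arbitrary experimental value, not a recommended setting.
ipfs config Pubsub.SeenMessagesTTL 30m

# Restart the daemon for the change to take effect.
ipfs daemon
```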

[image]

We greatly appreciate the nonce validator added by @vyzo in https://github.com/libp2p/go-libp2p-pubsub/releases/tag/v0.9.2

Note that it does not appear to be used yet by the latest kubo (https://github.com/ipfs/kubo/blob/master/go.mod#L74). Is it safe to simply include this module version in a source build?

Is there perhaps a backoff setting that should also be used when this activity is happening across the network? If a node detects that it is receiving an identical message from more than 3 peers, should it prune its peer list?
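The pruning heuristic suggested above could be sketched roughly as follows. DuplicateTracker and the more-than-3-peers threshold are hypothetical illustrations of the idea, not an existing kubo or go-libp2p-pubsub mechanism:

```python
from collections import defaultdict


class DuplicateTracker:
    """Hypothetical heuristic: track how many distinct peers have
    delivered the same message (keyed by seqno), and flag the message
    once more than `threshold` peers have resent it."""

    def __init__(self, threshold=3):
        self.threshold = threshold
        self.senders = defaultdict(set)  # seqno -> set of delivering peer IDs

    def record(self, seqno, received_from):
        """Record one delivery; return True if the message has now been
        received from more than `threshold` distinct peers."""
        self.senders[seqno].add(received_from)
        return len(self.senders[seqno]) > self.threshold
```

A node could use the True result as a trigger to score down or prune the peers in `senders[seqno]`; whether that helps or harms mesh health is exactly the open question above.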

Any advice or suggestions for how to stop the pubsub flood are very welcome. We can experiment with individual nodes, and if a solution is found we can communicate it to our user base to roll it out across much of the network.

gvelez17 commented Mar 24 '23 09:03

This may not mean anything, but an analysis of about 30 minutes of data seems to show a different pattern for the messages that begin a chain of duplicates than for other messages.

These were determined by finding message groups by seqno, then filtering for ones where the original peer (From:) matched the receivedFrom header, which we exposed in a slightly modified version of kubo just to output this field.
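A rough pandas reconstruction of that grouping, with toy data and assumed column names (seqno, frm for the pubsub From field, received_from for the exposed receivedFrom header, typ):

```python
import pandas as pd

# Toy stand-in for the 30 minutes of captured message data.
df = pd.DataFrame({
    "seqno":         ["a", "a", "b", "b"],
    "frm":           ["p1", "p1", "p2", "p2"],
    "received_from": ["p1", "p3", "p2", "p4"],
    "typ":           [2, 2, 0, 0],
})

# Messages received directly from their originator (From == receivedFrom)...
direct = df[df["frm"] == df["received_from"]]

# ...restricted to seqnos that were later seen more than once,
# i.e. messages that went on to be rebroadcast.
seen = df["seqno"].value_counts()
de = direct[direct["seqno"].isin(seen[seen > 1].index)]

print(de.typ.value_counts())
```

This is an illustrative sketch of the method described above, not the actual analysis script.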

The messages that kick off a chain of rebroadcasts are majority RESPONSE-type messages. (In Ceramic, messages are UPDATE, QUERY, RESPONSE, or KEEPALIVE.)

# counts from the messages that are received from the original peer
# and later result in rebroadcasts
(Pdb) de.typ.value_counts()
2    3903
0     176
3     164
1       8
Name: typ, dtype: int64

# all the messages seen in 30 minutes
(Pdb) df.typ.value_counts()
2    48306
0    22566
1    21393
3     4011
Name: typ, dtype: int64
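Taking type 2 to be RESPONSE (per the claim above), the counts can be turned into shares to quantify the over-representation; the figures below are computed directly from the numbers reported above:

```python
# Counts copied from the pdb output above (typ -> count).
flood_starters = {2: 3903, 0: 176, 3: 164, 1: 8}
all_messages = {2: 48306, 0: 22566, 1: 21393, 3: 4011}


def response_share(counts):
    # Fraction of messages whose typ is 2 (RESPONSE, per the text above).
    return counts[2] / sum(counts.values())


print(f"RESPONSE share among flood starters: {response_share(flood_starters):.1%}")
print(f"RESPONSE share among all messages:   {response_share(all_messages):.1%}")
```

On this sample, RESPONSE messages make up roughly 92% of the flood-starting messages but only about 50% of all traffic.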

gvelez17 commented Mar 24 '23 09:03

> Noted that it does not appear to be used yet by the latest kubo https://github.com/ipfs/kubo/blob/master/go.mod#L74 , is it safe to simply include this module version in a source build?

Per https://github.com/ipfs/kubo/issues/9665#issuecomment-1462934036 , https://github.com/libp2p/go-libp2p-pubsub/releases/tag/v0.9.2 isn't going to make it into master.

I'm not aware of any issues if you include this module version in your own Kubo build, though I'll let @Jorropo comment. I know we didn't (and aren't planning to) bring it into Kubo master because it breaks interop with the JS stack. We're instead deprecating the pubsub commands: https://github.com/ipfs/kubo/issues/9717

BigLep commented Mar 25 '23 04:03