Integrate/expose go-libp2p resource manager

Open BigLep opened this issue 2 years ago • 4 comments

go-libp2p v18 is shipping with a resource manager - yahoo! This issue encompasses the work to fully land this as a feature exposed to go-ipfs users. This includes:

  • [x] Integrating go-libp2p (i.e., doing the dependency update): https://github.com/ipfs/go-ipfs/pull/8680:
    • Estimate: 1 - needs review & cleanup – wip @lidel – see https://github.com/ipfs/go-ipfs/pull/8680#issuecomment-1086432509
  • [x] Creating a command for resource manager stats: https://github.com/ipfs/go-ipfs/issues/8722
    • Covered in the PR above https://github.com/ipfs/go-ipfs/pull/8680 – see https://github.com/ipfs/go-ipfs/pull/8680#issuecomment-1086432509
  • [x] Expose stats to prometheus: https://github.com/ipfs/go-ipfs/pull/8785 (will be easier once have done #8680):
    • Estimate: .5
  • [x] Make it self-service for go-ipfs users to configure the resource manager limits (Lotus example: https://github.com/filecoin-project/lotus/pull/8318) – see https://github.com/ipfs/go-ipfs/pull/8680#issuecomment-1086432509
  • [x] For now use $IPFS_PATH/limit.json file + implicit defaults, not guaranteeing a contract. We want to learn about ergonomics from testing in production. In the end we may switch to Swarm.ResourceMgr.Limits config that works the same as the output of ipfs swarm limit commands.
    • see https://github.com/ipfs/go-ipfs/pull/8680 and https://github.com/ipfs/go-ipfs/issues/8858
    • Estimate: 1
  • [ ] Implement ipfs swarm limit [scope] --reset
    • tracked in https://github.com/ipfs/go-ipfs/issues/8918 (not a blocker, could be in best-effort track for now)
  • [ ] Testing in production (and fixing bugs):
    • Dashboard changes
    • Bootstrappers
    • Clusters
    • Gateway staging
    • Gateway banking https://github.com/protocol/bifrost-infra/issues/1815
      • https://github.com/protocol/bifrost-infra/pull/1832
      • dashboard
    • Estimate: 8 (high uncertainty - potential dragon 🐉 )
  • [x] Cleanup
    • [x] remove limits.json support? (see how big of a pain this is for libp2p-maintainers if we remove it)
    • [x] process any remaining UX work from https://github.com/ipfs/go-ipfs/issues/8858
    • [x] enable resource manager by default after selecting some default resource limits. We don't want to rely on implicit defaults from go-libp2p.
  • [x] Release notes about this feature:
    • Estimate: 1
    • Need to add a reminder: "if you are using internal config flags, these are the risks..."

Total 2022-04-01 estimate: 12 (rounded up from 11.5)

Note: doing the configuration part will be easier once config is moved back into go-ipfs: https://github.com/ipfs/go-ipfs-config/issues/151
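
For the "limit file + implicit defaults" item above, here is a minimal sketch of how user overrides could be layered on top of defaults. Everything in it is an assumption for illustration: the `Limits` struct, its field names, the default values, and `applyOverrides` are hypothetical and do not mirror the real go-libp2p or go-ipfs schema.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Limits is a hypothetical, simplified limit set. The field names are
// illustrative only and are not the real resource manager schema.
type Limits struct {
	Conns   int   `json:"Conns"`
	Streams int   `json:"Streams"`
	Memory  int64 `json:"Memory"`
}

// defaultLimits returns the implicit defaults used when the file is silent.
// The numbers are placeholders, not values recommended by go-libp2p.
func defaultLimits() Limits {
	return Limits{Conns: 128, Streams: 1024, Memory: 1 << 30}
}

// applyOverrides decodes raw JSON on top of the defaults. Keys absent from
// the JSON keep their default values, because json.Unmarshal only writes
// the fields that appear in the input.
func applyOverrides(raw []byte) (Limits, error) {
	limits := defaultLimits()
	if len(raw) == 0 {
		// No file content: fall back to the implicit defaults.
		return limits, nil
	}
	if err := json.Unmarshal(raw, &limits); err != nil {
		return Limits{}, fmt.Errorf("parsing limit overrides: %w", err)
	}
	return limits, nil
}

func main() {
	// Only Conns is overridden; Streams and Memory stay at their defaults.
	limits, err := applyOverrides([]byte(`{"Conns": 32}`))
	if err != nil {
		panic(err)
	}
	fmt.Printf("%+v\n", limits)
}
```

The point of decoding into a pre-filled struct is that a user only has to list the limits they want to change, which matches the "self-service, no contract yet" intent of the checklist item.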

BigLep avatar Mar 03 '22 23:03 BigLep

Assigned to @marten-seemann currently since he is doing some of the initial work. He'll ultimately need a go-ipfs partner for landing these changes and tying up anything else.

BigLep avatar Mar 04 '22 01:03 BigLep

@guseggert : I put you as the owner now given you're doing the long-pole work of finding any landmines from production deployments. This should be prioritized first, and while collecting data we can work on https://github.com/ipfs/go-ipfs/issues/8858

BigLep avatar Apr 08 '22 15:04 BigLep

2022-05-12 conversation: we're looking good. Deployments to other banks are happening in https://github.com/protocol/bifrost-infra/issues/1815.

Resolving for now. We'll engage if Bifrost raises any issues as they continue to deploy.

BigLep avatar May 12 '22 17:05 BigLep

@BigLep I assume we are reopening this due to:

  • bifrost issue: https://github.com/protocol/bifrost-infra/issues/1815
  • UX gaps documented in https://github.com/ipfs/go-ipfs/issues/9001
  • a PR @guseggert will be working on to switch Swarm.ResourceMgr.Enabled flag to be disabled by default

lmk if I missed anything

lidel avatar May 31 '22 17:05 lidel

2022-09-27:

  • Power user can configure anything
  • By default, libp2p limits should scale based on system resources (similar to https://github.com/libp2p/go-libp2p/blob/master/p2p/host/resource-manager/README.md#usage )
  • Need to decide what assertions to make.

https://github.com/ipfs/kubo/blob/master/core/node/libp2p/rcmgr_defaults.go#L21

BigLep avatar Sep 27 '22 17:09 BigLep

@BigLep could the Notion link be wrong?

ajnavarro avatar Sep 28 '22 14:09 ajnavarro

@ajnavarro : doh - fixed - https://github.com/libp2p/go-libp2p/blob/master/p2p/host/resource-manager/README.md#usage

BigLep avatar Sep 28 '22 18:09 BigLep

2022-09-29:

  • Adjust log levels so we see whenever there is a resource checkout
  • Expose grafana dashboards from go-libp2p resource manager: https://github.com/libp2p/go-libp2p/blob/master/p2p/host/resource-manager/obs/grafana-dashboards/README.md
  • Write out what we're planning to do for when resource manager is enabled by default (https://github.com/ipfs/kubo/issues/9322 )

BigLep avatar Sep 29 '22 14:09 BigLep

@ajnavarro : FYI that I've been updating the checklist and status based on what I know. I just added two items:

  1. Confirm that the system scope limits protect nodes (and not just the peer scope limits) per https://github.com/ipfs/kubo/pull/9338#issuecomment-1310542225
  2. On by default: This should show up in https://github.com/ipfs/kubo/blob/master/docs/changelogs/v0.17.md

BigLep avatar Nov 10 '22 16:11 BigLep

@BigLep see my comment here related to point 1: https://github.com/ipfs/kubo/pull/9338#issuecomment-1311654109

ajnavarro avatar Nov 11 '22 12:11 ajnavarro

Reported issues we need to address:

  1. https://github.com/ipfs/kubo/issues/9406
  2. https://github.com/ipfs/kubo/issues/9405

BigLep avatar Nov 15 '22 01:11 BigLep

@BigLep see PR #9407

ajnavarro avatar Nov 15 '22 10:11 ajnavarro

@ajnavarro : for changelog and other doc updates, see https://github.com/ipfs/kubo/pull/9413 . I know you were going to take the changelog, but I jumped in here while thinking about other doc improvements we should make. Feel free to make any changes and merge the PR so the RC can be cut.

Also, I think the Bifrost Gateway configuration should simplify to this: https://github.com/protocol/bifrost-infra/issues/1815#issuecomment-1316651157

BigLep avatar Nov 16 '22 09:11 BigLep

@ajnavarro : I assume this is possible but I also don't recall seeing it: how does someone learn what limits are actually passed to libp2p? I know we have ipfs swarm stats all to get current usage, but I want to be able to see what the limits are. I don't see any logging for it in https://github.com/ipfs/kubo/blob/master/core/node/libp2p/rcmgr.go#L32

BigLep avatar Nov 17 '22 04:11 BigLep

@BigLep ipfs swarm limit all

ajnavarro avatar Nov 17 '22 09:11 ajnavarro

@ajnavarro : thanks. I made a docs update here: https://github.com/ipfs/kubo/pull/9421

BigLep avatar Nov 18 '22 17:11 BigLep

@ajnavarro : I created a tracking issue for the critical followups we need to do: https://github.com/ipfs/kubo/issues/9442

Please go ahead and edit/update with the workstreams.

BigLep avatar Dec 01 '22 16:12 BigLep