Integrate/expose go-libp2p resource manager
go-libp2p v0.18 is shipping with a resource manager - yahoo! This issue encompasses the work to fully land this as a feature exposed to go-ipfs users. This includes:
- [x] Integrating go-libp2p (i.e., doing the dependency update): https://github.com/ipfs/go-ipfs/pull/8680:
- Estimate: 1 - needs review & cleanup – wip @lidel – see https://github.com/ipfs/go-ipfs/pull/8680#issuecomment-1086432509
- [x] Creating a command for resource manager stats: https://github.com/ipfs/go-ipfs/issues/8722
- Covered in the PR above https://github.com/ipfs/go-ipfs/pull/8680 – see https://github.com/ipfs/go-ipfs/pull/8680#issuecomment-1086432509
- [x] Expose stats to prometheus: https://github.com/ipfs/go-ipfs/pull/8785 (will be easier once we have done #8680):
- Estimate: .5
- [x] Make it self-service for go-ipfs users to configure the resource manager limits (Lotus example: https://github.com/filecoin-project/lotus/pull/8318) . – see https://github.com/ipfs/go-ipfs/pull/8680#issuecomment-1086432509
- [x] For now use the `$IPFS_PATH/limit.json` file + implicit defaults, not guaranteeing a contract. We want to learn about the ergonomics from testing in production. In the end we may switch to a `Swarm.ResourceMgr.Limits` config that works the same as the output of the `ipfs swarm limit` commands (a rough sketch of such a limits file follows the checklist below). – see https://github.com/ipfs/go-ipfs/pull/8680 and https://github.com/ipfs/go-ipfs/issues/8858
- Estimate: 1
- [ ] Implement `ipfs swarm limit [scope] --reset`
- tracked in https://github.com/ipfs/go-ipfs/issues/8918 (not a blocker, could be in best-effort track for now)
- [ ] Testing in production (and fixing bugs):
- Dashboard changes
- Bootstrappers
- Clusters
- Gateway staging
- Gateway banks https://github.com/protocol/bifrost-infra/issues/1815
- https://github.com/protocol/bifrost-infra/pull/1832
- dashboard
- Estimate: 8 (high uncertainty - potential dragon 🐉 )
- [x] Cleanup
- [x] remove limits.json support? (see how big of a pain this is for libp2p-maintainers if we remove it)
- [x] process any remaining UX work from https://github.com/ipfs/go-ipfs/issues/8858
- [x] enable resource manager by default after selecting some default resource limits. We don't want to rely on implicit defaults from go-libp2p.
- [x] Release notes about this feature:
- Estimate: 1
- Need to add a reminder: "if you are using internal config flags, these are the risks..."
Total 2022-04-01 estimate: 12 (rounded up from 11.5)
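For the self-service limits item above, here is a minimal sketch of what a `$IPFS_PATH/limit.json` override might look like. The file name comes from the checklist; the scope name, field names, and values below are illustrative assumptions modeled on go-libp2p's resource-manager limit structure, not a guaranteed schema:

```json
{
  "System": {
    "Memory": 1073741824,
    "ConnsInbound": 256,
    "ConnsOutbound": 512,
    "StreamsInbound": 2048,
    "StreamsOutbound": 4096,
    "FD": 512
  }
}
```

Anything not listed would fall back to the implicit go-libp2p defaults; if we later switch to `Swarm.ResourceMgr.Limits`, the same shape would presumably live under the Kubo config instead.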
Note: doing the configuration part will be easier once config is moved back into go-ipfs: https://github.com/ipfs/go-ipfs-config/issues/151
Assigned to @marten-seemann currently since he is doing some of the initial work. He'll ultimately need a go-ipfs partner for landing these changes and tying up anything else.
@guseggert : I put you as the owner now given you're doing the long-pole work of finding any landmines from production deployments. This should be prioritized first, and while collecting data we can work on https://github.com/ipfs/go-ipfs/issues/8858
2022-05-12 conversation: we're looking good. Deployments to other banks are happening in https://github.com/protocol/bifrost-infra/issues/1815.
Resolving for now. We'll engage if Bifrost raises any issues as they continue to deploy.
@BigLep I assumed we are reopening this due to:
- bifrost issue: https://github.com/protocol/bifrost-infra/issues/1815
- UX gaps documented in https://github.com/ipfs/go-ipfs/issues/9001
- a PR @guseggert will be working on to switch the `Swarm.ResourceMgr.Enabled` flag to be disabled by default (a quick sketch of toggling this is below)
lmk if I missed anything
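For context on that last bullet: whenever the default flips, enabling or disabling the resource manager on a given node should be a one-liner via the config CLI. The key name is the one discussed above; treat the exact invocation as a sketch rather than the final UX:

```console
$ ipfs config --json Swarm.ResourceMgr.Enabled false
$ ipfs daemon   # restart the daemon so the new setting takes effect
```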
2022-09-27:
- Power users can configure anything
- By default, libp2p limits should scale based on system resources (similar to https://github.com/libp2p/go-libp2p/blob/master/p2p/host/resource-manager/README.md#usage ; a rough sketch of that pattern follows below)
- Need to decide what assertions to make.
https://github.com/ipfs/kubo/blob/master/core/node/libp2p/rcmgr_defaults.go#L21
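The scaling behaviour referenced above follows the pattern from the go-libp2p resource-manager README. A minimal sketch of it, using API names as they appear in the 2022-era go-libp2p releases, so treat the details as approximate rather than as exactly what `rcmgr_defaults.go` does:

```go
package main

import (
	"github.com/libp2p/go-libp2p"
	rcmgr "github.com/libp2p/go-libp2p/p2p/host/resource-manager"
)

func main() {
	// Start from the library's default scaling limits and let libp2p
	// register limits for its bundled protocols and services.
	scaling := rcmgr.DefaultLimits
	libp2p.SetDefaultServiceLimits(&scaling)

	// AutoScale turns the scaling config into concrete limits,
	// proportional to the machine's total memory and file descriptors.
	limits := scaling.AutoScale()

	// Wrap the concrete limits in a fixed limiter and build the resource manager.
	rm, err := rcmgr.NewResourceManager(rcmgr.NewFixedLimiter(limits))
	if err != nil {
		panic(err)
	}

	// Hand the resource manager to the libp2p host.
	host, err := libp2p.New(libp2p.ResourceManager(rm))
	if err != nil {
		panic(err)
	}
	defer host.Close()
}
```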
@BigLep maybe the notion link is wrong?
@ajnavarro : doh - fixed - https://github.com/libp2p/go-libp2p/blob/master/p2p/host/resource-manager/README.md#usage
2022-09-29:
- Adjust log levels so we can see whenever there is a resource checkout
- Expose grafana dashboards from the go-libp2p resource manager: https://github.com/libp2p/go-libp2p/blob/master/p2p/host/resource-manager/obs/grafana-dashboards/README.md (scrape-config sketch below)
- Write out what we're planning to do for when resource manager is enabled by default (https://github.com/ipfs/kubo/issues/9322 )
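On the dashboards bullet: the resource-manager metrics are served over Prometheus, so pointing Grafana at a Kubo node is mostly a scrape-target question. A sketch of a Prometheus scrape config, assuming the daemon's standard `/debug/metrics/prometheus` endpoint on the default API port:

```yaml
scrape_configs:
  - job_name: "kubo"
    metrics_path: /debug/metrics/prometheus
    static_configs:
      - targets: ["127.0.0.1:5001"]
```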
@ajnavarro : FYI that I've been updating the checklist and status based on what I know. I just added two items:
- Confirm that the system scope limits protect nodes (and not just the peer scope limits) per https://github.com/ipfs/kubo/pull/9338#issuecomment-1310542225
- On by default: This should show up in https://github.com/ipfs/kubo/blob/master/docs/changelogs/v0.17.md
@BigLep see my comment here related to point 1: https://github.com/ipfs/kubo/pull/9338#issuecomment-1311654109
Reported issues we need to address:
- https://github.com/ipfs/kubo/issues/9406
- https://github.com/ipfs/kubo/issues/9405
@BigLep see PR #9407
@ajnavarro : for changelog and other doc updates, see https://github.com/ipfs/kubo/pull/9413 . I know you were going to take the changelog, but I jumped in here while thinking about other doc improvements we should make. Feel free to make any changes and merge the PR so the RC can be cut.
Also, I think the Bifrost Gateway configuration should be simplified to this: https://github.com/protocol/bifrost-infra/issues/1815#issuecomment-1316651157
@ajnavarro : I assume this is possible but I also don't recall seeing it: how does someone learn what limits are actually passed to libp2p? I know we have `ipfs swarm stats all` to get current usage, but I want to be able to see what the limits are. I don't see any logging for it in https://github.com/ipfs/kubo/blob/master/core/node/libp2p/rcmgr.go#L32
@BigLep `ipfs swarm limit all`
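Putting the two commands side by side for anyone else who lands here (both exist in the Kubo build under discussion; a scope argument narrows the output):

```console
$ ipfs swarm stats all     # current resource usage, per scope
$ ipfs swarm limit all     # the limits actually applied, per scope
$ ipfs swarm limit system  # limits for a single scope, e.g. the system scope
```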
@ajnavarro : thanks. I made a docs update here: https://github.com/ipfs/kubo/pull/9421
@ajnavarro : I created a tracking issue for the critical followups we need to do: https://github.com/ipfs/kubo/issues/9442
Please go ahead and edit/update with the workstreams.