Repo Consolidation: Round 3
The transport consolidation (#1187) has dramatically reduced repo sprawl. We now only have a few go-libp2p-* repos left in the dependency tree of go-lipbp2p. This is a proposal to also consolidate them into go-lipbp2:
- go-libp2p-resource-manager:
/p2p/host/resource-manager - go-peerstore:
/p2p/host/peerstore - go-libp2p-core:
/core - go-eventbus:
/p2p/host/eventbus
Not sure what to do about go-libp2p-asn-util. This repo should probably have been named go-asn-util, it's not libp2p related in any way. I also don't feel like moving several MB of ASN here.
Anything I missed? Thoughts, @MarcoPolo @vyzo @Stebalien @raulk?
asn no, the rest is reasonable proposition.
Agree with vyzo. Otherwise, lgtm.
I think we should keep core separate, it's the interface after all.
I think we should keep core separate, it's the interface after all.
I think I disagree. core being separate is causing us a lot of trouble:
- When we want to change one of the interfaces, we first have to cut a core release before we can use them in go-libp2p. It would be nicer to make that change atomically.
- core and go-libp2p being out of sync (core already having released an update, but go-libp2p not yet) causes errors for downstream users who run
go get -u, and then we need to tell them which versions of core go with which versions of go-libp2p
We probably should keep the GitHub Action that posts the gocompat output every time we make a change to /core, so we don't accidentally break things.
Yes, but it allows libraries to just depend on core and not pull the whole go-libp2p except for trsts.
On Sat, May 28, 2022, 20:17 Marten Seemann @.***> wrote:
I think we should keep core separate, it's the interface after all.
I think I disagree. core being separate is causing us a lot of trouble:
- When we want to change one of the interfaces, we first have to cut a core release before we can use them in go-libp2p. It would be nicer to make that change atomically.
- core and go-libp2p being out of sync (core already having released an update, but go-libp2p not yet) causes errors for downstream users who run go get -u, and then we need to tell them which versions of core go with which versions of go-libp2p
We probably should keep the GitHub Action that posts the gocompat output every time we make a change to /core, so we don't accidentally break things.
— Reply to this email directly, view it on GitHub https://github.com/libp2p/go-libp2p/issues/1556#issuecomment-1140300109, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAAI4SQ6IM6VYHSN3LFE63TVMJIJNANCNFSM5XAIKMVQ . You are receiving this because you were mentioned.Message ID: @.***>
Should we be worried about this? I'd assume that if you pull in libp2p types, you're also using go-libp2p. You could imagine a project suffering from repo sprawl where a library only needs the type, and go-libp2p is pulled in by their main repo, but even then the only effect is that go get will download slightly more data.
It is semantically different; i think a separate interface is good practice for something as complex as libp2p. After all, apart from instantiation and tests, user code only cares about the interface.
think of it as the header file.
Any suggestion how to fix the two problems I listed above then? For me, this is mostly about development velocity, and I want to get away from having to release one module (in a release that we don't want users to use yet!) in order to make progress in another.
think of it as the header file.
Which is usually kept in the same directory as the source file 🤪
Any changes in interface should be carefully considered and too much velocity there is indicative of instability. We dont have to release before creating the implementation pr, and only release when the implementation in go-libp2p is ready to merge. The window of a broken go get -u is insignificantly small then.
Not convinced of the cleanliness argument, but let's look at some real-world examples. I looked at libraries building on top of libp2p, and determined their libp2p dependencies after deleting all test files (rm **/*_test.go && go mod edit -go 1.17 && go mod tidy).
go-libp2p-pubsub
github.com/libp2p/go-libp2p-core v0.15.1
github.com/libp2p/go-libp2p-discovery v0.6.0
go-libp2p-discovery was moved into go-libp2p, so this would still need to import go-libp2p.
go-libp2p-kad-dht
github.com/libp2p/go-eventbus v0.2.1
github.com/libp2p/go-libp2p-core v0.15.1
github.com/libp2p/go-libp2p-kbucket v0.4.7
github.com/libp2p/go-libp2p-peerstore v0.6.0
github.com/libp2p/go-libp2p-record v0.1.3
github.com/libp2p/go-libp2p-routing-helpers v0.2.3
github.com/libp2p/go-libp2p-swarm v0.10.2
github.com/libp2p/go-libp2p-xor v0.1.0
github.com/libp2p/go-msgio v0.2.0
github.com/libp2p/go-netroute v0.2.0
go-libp2p-swarm (moved into go-libp2p in round 2) is imported for an error assertion, and go-libp2p-peerstore (will be moved in round 3) for the slightly weird function PeerInfos.
go-bitswap
github.com/libp2p/go-buffer-pool v0.0.2
github.com/libp2p/go-libp2p v0.14.3
github.com/libp2p/go-libp2p-core v0.8.5
github.com/libp2p/go-libp2p-loggables v0.1.0
go-libp2p is import to access the ping package.
go-graphsync
github.com/libp2p/go-buffer-pool v0.0.2
github.com/libp2p/go-libp2p-core v0.8.5
github.com/libp2p/go-msgio v0.0.6
This actually wouldn't need to import go-libp2p.
I think we can still be very deliberate with changes to the core interfaces even if it's in a single repo. I'm not aware of many libraries that have their interfaces defined in a separate repo, but please correct me if I'm wrong.
From a user's POV a single repo would be better since go-libp2p + interfaces are well defined for any given commit.
My main rationale for the interface being a separate repo is that libp2p is a very complex system, and that applications and libraries only ever care about the core interface (except for instantiation and perhaps tests).
I don't want our users to get in the habbit of using internal libp2p components; the libp2p contract is that the interface is a stable facade, and in internals anything goes.
The development velocity effect is now gone, given rhat we are essentially a monorepo. When we want to make a change in core, we open pr, develop using that pr in the monorepo, and when its ready we merge both prs.
No release is necessary in either repo, it is totally fine for master to point to core master.
When we are ready for release, we bump both and off we go. The probablity of a user doing a go get -u and breaking while we are tagging releases is vanishingly small.
I think the right way to think of this is as follows: go-libp2p is now a "subproject" within the libp2p org, comprised of many separate packages that happen to be versioned together for convenience.
Minimal Dependencies
This was, IIRC, the main reason for the split.
However, this is no longer an issue, from what I can see, as long as we keep "core" from depending on anything else in go-libp2p. As of go 1.17,:
- Anyone can pull in
github.com/libp2p/go-libp2p/corewithout pulling in anything else. Yes, they'll download the rest of the repo, but that's really not a huge issue. - When vendoring code, only required packages are vendored. So I don't expect this to impact auditing.
External Transports
By including core, there's no way to:
- Develop an external transport that depends on
core. - Make go-libp2p depend on this transport.
Any time we make a breaking change to the transport, we'd need two releases to pull through an update.
On the other hand, if/when we get to the point where we want go-libp2p to depend on some transport, I expect we'd want to just pull it into the tree itself?
External Development
In theory, multiple repos allows third-parties to "own" separate codebases (other repos). In practice, this is wishful thinking.
- It's much easier for code to rot in separate repos.
- It's much harder to keep track of third-party work in separate repos.
- It's pretty easy to setup a "code owners" system to allow contributions within a single repo.
- We can now finally setup restricted tags, so it's possible to securely grant "code owner" access to parts of the repo.
- If we don't trust some third party with this level of restricted write access, we shouldn't depend on their code...
- We should explore git subtrees. This is a great way to:
- Allow external developement.
- Pull in "stable" releases into a repo that can be better managed.
Furthermore, this change won't prevent third-parties from developing transports. It just means that, if we ever want to ship the transport by default, it'll need to live in this repo (or we'll have to pull it in with git subtrees). But, IMO, that's fine.
Development Velocity
The current split makes breaking changes very painful. This isn't all bad, but it has made some refactors take longer than they should.
Additionally, It's very easy to get the system into an "unreleasable" state without realizing it.
No release is necessary in either repo, it is totally fine for master to point to core master.
This isn't fine and it has cost us (me) a ton of time. It has also delayed critical fixes, sometimes for months when we thought some tiny change to the core interfaces would be easy, but it actually caused a cascade of random stuff we needed to deal with.
Separate Versioning
One of the drawbacks to the new approach is that it makes it impossible to cut a release of a single package (e.g., for a bug fix). However, IMO, that's a good thing. It's really easy to forget that some bug was fixed in some random release of a random repo.
A couple of thoughts:
- ResourceManager readme docs in https://github.com/libp2p/docs/pull/160/ will need to be updated with new links when this is done.
- If it isn't there already, can we please capture the repo/package rationale in the README (likely taking content from https://github.com/libp2p/go-libp2p/issues/1556#issuecomment-1143924221 )?
Closing as "Update links in dos-mitigation.md" was addressed in libp2p/docs#165