lnd icon indicating copy to clipboard operation
lnd copied to clipboard

Create separate go module for proto files #5402

Open l0k18 opened this issue 2 years ago • 30 comments

Change Description

https://github.com/lightningnetwork/lnd/issues/5402

Basically split the files between generated and implementation related with the generated staying in the same path and the implementations going to a separate path inside the same folder again with the same name and package name.

Largely changed nothing in the code except for some of the servers needing to import a lot of types from the generated. In addition, in places where both generated and implementation code was imported from other packages the secondary, implementation package got an import alias '2' and is visible by its stuttering pathname (eg lnrpc/verrpc/verrpc)

The router package was a bit more tricky because the code used one non-exported internal type, an interface. This was solved by using a type alias in a shim.go file.

Steps to Test

Nothing should be different except for the imports and module structure being changed as per the issue.

Pull Request Checklist

Testing

  • [ ] Your PR passes all CI checks.

    • make check passes until failing on a test that has a wrong difficulty given for a block and aborts.
=== RUN   TestLightningNetworkDaemon/tranche00/98-of-105/btcd/3rd_party_anchor_spend
    test_harness.go:91: Failed: (3rd party anchor spend): exited with error: 
        *errors.errorString unable to generate block: rejected: block difficulty of 545259519 is not the expected value of 537001983
        /home/davidvennik/src/github.com/l0k18/lnd/lntest/itest/lnd_onchain_test.go:566 (0xfedcb4)
        	testAnchorThirdPartySpend: t.Fatalf("unable to generate block: %v", err)
        /home/davidvennik/src/github.com/l0k18/lnd/lntest/itest/test_harness.go:114 (0xf9471c)
        	(*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        /home/davidvennik/src/github.com/l0k18/lnd/lntest/itest/lnd_test.go:245 (0x103423a)
        	TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        /home/davidvennik/go/src/testing/testing.go:1439 (0x513002)
        	tRunner: fn(t)
        /home/davidvennik/go/src/runtime/asm_amd64.s:1571 (0x469381)
        	goexit: BYTE	$0x90	// NOP
  • [ ] ~~Tests covering the positive and negative (error paths) are included.~~
  • [ ] ~~Bug fixes contain tests triggering the bug to prevent regressions.~~

No actual code was changed, only moved around. I only just figured out while doing this that I can work on the original version and push to my repo. I wouldn't have a clue how to replicate the CI setup, I would appreciate a pointer or two for that.

Code Style and Documentation

  • [x] The change obeys the Code Documentation and Commenting guidelines, and lines wrap at 80.
    • should be fine only when the refactoring extended function signatures over more lines they were restructured.
  • [x] Commits follow the Ideal Git Commit Structure.
  • [x] Any new logging statements use an appropriate subsystem and logging level.
  • [x] There is a change description in the release notes, or [skip ci] in the commit message for small changes.
    • in this case just a short note in the readme in the lnrpc README.md about the separation of the protos from implementation parts and the precise protoc version and plugin versions and how to get them so the generator makes identical outputs.

l0k18 avatar Jun 18 '22 17:06 l0k18

Please also make sure that make check runs correctly on your machine.

bhandras avatar Jun 20 '22 08:06 bhandras

Sorry for the late feedback, I should have commented on https://github.com/lightningnetwork/lnd/issues/5402 earlier.

I think we want to keep the protos in the same place as they currently are, for the reason you mentioned (number of changes needed in lnd and other projects). But I think we should move the server implementations to a completely new package (not submodule), which could for example be called rpcservers that then has the same sub structure as today (autopilotrpc/, chainrpc/). So all the build tag guarded code that you moved one level down should instead be moved to this new rpcservers package. And maybe we remove the *rpc suffix from the sub packages, so that it will just be rpcservers/autopilot or rpcservers/router to avoid needing to alias things with routerrpc2 as you had to do in rpcserver.go.

guggero avatar Jun 21 '22 14:06 guggero

Sorry for the late feedback, I should have commented on #5402 earlier.

I think we want to keep the protos in the same place as they currently are, for the reason you mentioned (number of changes needed in lnd and other projects). But I think we should move the server implementations to a completely new package (not submodule), which could for example be called rpcservers that then has the same sub structure as today (autopilotrpc/, chainrpc/). So all the build tag guarded code that you moved one level down should instead be moved to this new rpcservers package. And maybe we remove the *rpc suffix from the sub packages, so that it will just be rpcservers/autopilot or rpcservers/router to avoid needing to alias things with routerrpc2 as you had to do in rpcserver.go.

I like this solution. It will also put it nearer to rpcperms which I couldn't help but notice has a lot of interconnections. I will rearrange it. It won't take very long since everything is already now separated correctly. I will also ensure make check is working correctly.

l0k18 avatar Jun 21 '22 14:06 l0k18

Sorry for the late feedback, I should have commented on #5402 earlier.

I think we want to keep the protos in the same place as they currently are, for the reason you mentioned (number of changes needed in lnd and other projects). But I think we should move the server implementations to a completely new package (not submodule), which could for example be called rpcservers that then has the same sub structure as today (autopilotrpc/, chainrpc/). So all the build tag guarded code that you moved one level down should instead be moved to this new rpcservers package. And maybe we remove the *rpc suffix from the sub packages, so that it will just be rpcservers/autopilot or rpcservers/router to avoid needing to alias things with routerrpc2 as you had to do in rpcserver.go.

Might need some disambiguating suffix still... autopilot, invoices, watchtower and wtclient are reused else where and just so happened to be visible colliding in subserver_config.go that I happened to have open as I was doing the refactors, drawing my attention to this.

To make them more distinctive perhaps change the colliding ones to prefix rpc?

Scratch that. The only one that has a lot of collisions is invoicesrpc, mostly in tests. These few will be aliased.

l0k18 avatar Jun 21 '22 16:06 l0k18

Please also make sure that make check runs correctly on your machine.

As part of this, I have set up and am syncing bitcoind and btcd on my machine. The tests don't seem to work while they are syncing I suppose? I will check how to configure them. It isn't a big extra load to have running in the background when synced, and I like seeing PASS.

l0k18 avatar Jun 22 '22 02:06 l0k18

You don't need to have any node synced as the tests all run on a separate regtest network anyway. So just having the binaries installed should be enough for the tests.

Also, some of the unit and integration tests are a bit flaky, so they sometimes fail without a change being the cause for it. Maybe it's best if you run the different commands separately (make lint, make rpc-check, make unit, make itest) to see where things fail.

guggero avatar Jun 22 '22 07:06 guggero

I can clear all the linting errors, most of them were nothing I did but happen to be all within the new rpcservers folder and don't have many effects outside this area, so I have done them all, I hope this is ok.

l0k18 avatar Jun 22 '22 08:06 l0k18

You don't need to have any node synced as the tests all run on a separate regtest network anyway. So just having the binaries installed should be enough for the tests.

Also, some of the unit and integration tests are a bit flaky, so they sometimes fail without a change being the cause for it. Maybe it's best if you run the different commands separately (make lint, make rpc-check, make unit, make itest) to see where things fail.

not sure why but I still get fails on make unit even though I have bitcoind installed in /usr/local/bin - always errors with unix sockets. I'm running ubuntu 22.04.

davidvennik@bravobravo:~$ which bitcoind
/usr/local/bin/bitcoind

...

bitcoind_test.go:203: unable to establish connection to bitcoind: unable to subscribe for zmq block events: dial unix //tmp/bitcoind871134295/blocks.socket: connect: no such file or directory

I know I have r/w access to /tmp so it isn't that.

The btcd tests are passing now though.

ok      github.com/lightningnetwork/lnd/lnwallet/test/btcd      (cached)

l0k18 avatar Jun 22 '22 08:06 l0k18

Did you install a version of bitcoind with ZMQ support enabled?

guggero avatar Jun 22 '22 08:06 guggero

Did you install a version of bitcoind with ZMQ support enabled?

oh, that's not default? I built it from source ofc. I'll fix that now.

Yes, it passes all in make unit now.

l0k18 avatar Jun 22 '22 09:06 l0k18

=== RUN   TestLightningNetworkDaemon/tranche00/97-of-100/btcd/3rd_party_anchor_spend
    test_harness.go:91: Failed: (3rd party anchor spend): exited with error: 
        *errors.errorString unable to generate block: rejected: block difficulty of 545259519 is not the expected value of 537395199
        /home/davidvennik/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_onchain_test.go:533 (0xfea5b4)
                testAnchorThirdPartySpend: t.Fatalf("unable to generate block: %v", err)
        /home/davidvennik/src/github.com/lightningnetwork/lnd/lntest/itest/test_harness.go:114 (0xf91c5c)
                (*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        /home/davidvennik/src/github.com/lightningnetwork/lnd/lntest/itest/lnd_test.go:245 (0x102c75a)
                TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        /home/davidvennik/go/src/testing/testing.go:1439 (0x512fa2)
                tRunner: fn(t)
        /home/davidvennik/go/src/runtime/asm_amd64.s:1571 (0x469381)
                goexit: BYTE    $0x90   // NOP

The logs on the CI did not show this. How would changes in the filesystem structure and a couple of name changes produce this error? I have done a lot of work with difficulty adjustment algorithms, this is an error with the miner's implementation.

l0k18 avatar Jun 23 '22 08:06 l0k18

As mentioned before:

Also, some of the unit and integration tests are a bit flaky, so they sometimes fail without a change being the cause for it.

guggero avatar Jun 23 '22 08:06 guggero

As mentioned before:

Also, some of the unit and integration tests are a bit flaky, so they sometimes fail without a change being the cause for it.

Since there is a bug here, I am thinking perhaps to follow this one up after this is done?

It looks, on the face of it, like a test without predefined timestamps.

l0k18 avatar Jun 23 '22 08:06 l0k18

There is now only one commit to examine, which might be a bit strange but it's a PR so I think this is ok, and better for reviewers.

One of the integration tests fails:

=== RUN   TestLightningNetworkDaemon/tranche00/01-of-105/btcd/test_multi-hop_htlc/committype=SCRIPT_ENFORCED_LEASE_zeroconf=false/local_force_close_immediate_expiry
    lnd_funding_test.go:759: 
        	Error Trace:	lnd_funding_test.go:759
        	            				lnd_multi-hop_test.go:264
        	            				lnd_multi-hop_htlc_local_timeout_test.go:31
        	            				lnd_multi-hop_test.go:142
        	Error:      	Received unexpected error:
        	            	rpc error: code = Unknown desc = /walletrpc.WalletKit/DeriveNextKey: unknown permissions required for method
        	Test:       	TestLightningNetworkDaemon/tranche00/01-of-105/btcd/test_multi-hop_htlc/committype=SCRIPT_ENFORCED_LEASE_zeroconf=false/local_force_close_immediate_expiry
=== CONT  TestLightningNetworkDaemon
    lnd_test.go:253: Failure time: 2022-07-27 12:45:01.401
--- FAIL: TestLightningNetworkDaemon (932.58s)

All else in make check now passing. Requesting review.

l0k18 avatar Jul 27 '22 11:07 l0k18

No idea the cause of test failures. On my machine make itest fails with some "unknown permissions" error on tranche00 and on the travis it fails in tranche02.

l0k18 avatar Jul 27 '22 11:07 l0k18

~~In the places where the separation of the server from the protobuf generated code requires package specifier prefixes sometimes the lines exceed 80 by up to 12 or so characters. More often only a few. I hope this is ok. I am trying to keep the changes visually minimal and it isn't stated I can't use dot imports but these would eliminate those changes completely, and changing the import name would be also less clear.~~

I am going to go back and enforce the 80 charater width limit. I read the guideline and it doesn't make an exception for "small numbers of characters exceeding 80 columns" and I hadn't set my tab width to 8 yet.

Note that in autopilotrpc there is a longer line but because it wasn't changed I didn't change it. There probably is others but I am keeping strict to this as I'm getting this finished.

I do need to comment that the formatting rules regarding function headers are not sufficiently exhaustive for possibilities, due to using short variable names and not being actual examples of 80 characters wide. It is literally impossible to avoid a return variable list sometimes starting with an open bracket previous line and using a comma to end the parameters and starting with the close bracket/open bracket are cases that are not covered. So I think this should be specified at some point. The long package names in this change made this issue come up constantly and thus there is several cases of terminal open brackets and beginning close brackets.

Here's an example:

func (s *Server) MuSig2CombineKeys(_ context.Context,
	in *signrpc.MuSig2CombineKeysRequest) (*signrpc.MuSig2CombineKeysResponse, error) {

The in parameter cannot go after the context parameter, and the first return parameter with in on the same line blows past 80 as well.

So I make it like this:

func (s *Server) MuSig2CombineKeys(_ context.Context,
	in *signrpc.MuSig2CombineKeysRequest,
) (*signrpc.MuSig2CombineKeysResponse, error) {

My suggestion is to specify that:

  • if the first parameter is over 80, open bracket
  • if the split is on the parameter/return brackets, to make the parameter close start on the new line (which is at zero column by gofmt).

This is the pattern I will adopt for the rest of this work.

Also, apologies for the several force pushes in this PR. I haven't got the most time efficient way to ensure compilation and most cases the force pushes in this PR have been from cases where package name changes have touched outside lnrpc/ and rpcservers/ folders.

l0k18 avatar Jul 28 '22 07:07 l0k18

Just a short note. Waiting on backing up my full node machine so I can set it up to run these very long make check runs. Once I see that thing pass I will request the second reviewer and try and get this thing finished. Thankyou all for your patience. This has been a good learning experience for me.

This was why I changed the walletkit name - it fails the make lint (why this linter is active or why the thing isn't fixed?):

rpcservers/router/router_backend.go:41:6: exported: type name will be used as router.RouterBackend by other packages, and that stutters; consider calling this Backend (revive)
type RouterBackend struct {
     ^
rpcservers/wallet/walletkit_server.go:186:6: exported: type name will be used as wallet.WalletKit by other packages, and that stutters; consider calling this Kit (revive)
type WalletKit struct {
     ^

None of the errors from make lint are changes in the commits in the foregoing. I don't know what to do with this.

l0k18 avatar Jul 28 '22 12:07 l0k18

Twice now encountered a glitch running the integration tests. Times out after one hour.

I'm not sure why it takes so long. I have 8 core ryzen 7 on my dev machine and ryzen 5 on my other, both have over 12gb or more. I am going to hax the timeout setting in the Makefile for the itests because it never is done with tranche00 before an hour on both machines.

Update

I learned it's possible to add this to the go test invocation with the flag -timeout with a standard format Go duration, so I am trying it with it set to two. Last run I made it managed to get to tranche72.

End result is this:

=== RUN   TestLightningNetworkDaemon/tranche00/98-of-105/btcd/3rd_party_anchor_spend
    test_harness.go:91: Failed: (3rd party anchor spend): exited with error: 
        *errors.errorString unable to generate block: rejected: block difficulty of 545259519 is not the expected value of 537001983
        /home/davidvennik/src/github.com/l0k18/lnd/lntest/itest/lnd_onchain_test.go:566 (0xfedcb4)
        	testAnchorThirdPartySpend: t.Fatalf("unable to generate block: %v", err)
        /home/davidvennik/src/github.com/l0k18/lnd/lntest/itest/test_harness.go:114 (0xf9471c)
        	(*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        /home/davidvennik/src/github.com/l0k18/lnd/lntest/itest/lnd_test.go:245 (0x103423a)
        	TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        /home/davidvennik/go/src/testing/testing.go:1439 (0x513002)
        	tRunner: fn(t)
        /home/davidvennik/go/src/runtime/asm_amd64.s:1571 (0x469381)
        	goexit: BYTE	$0x90	// NOP

This fails consistently. But it's in the last little bit. Can't skip it easily to ensure nothing else fails but I'm gonna call it PASS anyway.

l0k18 avatar Jul 29 '22 18:07 l0k18

@bhandras: review reminder @guggero: review reminder

lightninglabs-deploy avatar Aug 05 '22 18:08 lightninglabs-deploy

What I forgot to add but is kind of the whole point of the PR is: We should make lnrpc its own go module and reference it in the main go.mod file. For now (and until we've become more familiar with Golang multi-module workspaces) we should probably add a local replace directive to avoid every PR needing to be split into two (which would be different from how we use all other go modules currently).

guggero avatar Aug 08 '22 10:08 guggero

So I should go mod init in lnrpc?

Putting a replace in the root go.mod isn't really that big a deal. Replace is a very useful feature.

l0k18 avatar Aug 09 '22 07:08 l0k18

What I forgot to add but is kind of the whole point of the PR is: We should make lnrpc its own go module and reference it in the main go.mod file. For now (and until we've become more familiar with Golang multi-module workspaces) we should probably add a local replace directive to avoid every PR needing to be split into two (which would be different from how we use all other go modules currently).

Upon doing this to test out what happens, go mod tidy seems to want to grab a heap of things from roasbeef forks of everything and trips on one of his modules that declares itself as the fork's path but it requires it as it's actual path.

I also tried copying over all the require blocks and replaces from the root, it makes no difference. Searching the entire source code I can't see where go mod is even getting these urls from. Furthermore, there is a couple of types that suddenly have errors, interfaces, that are fine before the split.

It definitely would be for a separate issue, this won't be a simple job. I'd even go so far as to say it should even be done as a separate repository to be clean.

l0k18 avatar Aug 10 '22 08:08 l0k18

@guggero I have made all requested changes. The creation of the new module out of lnrpc is not a simple task I don't see it fitting in the scope of this issue.

I'd just point out that importing isolated sections of this repository do not force any compilation outside of that package because it does not link outwards from it, and it is not such a big repository we are dumping a node_modules on people or anything.

Further, making it a separate module won't change that the entire repo will be pulled down from the git server either way, the only way to do that is to move the thing to a new repository. This could be done, but from what I already saw trying to just throw a go.mod in there I think it may be a lot more complicated due to types being pulled in from the rest of the repository. To cleanly separate all of it requires grouping all that lnrpc connects to that is within this repository into the lnrpc module, which I think may be either impractical or a massive breaking change.

The changes done in this PR will still achieve the goal of eliminating unnecessary compilation of code that is not related to the protobuf generated code.

Thinking further, the issue with lnrpc folder being separated out requires the lnrpc server code to be also separated. I will examine this possibility and perhaps push one more commit that exercises this change.

l0k18 avatar Aug 10 '22 08:08 l0k18

Initial exploration finds there is 57 files that are changed by this moving of the non-protobuf code from lnrpc folder. The Dockerfile has to be left where it is due to the fact it is actually just doing protobuf code generation. Almost all the changes are just import paths only.

I will tighten it up and push it and you can tell me if it will be kept.

l0k18 avatar Aug 10 '22 08:08 l0k18

Creating a new go module is the whole point of the original feature request. The point is not to avoid downloading things from Git but to massively reduce the linked code size for tools that only need the RPC definitions.

guggero avatar Aug 10 '22 08:08 guggero

Creating a new go module is the whole point of the original feature request. The point is not to avoid downloading things from Git but to massively reduce the linked code size for tools that only need the RPC definitions.

Ok, I am already started on it, and I just will say that because there is three separate APIs tied to the lnrpc package separating them between server and generated code will be a big change set. The new lnrpc server package will be called ln in the same fashion as the others and fortunately it is the far lesser of the usages compared to the generated code. So, just warning you to brace yourself for another 400 or so changes to finish this.

l0k18 avatar Aug 10 '22 08:08 l0k18

Wait, I don't think we're talking about the same thing. You moved the server code out of lnrpc so the package only contains the gRPC definitions. All that's left to do is to turn lnrpc (and only the root package) into a go module. No new name, it's just declared as a module. That shouldn't require any changes outside of that directory (except for maybe in the main go.mod). If there's a problem running go mod tidy then something's not right. Perhaps the rpc_utils.go and websocket_proxy.go files need to be moved to the rpcservers directory as well (which itself shouldn't be too big of a change). Also, go mod can be very tricky, sometimes you need to explicitly specify a version, otherwise it will fall back to the last non-tagged version (which for lnd is 0.2.0 which is ancient and is probably what's pulling in the weird forks). But the lnrpc module should have no dependency to the main lnd repo.

guggero avatar Aug 10 '22 09:08 guggero

Wait, I don't think we're talking about the same thing. You moved the server code out of lnrpc so the package only contains the gRPC definitions. All that's left to do is to turn lnrpc (and only the root package) into a go module. No new name, it's just declared as a module. That shouldn't require any changes outside of that directory (except for maybe in the main go.mod). If there's a problem running go mod tidy then something's not right. Perhaps the rpc_utils.go and websocket_proxy.go files need to be moved to the rpcservers directory as well (which itself shouldn't be too big of a change). Also, go mod can be very tricky, sometimes you need to explicitly specify a version, otherwise it will fall back to the last non-tagged version (which for lnd is 0.2.0 which is ancient and is probably what's pulling in the weird forks). But the lnrpc module should have no dependency to the main lnd repo.

I hadn't separated the lnrpc server parts from the protobuf generated code yet. The lnrpc package depends on types in the rest of the repository. Thus it should be moved the same as the rest of the server codes to isolate from the generated code.

I have pretty much already done this change now, just had a few lints to fix again because code moved to new places, I should be able to put a go.mod into the lnrpc directory now and because it has no imports from the rest of the repository it should be trivial. I will have the commit up soon if it was this simple.

Update:

Just a small glitch now to do with the interdependence between lncipb and walletrpc:

github.com/l0k18/lnd/lnrpc/lnclipb imports
        github.com/lightningnetwork/lnd/lnrpc/verrpc: module github.com/lightningnetwork/lnd@latest found (v0.0.2), but does not contain package github.com/lightningnetwork/lnd/lnrpc/verrpc
github.com/l0k18/lnd/lnrpc/walletrpc imports
        github.com/lightningnetwork/lnd/lnrpc/signrpc: module github.com/lightningnetwork/lnd@latest found (v0.0.2), but does not contain package github.com/lightningnetwork/lnd/lnrpc/signrpc

I think I can fix this with a replace directive pointing at the current module.

l0k18 avatar Aug 10 '22 09:08 l0k18

It required some changes in the main go.mod related to go 1.16 go module selection rules. Running make check on it now to be sure everything is behaving itself.

l0k18 avatar Aug 10 '22 09:08 l0k18

@guggero The latest relocates lnrpc server code to rpcserver/ln and creates a separate module for lnrpc and this has a replace directive to point at the local repo as you requested.

It also fails make check at

=== RUN   TestLightningNetworkDaemon/tranche00/98-of-106/btcd/3rd_party_anchor_spend
    test_harness.go:91: Failed: (3rd party anchor spend): exited with error: 
        *errors.errorString unable to generate block: rejected: block difficulty of 545259519 is not the expected value of 537001983
        /home/davidvennik/src/github.com/l0k18/lnd/lntest/itest/lnd_onchain_test.go:567 (0x1015e74)
        	testAnchorThirdPartySpend: t.Fatalf("unable to generate block: %v", err)
        /home/davidvennik/src/github.com/l0k18/lnd/lntest/itest/test_harness.go:114 (0xfbbe3c)
        	(*harnessTest).RunTestCase: testCase.test(h.lndHarness, h)
        /home/davidvennik/src/github.com/l0k18/lnd/lntest/itest/lnd_test.go:245 (0x105c3fa)
        	TestLightningNetworkDaemon.func4: ht.RunTestCase(testCase)
        /usr/lib/go-1.18/src/testing/testing.go:1439 (0x512ec2)
        	tRunner: fn(t)
        /usr/lib/go-1.18/src/runtime/asm_amd64.s:1571 (0x469321)
        	goexit: BYTE	$0x90	// NOP

which is definitely a test related error.

I can't figure out what github is referring to with the merge conflict. This file is moved, and my git client doesn't say there is anything to be done.

When I was doing the changes I didn't think that they would amount to more than 500 changes after I was able to use import aliases to remove a lot where only server code was imported. Sorry, that is just what it took to do it.

Current state has a few linter fixes in the midst of this last commit. I will separate them now and re-force-push the separated commits

l0k18 avatar Aug 10 '22 12:08 l0k18