Block RPC calls if not properly started or errored out during start
A recent user report showed litd malfunctioning because a universe RPC call ran before the universe sub-service was online.
See TODO summary here: https://github.com/lightninglabs/taproot-assets/issues/1122#issuecomment-2345928938
Thanks for the issue report! Should we allow any RPC requests to tapd before the universe sub-service has been started?
If they should be separated, i.e. if we would want to allow some requests before the universe sub-service is up, would an OK first version be to simply disallow all tapd RPCs until the universe sub-service is up, and then introduce that separation at a later stage?
I don't think they should be separated, or that there would be any benefit to that.
IIUC the desired fix is to disallow all tapd RPCs until all tapd sub-services are up.
LiT today should already take care of this. We only mark a subserver as "ready for calls" once the Start method of that subserver (for tap, this is the (s *Server) StartAsSubserver method) returns without an error. We want to keep lit as generic as possible in terms of handling of subservers, and so it is expected for each subserver that it is ready to handle calls once that start method has returned. We really should not be doing extra calls to check if various subserver specifics are ready - the SLA here is that things should be ready once that method has returned.
If y'all don't want to permanently change the behaviour, then I suggest adding a "BlockTillUniverseStart" functional option or something similar on that call.
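For illustration only, such a functional option could look roughly like the sketch below. BlockTillUniverseStart, SubserverOption and the universeReady channel are hypothetical names invented for this sketch, not existing tapd API:

```go
package sketch

import "context"

// subserverOpts gathers optional behaviour for the subserver start call.
// Everything in this file is a hypothetical sketch, not existing tapd API.
type subserverOpts struct {
	// blockTillUniverseStart makes the start call wait until the universe
	// sub-service signals readiness before returning.
	blockTillUniverseStart bool
}

// SubserverOption is a functional option for StartAsSubserver.
type SubserverOption func(*subserverOpts)

// BlockTillUniverseStart requests that the start call only returns once the
// universe sub-service is fully up.
func BlockTillUniverseStart() SubserverOption {
	return func(o *subserverOpts) {
		o.blockTillUniverseStart = true
	}
}

// StartAsSubserver shows how the option would be consumed. The universeReady
// channel stands in for whatever readiness signal the universe sub-service
// exposes.
func StartAsSubserver(ctx context.Context, universeReady <-chan struct{},
	opts ...SubserverOption) error {

	cfg := &subserverOpts{}
	for _, opt := range opts {
		opt(cfg)
	}

	// ... start the sub-services here ...

	if cfg.blockTillUniverseStart {
		select {
		case <-universeReady:
		case <-ctx.Done():
			return ctx.Err()
		}
	}

	return nil
}
```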
Hmm, I see. That is the correct behavior as far as I can see. We only have a screenshot of a stack trace to go on, and it looked like an RPC request did come through before the internal gRPC server in tapd was started.
But perhaps tapd stopped with an error and the request came through after the gRPC server was stopped?
Though from just the code it looks like that's handled correctly as well.
Need more info from the user then, preferably actual logs.
Reposting the stack trace as text:
2024-08-29 11:27:04.591 [DBG] RPCS: [/lnrpc.State/GetState] requested
2024-08-29 11:27:04.682 [DBG] RPCS: [/lnrpc.State/GetState] requested
2024-08-29 11:27:04.690 [DBG] RPCS: [/universerpc.Universe/ListFederationServers] requested
2024-08-29 11:27:04.692 [DBG] RPCS: [/lnrpc.Lightning/CheckMacaroonPermissions] requested
2024-08-29 11:27:04.692 [DBG] RPCS: [/lnrpc.Lightning/ListPeers] requested
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x38 pc=0x16c62a0]
goroutine 3224 [running]:
github.com/lightninglabs/taproot-assets.(*rpcServer).ListFederationServers(0x0?, {0x32b0ad0?, 0x40026efc50?}, 0x1bb1040?)
    github.com/lightninglabs/taproot-assets@…-2110839696cb/rpcserver.go:5654 +0x20
github.com/lightninglabs/taproot-assets/taprpc/universerpc._Universe_ListFederationServers_Handler.func1({0x32b0ad0?, 0x40026efc50?}, {0x1bb1040?,
    github.com/lightninglabs/taproot-assets@…-2110839696cb/taprpc/universerpc/universe_grpc.pb.go:688 +0xd0
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.(*InterceptorChain).middlewareUnaryServerInterceptor.func7({0x32b0ad0
    github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:832 +0xec
google.golang.org/grpc.getChainUnaryHandler.func1({0x32b0ad0, 0x40026efc50}, {0x1bb1040, 0x40026efc80})
    google.golang.org/grpc@…/server.go:1163 +0xa0
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.(*InterceptorChain).MacaroonUnaryServerInterceptor.func5({0x32b0ad0,
    github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:689 +0x90
google.golang.org/grpc.getChainUnaryHandler.func1({0x32b0ad0, 0x40026efc50}, {0x1bb1040, 0x40026efc80})
    google.golang.org/grpc@…/server.go:1163 +0xa0
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.(*InterceptorChain).rpcStateUnaryServerInterceptor.func3({0x32b0ad0,
    github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:781 +0x108
google.golang.org/grpc.getChainUnaryHandler.func1({0x32b0ad0, 0x40026efc50}, {0x1bb1040, 0x40026efc80})
    google.golang.org/grpc@…/server.go:1163 +0xa0
github.com/lightningnetwork/lnd/rpcperms.(*InterceptorChain).CreateServerOpts.errorLogUnaryServerInterceptor.func1({0x32b0ad0?, 0x40026efc50?}, {0x
    github.com/lightningnetwork/[email protected]/rpcperms/interceptor.go:605 +0x48
google.golang.org/grpc.NewServer.chainUnaryServerInterceptors.chainUnaryInterceptors.func1({0x32b0ad0, 0x40026efc50}, {0x1bb1040, 0x40026efc80},
    google.golang.org/grpc@…/server.go:1154 +0x88
github.com/lightninglabs/taproot-assets/taprpc/universerpc._Universe_ListFederationServers_Handler({0x1e5e5a0, 0x400056e600}, {0x32b0ad0, 0x40026e
    github.com/lightninglabs/taproot-assets@…-2110839696cb/taprpc/universerpc/universe_grpc.pb.go:690 +0x148
google.golang.org/grpc.(*Server).processUnaryRPC(0x40008541e0, {0x32b0ad0, 0x40026ef990}, {0x32c2920, 0x4000588340}, 0x40026b19e0, 0x4000591c20,
    google.golang.org/grpc@…/server.go:1343 +0xb40
google.golang.org/grpc.(*Server).handleStream(0x40008541e0, {0x32c2920, 0x4000588340}, 0x40026b19e0)
    google.golang.org/grpc@…/server.go:1737 +0x95c
google.golang.org/grpc.(*Server).serveStreams.func1.1()
    google.golang.org/grpc@…/server.go:986 +0x88
created by google.golang.org/grpc.(*Server).serveStreams.func1 in goroutine 3223
    google.golang.org/grpc@…/server.go:997 +0x14c
So today we have a middleware interceptor that'll bounce all calls until the server is fully started: https://github.com/lightninglabs/taproot-assets/blob/72b93f84a0afa08e01c99d582357672133fc9b20/rpcperms/interceptor.go#L380-L408
We then set it to active after all the sub-systems have started here: https://github.com/lightninglabs/taproot-assets/blob/72b93f84a0afa08e01c99d582357672133fc9b20/server.go#L374-L378
From the trace, either the rpcserver or FederationDB was nil at that point, which is puzzling (all the sub-system structs have already been initialized at that point). So perhaps some mutation occurred somewhere?
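For context, the gating pattern that interceptor implements is roughly the following. This is a simplified sketch with placeholder names (rpcsActive, SetRPCActive), not the actual taproot-assets or lnd rpcperms code:

```go
package sketch

import (
	"context"
	"sync/atomic"

	"google.golang.org/grpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// InterceptorChain gates RPC calls on server readiness. This is a simplified
// sketch of the pattern, not the real rpcperms implementation.
type InterceptorChain struct {
	// rpcsActive is flipped once the server has finished starting all of
	// its sub-systems.
	rpcsActive atomic.Bool
}

// SetRPCActive is called by the server after all sub-systems have started.
func (c *InterceptorChain) SetRPCActive() {
	c.rpcsActive.Store(true)
}

// UnaryInterceptor bounces every unary call until the server is marked
// active, so no handler can run against half-initialized state.
func (c *InterceptorChain) UnaryInterceptor() grpc.UnaryServerInterceptor {
	return func(ctx context.Context, req any, _ *grpc.UnaryServerInfo,
		handler grpc.UnaryHandler) (any, error) {

		if !c.rpcsActive.Load() {
			return nil, status.Error(
				codes.Unavailable, "the RPC server is still "+
					"starting up and not yet ready to "+
					"accept calls",
			)
		}

		return handler(ctx, req)
	}
}
```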
This issue is coming from our team. When the terminal is restarted while lnd and tapd are still syncing, the terminal crashes repeatedly because various requests (such as listing assets, SubscribeRfqEvents, universe calls) hit tapd before it is ready.
@lukegao209 could you send us a full log (including the stack trace) from the beginning of a startup sequence until it panics?
@lukegao209 also, what port are you issuing the ListFederationServers call on? The main litd port (8443) or the integrated tapd port (10029)?
port 10029
@lukegao209 - ok cool, thanks 🙏 This makes sense then - rather point your requests to LiT's port 8443, as then LiT will block calls to tapd until it is ready.
@Roasbeef re
So today we have a middleware interceptor that'll bounce all calls until the server is fully started:
This is actually not the case for the Subserver startup of Tap.
- this is done for the main startup of Tap when it is in its own process: see RunUntilShutdown, which sets up the interceptor chain for Tap that does this blocking
- but for StartAsSubserver, which LiT uses, the interceptor chain in Tap is not set up
In conclusion: for the terminal, lnd should be requested on port 10009, while the other services (tapd, loop, pool, etc.) should all be requested on port 8443?
you can also point LND requests to 8443 👍
thanks
btw, will terminal's account system support taproot asset channels?
not out of the box, no. we'll need to update the litd account system once taproot asset channels are fully implemented.
If Taproot Assets are implemented through parsing the invoice’s custom_data, maybe I can start working on supporting it now.
@ellemouton
but for StartAsSubserver, which LiT uses, the interceptor chain in Tap is not set up
Gotcha, ok that seems to be the core issue here. We should retain the interceptor chain for tapd, either using the same system, or a more explicit check.
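Such an explicit check could be a guard that every handler (or a dedicated interceptor) runs before touching sub-system state. A minimal sketch, assuming the server tracks startup with an atomic flag; the rpcServer and its fields below are trimmed stand-ins for illustration, not the real tapd types:

```go
package sketch

import (
	"context"
	"sync/atomic"

	"github.com/lightninglabs/taproot-assets/taprpc/universerpc"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)

// rpcServer is a trimmed stand-in for the real tapd rpcServer, keeping only
// the bits needed to demonstrate an explicit readiness guard.
type rpcServer struct {
	started atomic.Bool
	cfg     *struct{} // placeholder for the real server config
}

// checkStarted refuses calls while the server has not fully started (or was
// never wired up), instead of letting a handler dereference nil pointers.
func (r *rpcServer) checkStarted() error {
	if r == nil || !r.started.Load() || r.cfg == nil {
		return status.Error(
			codes.Unavailable, "tapd is still starting up",
		)
	}

	return nil
}

// ListFederationServers shows a handler consulting the guard before touching
// state that is only populated once startup has completed.
func (r *rpcServer) ListFederationServers(_ context.Context,
	_ *universerpc.ListFederationServersRequest) (
	*universerpc.ListFederationServersResponse, error) {

	if err := r.checkStarted(); err != nil {
		return nil, err
	}

	// ... the actual handler logic would go here ...
	return &universerpc.ListFederationServersResponse{}, nil
}
```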
@guggero - with our latest offline discussion, do you rate we can close this and move it to the Tap repo?
Yes. TODOs are (a rough sketch follows the list):
- tapd: return false in waitForReady if tapd was shut down or errored out
- tapd: don't set initialized=true when startup (StartAsSubserver) fails
- tapd: add server nil check to global error interceptors
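As an illustration of what the first two TODOs could amount to; the Server, channels and helper below are simplified placeholders, not the actual tapd implementation:

```go
package sketch

import (
	"context"
	"fmt"
	"sync/atomic"
)

// Server is a trimmed stand-in for the tapd server, keeping only the startup
// bookkeeping that the TODO items are about.
type Server struct {
	initialized atomic.Bool
	ready       chan struct{}
	quit        chan struct{}
}

// StartAsSubserver only marks the server as initialized when startup actually
// succeeded, so a failed start can never look like a ready server.
func (s *Server) StartAsSubserver(ctx context.Context) error {
	if err := s.startSubServices(ctx); err != nil {
		// Note: initialized is deliberately NOT set here.
		return fmt.Errorf("unable to start sub-services: %w", err)
	}

	s.initialized.Store(true)
	close(s.ready)

	return nil
}

// waitForReady blocks until the server is ready for calls and reports false
// if the server shut down (or never finished starting) instead.
func (s *Server) waitForReady(ctx context.Context) bool {
	select {
	case <-s.ready:
		return true
	case <-s.quit:
		return false
	case <-ctx.Done():
		return false
	}
}

func (s *Server) startSubServices(_ context.Context) error {
	// ... start the universe and other sub-services here ...
	return nil
}
```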
Transferring to tapd repo.