eclair
eclair copied to clipboard
Improve router performance
This PR parallelizes route calculations. It creates a bunch of worker actors that perform route calculation in parallel.
As a side effect it prevents the Router's mailbox from clogging with route requests, so it can process network updates on time.
See https://github.com/ACINQ/eclair/discussions/2644 for the context.
Codecov Report
Attention: 5 lines in your changes are missing coverage. Please review.
Comparison is base (
c37df26) 85.85% compared to head (86ffd1a) 85.92%. Report is 1 commits behind head on master.
:exclamation: Your organization needs to install the Codecov GitHub app to enable full functionality.
Additional details and impacted files
@@ Coverage Diff @@
## master #2651 +/- ##
==========================================
+ Coverage 85.85% 85.92% +0.07%
==========================================
Files 216 216
Lines 18201 18221 +20
Branches 749 789 +40
==========================================
+ Hits 15626 15656 +30
+ Misses 2575 2565 -10
| Files | Coverage Δ | |
|---|---|---|
| ...re/src/main/scala/fr/acinq/eclair/NodeParams.scala | 93.41% <100.00%> (+0.02%) |
:arrow_up: |
| ...src/main/scala/fr/acinq/eclair/crypto/Sphinx.scala | 100.00% <100.00%> (ø) |
|
| .../scala/fr/acinq/eclair/payment/PaymentPacket.scala | 90.90% <100.00%> (+0.08%) |
:arrow_up: |
| ...scala/fr/acinq/eclair/payment/send/Recipient.scala | 98.52% <100.00%> (ø) |
|
| ...cinq/eclair/remote/EclairInternalsSerializer.scala | 97.82% <100.00%> (ø) |
|
| ...a/fr/acinq/eclair/wire/protocol/OnionRouting.scala | 80.00% <100.00%> (+13.33%) |
:arrow_up: |
| ...a/fr/acinq/eclair/wire/protocol/PaymentOnion.scala | 99.27% <100.00%> (-0.02%) |
:arrow_down: |
| ...src/main/scala/fr/acinq/eclair/router/Router.scala | 94.42% <95.34%> (+0.02%) |
:arrow_up: |
| ...cala/fr/acinq/eclair/router/RouteCalculation.scala | 94.50% <70.00%> (-0.06%) |
:arrow_down: |
I think this is a useful change, but I dislike the
volatilewrapper.
That’s a simple way to avoid Memory Consistency Errors in JVM. And it does the job, since all updates are made in one thread.
IMHO the real overkill is that Router unnecessarily implements FSM, whereas it’s not a finite state machine at all, there’s only one possible state NORMAL, so there are no state transitions, naturally.
My goal was to make the least invasive change, so I introduced stayUsing() method, which is in fact dangerous, because someone could unknowingly use stay() using ??? instead, and screw up the workers state.
I’d prefer to drop FSM support for Router altogether, and to make it a plain vanilla actor, drop any kind of stay() method calls, and stick to persistent data structures. In this case I would do something like this:
class Router(val nodeParams: NodeParams, watcher: typed.ActorRef[ZmqWatcher.Command], initialized: Option[Promise[Done]] = None) extends Actor {
private val data = VolatileRouterData()
def receive =
…
case PeerRoutingMessage(peerConnection, remoteNodeId, c: ChannelAnnouncement) =>
data.update(
= Validation.handleChannelAnnouncement(data, watcher, RemoteGossip(peerConnection, remoteNodeId), c))
…
case r: RouteRequest =>
worker ! Worker.FindRoutes(data, r, sender())
…
}
}
It’s not really Haskell style, but easy to understand, easy to maintain, and there’s absolutely no Java concurrent programming fun.
We should also probably motivate this change with benchmarks, like any non-trivial change made solely for performance gains.
You can find some benchmarks here https://github.com/ACINQ/eclair/discussions/2644