dcrdex icon indicating copy to clipboard operation
dcrdex copied to clipboard

bookie: handle feed closure correctly (mm bot and everywhere else)

Open norwnd opened this issue 1 year ago • 3 comments

I've noticed MM bot occasionally would just stop working without any errors in logs (I think this also was reported on Matrix a while ago).

I've been digging through basic MM bot code lately and it seems there is no handling for server disconnect, the code involved is quite complex to tell what would happen exactly - but from cursory overview it seems 2 outcomes are possible:

  1. once there are no update coming on the bookFeed bot will be doing nothing (won't be adjusting existing orders or placing new ones)
  2. and once the bookFeed channel actually closes the involved for loop will hit nil pointer dereference due to nil value returned immediately from the channel

^ I didn't observe nil pointer happen in practice (case 2) likely because no explicit bookie.closeFeeds() call was issued (either bookie is intentionally "kept alive" perhaps to serve other users like UI, or server connection just broke at some low level and didn't bother to notify higher levels about it).

Seems with the current implementation we are experiencing case 1 - which means botLoop doesn't really know when it needs to ditch the current bookFeed he is consuming and create a new one with m.core.SyncBook (unless we can fix the propagation of feed closing from lower to higher levels, a work-around could be to timeout "dead" feed and re-subscribe to a new one).

Note, even if bookie is intentionally "kept alive" though another consumer (of another feed) in the scenarios described in this issue it probably shouldn't be because bookie is just a multiplexer(either all of his feeds break/stop or all of them must be functional), hence the proper solution (as opposed to a workaround I've suggested above) would probably look like this:

  • fix connection termination error propagation such that we convert "case 1" into "case 2" (consistently start hitting those nil pointer dereferences instead of just hanging waiting for new updates that will never come)
  • fix "case 2" by additionally checking whether bookFeed has been closed (and re-creating new feed when necessary)

norwnd avatar Nov 24 '24 12:11 norwnd

If the server disconnects and reconnects, the old feed should still be working. Are you running the latest version? If you go to the orders report popup (click the button next to "All Buy Orders Placed Successfully"), does the epoch number get updated? Screenshot 2024-12-01 at 1 45 39 PM

martonp avatar Dec 01 '24 12:12 martonp

Are you running the latest version? If you go to the orders report popup (click the button next to "All Buy Orders Placed Successfully"), does the epoch number get updated?

I'll take a better/deeper look at this a bit later (busy with other stuff at the moment) and let you know what I found out, perhaps what I saw was indeed caused by something else.

norwnd avatar Dec 01 '24 12:12 norwnd

@martonp so I've looked into this once again

  1. and once the bookFeed channel actually closes the involved for loop will hit nil pointer dereference due to nil value returned immediately from the channel

this issue is real, although pretty hard to run into - I think mostly because consumers close their own feeds after they are done with <-bookFeed.Next() channel (otherwise we'd get more user-reports on it already),

running my own version of the code - https://github.com/norwnd/bison-lean/blob/562f6b132375f68377fbabda57c6099de665088c/client/mm/mm_basic.go#L576-L576

2025-02-08 18:09:15.328 [INF] CORE: Disabling dex server with unspent bonds. Bonds will be refunded when expired.
2025-02-08 18:09:15.340 [DBG] CORE: Unsubscribing from the btc_usdt.polygon order book for dex.decred.org:7232
2025-02-08 18:09:15.340 [DBG] CORE: Unsubscribing from the dcr_usdt.polygon order book for dex.decred.org:7232
2025-02-08 18:09:15.340 [DBG] CORE: Unsubscribing from the dcr_usdc.polygon order book for dex.decred.org:7232
2025-02-08 18:09:15.340 [DBG] CORE: Unsubscribing from the ltc_usdt.polygon order book for dex.decred.org:7232
2025-02-08 18:09:15.340 [DBG] CORE: Unsubscribing from the dcr_polygon order book for dex.decred.org:7232
2025-02-08 18:09:15.340 [DBG] CORE: Unsubscribing from the usdc.polygon_usdt.polygon order book for dex.decred.org:7232
2025-02-08 18:09:15.340 [DBG] CORE: Unsubscribing from the dcr_btc order book for dex.decred.org:7232
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x2 addr=0x0 pc=0x1015d922c]

goroutine 47749 [running]:
decred.org/dcrdex/client/mm.(*basicMarketMaker).botLoop.func1()
	/Users/norwnd/dcrdex/client/mm/mm_basic.go:576 +0x16c
created by decred.org/dcrdex/client/mm.(*basicMarketMaker).botLoop in goroutine 46704
	/Users/norwnd/dcrdex/client/mm/mm_basic.go:567 +0x26c
make[1]: *** [run] Error 2
make: *** [all] Error 2

which probably means <-bookFeed.Next() channel shouldn't be read like it currently is in a bunch of places, and maybe instead we can read ok value to additionally check if the feed hasn't been closed by somebody else (the way I've reproduced this is by using "Disable Account" in UI Settings, and from what I understand it's not the only way to trigger it)

  1. once there are no update coming on the bookFeed bot will be doing nothing (won't be adjusting existing orders or placing new ones)

regarding this ^ concern

you are right about client <-> server WS connection being automatically reestablished (so bookie itself doesn't have to worry or even know about it)


The MM bot freeze I originally observed might have been due to wallet unable to allocate funds (although I probably would have noticed in logs), or maybe by https://github.com/decred/dcrdex/issues/3180


I'll rename this issue accordingly - "bookie: handle feed closure correctly (mm bot and elsewhere)"

for the case of MM bot specifically, I guess it would be fine to just terminate botLoop in case feed is closed (if it's much harder to implement loop restart)

norwnd avatar Feb 08 '25 18:02 norwnd

This issue is hard to follow. I have not run into any problems with the book feed an I have the worst internet. What is the correct feed closure?

JoeGruffins avatar Apr 22 '25 07:04 JoeGruffins

To sum up, <-bookFeed.Next() might return nil in certain situations (in basicMarketMaker.botLoop in particular - so my recommendation on how to address it would be:

  • either to make sure that nil is handled on the receiver side (everywhere bookFeed.Next() is used)
  • or maybe rework bookFeed itself somehow so it doesn't send nil (never closes), but haven't explored this direction enough to undertand if that's a viable option

norwnd avatar Apr 22 '25 09:04 norwnd

oh, I think you are right, in mm

JoeGruffins avatar Apr 23 '25 04:04 JoeGruffins

@norwnd please have a look at #3254

JoeGruffins avatar Apr 23 '25 05:04 JoeGruffins