dcrdex
dcrdex copied to clipboard
candles: investigate cache sync delays
Similar to https://github.com/decred/dcrdex/issues/2168, much less critical, but I'd like to investigate it further still.
In https://github.com/decred/dcrdex/pull/2195#issuecomment-1451186456 I described what looks like "a noticiable delay in candle cache sync (between dexc cache and server data)",
it is very noticeable for me now (laptop lid was closed for ~1h with dexc running), it didn't auto-recover (page refresh doesn't help) for ~40 mins:

observed on 6a93d24
, but I do plan to reproduce it with https://github.com/decred/dcrdex/pull/2195#issuecomment-1451336657 changes, in case it affects anything.
full log: candles-gap-lid-closed.txt
Relevant part of the log for the moment when "auto-recovery" happened:
2023-03-02 05:55:15.588 [TRC] CORE[dcr_btc][book]: Validating match proof note for epoch 167773661 (dcr_btc) with 1 preimages and 0 misses.
2023-03-02 05:55:38.429 [TRC] CORE[dcr_btc][book]: Validating match proof note for epoch 167773663 (dcr_btc) with 3 preimages and 0 misses.
2023-03-02 05:56:15.587 [TRC] CORE[dcr_btc][book]: Validating match proof note for epoch 167773667 (dcr_btc) with 1 preimages and 0 misses.
2023/03/02 09:01:05 "GET http://127.0.0.1:5758/orders HTTP/1.1" from 127.0.0.1:59638 - 200 11660B in 5.055102ms
2023-03-02 06:01:05.205 [TRC] WEB[WS]: message of type 1 received for route unmarket
2023/03/02 09:01:05 "POST http://127.0.0.1:5758/api/orders HTTP/1.1" from 127.0.0.1:59638 - 200 24B in 436.669µs
2023/03/02 09:01:07 "GET http://127.0.0.1:5758/settings HTTP/1.1" from 127.0.0.1:59638 - 200 25760B in 1.749868ms
2023-03-02 06:01:55.613 [TRC] CORE[dcr_btc][book]: Validating match proof note for epoch 167773701 (dcr_btc) with 1 preimages and 0 misses.
2023-03-02 06:02:05.206 [DBG] CORE: Unsubscribing from the dcr_btc order book for dex.decred.org:7232
2023-03-02 06:02:05.456 [ERR] CORE: Route 'match_proof' notification handler error (DEX dex.decred.org:7232): no order book found with market id "dcr_btc"
2023-03-02 06:02:05.456 [ERR] CORE: Route 'epoch_report' notification handler error (DEX dex.decred.org:7232): no order book found with market id 'dcr_btc'
2023/03/02 09:02:45 "GET http://127.0.0.1:5758/markets HTTP/1.1" from 127.0.0.1:59654 - 200 49293B in 1.81263ms
2023-03-02 06:02:45.353 [TRC] WEB[WS]: message of type 1 received for route loadmarket
2023-03-02 06:02:45.353 [DBG] CORE: Subscribing to the dcr_btc order book for dex.decred.org:7232
2023-03-02 06:02:45.945 [DBG] CORE[dcr_btc][book]: Processing 0 cached order notes
2023-03-02 06:02:45.945 [TRC] WEB[WS]: message of type 1 received for route loadcandles
2023-03-02 06:03:35.714 [TRC] CORE[dcr_btc][book]: Validating match proof note for epoch 167773711 (dcr_btc) with 7 preimages and 0 misses.
2023-03-02 06:03:45.771 [TRC] CORE[dcr_btc][book]: Validating match proof note for epoch 167773712 (dcr_btc) with 6 preimages and 0 misses.
Can this happen in a way other than closing a laptop lid? Is this a special case? I suppose this means the computer goes into a type of sleep mode? Does the sleep mode cause orders to fail I guess?
It would make sense, if the app had no knowledge of the passage of time, that it would start adding candles after the suspension like nothing happened.
Regarding a comment here:
computer was suspended
can we define "suspended" ? I think OS can suspend any running process whenever it wants for pretty much however long it wants (in practice it's probably very rare, still I don't think it is a thing of the past),
and does dexc
rely on user not closing laptop lid while using dex ? If he has some trades placed he obviously shouldn't do it, but he might simply forget and do it anyway.
Got more interesting pattern with not just 1 but 4 gaps (running with https://github.com/norwnd/dcrdex/pull/4, ~~not sure if it's related~~ probably not because I haven't seen notifications from past/future in the logs for this run, gonna be investigating these gaps further soon):

And closing laptop lid isn't the only way I've seen this occur even with https://github.com/decred/dcrdex/pull/2195 merged, btw. Left it running once and went away for an hour or so, I guess it still went to sleep mode.
I think this is what happens:
- Candle cache is initialised only once when
bookie
gets created, reconnect doesn't reset it (reconnect does reset order book though, issuing newsubscribe
request). Thebookie
might or might not get cleaned up ifdexc
is suspended for a while, e.g. if bookie managed to create new feed before hisbookie.closeTimer
fired and completed its job (that happens becausews
connection break after long suspension, correspondingbookie
feed is removed, and if there are no other feedsbookie
should be cleaned up with ~1m delay) - bookie stays, and even thoughbookie
missed plenty of candle update notifications (while laptop was at sleep) he does not try to resync candle cache. So, it does seem like we need to reset candle cache inhandleReconnect
similar to how we reset order book there, thoughts ? - Occasionally we are missing candle updates (
epoch_report
notification) when downloading/applying server snapshot. For candle cache such atomicity property isn't supported at the moment, that's both neither on client nor on server side (so it's 2 different places to adjust, if atomicity is desirable). It isn't as noticeable for candles as it is for orders (where lack of similar atomicity property would result in ghost orders like I mentioned elsewhere), still without this candle chart doesn't reflect what actually happened on dex server with 100% accuracy in real time (when cache resets, e.g. on dexc restart, this discrepancy is resolved). Just documented this in https://github.com/norwnd/dcrdex/pull/10 for now, seems like isn't worth bothering solving any time soon.
That doesn't quite explain how I was able to get those 3 mini-candles in the middle of the gap (unless my mac just woke up 3 times received some and went back to sleep, re-establishing broken WS connection with UI to keep bookie
around - I don't have logs from that run, but will try to reproduce again).
unless my mac just woke up 3 times received some and went back to sleep, re-establishing broken WS connection with UI to keep bookie around - I don't have logs from that run, but will try to reproduce again
Yeah, it seems to be the reason. Did another run:

full log: candle-gap-awaking.txt
and just the relevant section corresponding to candle selected on the chart above (at ~19:30 UTC+0):
2023-03-10 17:28:30.691 [TRC] CORE[dcr_btc][book]: Validating match proof note for epoch 167846941 (dcr_btc) with 1 preimages and 0 misses.
2023-03-10 17:28:33.438 [TRC] CORE[eth][RPC]: "mainnet.infura.io" reported new tip at height 16799208 (0xb42ecd83456415f5dd6713da6978a521b7f96b888b1fcfdff0d84a604376a668)
2023-03-10 17:28:33.722 [DBG] CORE[eth]: tip change: 16799207 (0x28e504d9e5dbf92ce9b89e869051f6146220a234576dd447b6a1b8b4a3963228) => 16799208 (0xb42ecd83456415f5dd6713da6978a521b7f96b888b1fcfdff0d84a604376a668)
2023-03-10 17:28:33.722 [TRC] CORE: Processing tip change for usdc.eth
2023-03-10 17:28:33.722 [TRC] CORE: Processing tip change for eth
2023-03-10 17:28:39.211 [TRC] CORE[eth][RPC]: "mainnet.infura.io" reported new tip at height 16799209 (0xebbae66be77e99bd8811d52eaa9bfae25599a4e08a293262ed4bf99c2c0aa971)
2023-03-10 17:28:39.722 [DBG] CORE[eth]: tip change: 16799208 (0xb42ecd83456415f5dd6713da6978a521b7f96b888b1fcfdff0d84a604376a668) => 16799209 (0xebbae66be77e99bd8811d52eaa9bfae25599a4e08a293262ed4bf99c2c0aa971)
2023-03-10 17:28:39.722 [TRC] CORE: Processing tip change for usdc.eth
2023-03-10 17:28:39.722 [TRC] CORE: Processing tip change for eth
2023-03-10 17:28:40.078 [TRC] CORE[dcr_btc][book]: Validating match proof note for epoch 167846942 (dcr_btc) with 1 preimages and 0 misses.
2023-03-10 19:28:38.677 [ERR] CORE[wss://dex.decred.org:7232/ws]: Read timeout on connection to wss://dex.decred.org:7232/ws.
2023-03-10 19:28:38.677 [DBG] CORE: notify: |POKE| (conn) Server disconnect - dex.decred.org:7232 is disconnected
2023-03-10 19:28:38.677 [INF] CORE[wss://dex.decred.org:7232/ws]: Attempting to reconnect to wss://dex.decred.org:7232/ws...
2023-03-10 19:28:39.832 [ERR] CORE[eth][RPC]: "mainnet.infura.io" header subscription error: read tcp 192.168.1.10:62978->52.86.200.55:443: read: connection reset by peer
2023-03-10 19:28:39.832 [INF] CORE[eth][RPC]: Attempting to resubscribe to "mainnet.infura.io" block headers
2023-03-10 19:28:40.648 [DBG] CORE: notify: |POKE| (conn) Server connected - dex.decred.org:7232 is connected
2023-03-10 19:28:40.648 [INF] CORE[wss://dex.decred.org:7232/ws]: Successfully reconnected.
2023/03/10 22:28:40 "GET http://127.0.0.1:5758/api/user HTTP/1.1" from 127.0.0.1:63188 - 200 49556B in 49.510195ms
2023/03/10 22:28:40 "GET http://127.0.0.1:5758/markets HTTP/1.1" from 127.0.0.1:63188 - 200 49343B in 42.473164ms
2023-03-10 19:28:40.755 [TRC] WEB[WS]: message of type 1 received for route unmarket
2023-03-10 19:28:40.766 [TRC] WEB[WS]: message of type 1 received for route loadmarket
2023-03-10 19:28:40.766 [TRC] WEB[WS]: message of type 1 received for route loadcandles
2023-03-10 19:28:41.277 [INF] CORE: Server dex.decred.org:7232 supports API version 0.
2023-03-10 19:28:41.277 [DBG] CORE: Server dex.decred.org:7232 broadcast timeout 22m0s. Tick interval 2m45s
2023-03-10 19:28:41.277 [DBG] CORE: Subscribing to the dcr_btc order book for dex.decred.org:7232
2023-03-10 19:28:41.613 [DBG] CORE[dcr_btc][book]: Processing 0 cached order notes
2023-03-10 20:40:58.285 [TRC] CORE[eth][RPC]: Using cached header from "mainnet.infura.io"
2023-03-10 20:40:58.441 [ERR] CORE: Route 'match_proof' notification handler error (DEX dex.decred.org:7232): match proof validation failed: epoch 167847663 match proof note references 3 orders, but local epoch queue is empty
2023-03-10 20:41:01.247 [ERR] CORE[eth][RPC]: "mainnet.infura.io" header subscription error: read tcp 192.168.1.10:63185->52.7.204.56:443: read: connection reset by peer
2023-03-10 20:41:01.249 [INF] CORE[eth][RPC]: Attempting to resubscribe to "mainnet.infura.io" block headers
2023-03-10 20:41:03.845 [ERR] CORE[wss://dex.decred.org:7232/ws]: Read timeout on connection to wss://dex.decred.org:7232/ws.
2023-03-10 20:41:03.849 [DBG] CORE: notify: |POKE| (conn) Server disconnect - dex.decred.org:7232 is disconnected
2023-03-10 20:41:03.849 [INF] CORE[wss://dex.decred.org:7232/ws]: Attempting to reconnect to wss://dex.decred.org:7232/ws...
2023-03-10 20:41:08.077 [DBG] CORE: notify: |POKE| (conn) Server connected - dex.decred.org:7232 is connected
2023-03-10 20:41:08.089 [INF] CORE[wss://dex.decred.org:7232/ws]: Successfully reconnected.
2023-03-10 22:41:11.754 [INF] CORE: Server dex.decred.org:7232 supports API version 0.
2023-03-10 22:41:11.760 [DBG] CORE: Server dex.decred.org:7232 broadcast timeout 22m0s. Tick interval 2m45s
2023-03-10 22:41:11.762 [DBG] CORE: Subscribing to the dcr_btc order book for dex.decred.org:7232
2023/03/11 01:41:12 "GET http://127.0.0.1:5758/api/user HTTP/1.1" from 127.0.0.1:63196 - 200 49555B in 4.600736497s
2023/03/11 01:41:12 "GET http://127.0.0.1:5758/markets HTTP/1.1" from 127.0.0.1:63196 - 200 49343B in 198.63259ms
2023-03-10 22:41:12.468 [TRC] WEB[WS]: message of type 1 received for route unmarket
2023-03-10 22:41:12.550 [TRC] WEB[WS]: message of type 1 received for route loadmarket
2023-03-10 22:41:12.552 [TRC] WEB[WS]: message of type 1 received for route loadcandles
2023-03-10 22:41:20.454 [ERR] CORE[eth][RPC]: "mainnet.infura.io" header subscription error: read tcp 192.168.1.10:63192->3.222.72.40:443: read: connection reset by peer
2023-03-10 22:41:20.455 [INF] CORE[eth][RPC]: Attempting to resubscribe to "mainnet.infura.io" block headers
2023-03-11 00:41:25.587 [TRC] CORE[eth][RPC]: Fetching fresh header from "mainnet.infura.io"
2023-03-11 00:41:26.426 [WRN] CORE[eth][RPC]: can't resubscribe to "read tcp 192.168.1.10:63202->34.197.232.84:443: read: connection reset by peer" headers: %!v(MISSING)
2023-03-11 00:41:26.933 [ERR] CORE[wss://dex.decred.org:7232/ws]: Read timeout on connection to wss://dex.decred.org:7232/ws.
2023-03-11 00:41:26.938 [DBG] CORE: notify: |POKE| (conn) Server disconnect - dex.decred.org:7232 is disconnected
2023-03-11 00:41:26.938 [INF] CORE[wss://dex.decred.org:7232/ws]: Attempting to reconnect to wss://dex.decred.org:7232/ws...
2023-03-11 00:41:27.523 [TRC] CORE[eth][RPC]: all providers in a failed state, so acting like none are
when laptop lid is closed, it does wake up occasionally, browser reestablishes (if it broke) new WS connection (that keeps bookie around) and dexc reconnects to server and even gets some notifications from it (that gives us those lone candles!) before laptop goes to sleep again.