gocryptotrader icon indicating copy to clipboard operation
gocryptotrader copied to clipboard

Websocket: Various Refactors and Tests

Open gbjk opened this issue 1 year ago β€’ 6 comments

  • Websocket: Remove IsInit IsInit was basically the same as IsConnected. Any time Connect was called both would be set to true. Any time we had a disconnect they'd both be set to false Shutdown() incorrectly didn't setInit(false)
  • SetProxyAddress simplified to only reconnect a connected Websocket. Any other state means it hasn't been Connected, or it's about to reconnect anyway. There was no handling for IsConnecting previously, either, so I've wrapped SetProxyAddress behind the main mutex.
  • Websocket: Improve and simplify tests
  • Simplify state (init, disc, connecting, connected) to a single state field
  • Move WebsocketNotEnabled to a real ErrWebsocketNotEnabled

This PR started because of inconsistencies noticed with IsInit and state from a okx panic bug. It kinda broke out containment at that point and went native. KSozThx.

Type of change

  • [x] Refactoring (non-breaking change)

How has this been tested

  • [x] go test ./... -race
  • [x] golangci-lint run

gbjk avatar Feb 04 '24 04:02 gbjk

Codecov Report

Attention: Patch coverage is 70.44025% with 47 lines in your changes are missing coverage. Please review.

Project coverage is 35.89%. Comparing base (40193bb) to head (f9ea569).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #1466      +/-   ##
==========================================
- Coverage   37.79%   35.89%   -1.90%     
==========================================
  Files         411      411              
  Lines      147795   177440   +29645     
==========================================
+ Hits        55861    63699    +7838     
- Misses      84164   105974   +21810     
+ Partials     7770     7767       -3     
Files Coverage Ξ”
exchanges/binance/binance_wrapper.go 38.41% <100.00%> (-2.71%) :arrow_down:
exchanges/binanceus/binanceus_wrapper.go 40.07% <100.00%> (-3.89%) :arrow_down:
exchanges/bitfinex/bitfinex_wrapper.go 36.57% <100.00%> (-4.05%) :arrow_down:
exchanges/bithumb/bithumb_wrapper.go 37.68% <100.00%> (-3.29%) :arrow_down:
exchanges/bitmex/bitmex_wrapper.go 44.01% <100.00%> (-5.77%) :arrow_down:
exchanges/bitstamp/bitstamp_wrapper.go 52.18% <100.00%> (-5.86%) :arrow_down:
exchanges/btcmarkets/btcmarkets_wrapper.go 30.54% <100.00%> (-2.97%) :arrow_down:
exchanges/btse/btse_wrapper.go 43.95% <100.00%> (-3.42%) :arrow_down:
exchanges/bybit/bybit.go 30.33% <ΓΈ> (-2.10%) :arrow_down:
exchanges/bybit/bybit_wrapper.go 40.44% <100.00%> (-3.69%) :arrow_down:
... and 41 more

... and 336 files with indirect coverage changes

codecov[bot] avatar Feb 04 '24 05:02 codecov[bot]

From a high level, looks quite good. I don't understand this point:

There's no handling for IsConnecting previously

In the context of the SetProxyAddress changes. I've added a bulletpoint to make this context clearer in the PR description and clarified the mutex. Specifically the issue is that a reconnecting websocket could have it's proxy changed and maybe race or end up in an undefined state.

And does this mean that the [okx] panic is fixed, is it still present?

Yes. I think the whole thing started when I was trying to debug it and found the connecting state being left in a strange situation during unwinding one of the connections. At the end of finishing this work I was so far from that work that I needed to step back and clear my desk of this one so I can focus on the okx panic in isolation.

gbjk avatar Feb 05 '24 03:02 gbjk

For clarity on re-reviewing:

  • No previous PR fixes have been squashed, but they have been rebased on master to get the linter
  • The only commits needing re-review should be: 208ca57a, c3edef2c and 3e19e83e ( 208ca57a Is resequenced and really trivial, but mentioning it for full disclosure )

gbjk avatar Feb 16 '24 07:02 gbjk

@shazbert Thanks! I like your direction there. A couple of notes though:

  1. As it is now I'm pretty sure the defer will race because you have Shutdown πŸ‘‰ ShutdownC πŸ‘‰ Shutdown: ➑️ Shutdown :arrow_right: Wg.Wait :arrow_right: context switch to trafficMonitor :arrow_right: ShutdownC :arrow_right: defer 🚨 Still connected at this point :arrow_right: Call Shutdown again (and block on mutex) :arrow_right: Other Shutdown gets to finish and set disconnected :arrow_right: Back to defer's Shutdown call again, now unblocked :arrow_right: IsConnected now false, return error β›” defer logs the error

I think there's no harm to moving the setState(disconnected) in Shutdown to before close(ShutdownC) because it better represents the state. If that causes something to call Connect then that's fine because w.m is still Locked. πŸ‘† Fixed 89be3d85

  1. My bigger problem with your pattern is the calls to time.Now() every cycle. Channels and timers are cheap and delegated to OS, but calling time.Now is much more expensive, I think. I guess we'll have to bench it 🀷

  2. On using w.Shutdown() instead of ShutdownC: where there's a dependency on a channel like this, I believe it's more unity to use the dependency than call a method, so that you're not unit testing more units than you need to. trafficMonitor depends on the channel, not on Shutdown(), and so where possible I'll use the channel.

gbjk avatar Feb 21 '24 01:02 gbjk

@shazbert Thanks! I like your direction there. A couple of notes though:

  1. As it is now I'm pretty sure the defer will race because you have Shutdown πŸ‘‰ ShutdownC πŸ‘‰ Shutdown: ➑️ Shutdown ➑️ Wg.Wait ➑️ context switch to trafficMonitor ➑️ ShutdownC ➑️ defer 🚨 Still connected at this point ➑️ Call Shutdown again (and block on mutex) ➑️ Other Shutdown gets to finish and set disconnected ➑️ Back to defer's Shutdown call again, now unblocked ➑️ IsConnected now false, return error β›” defer logs the error

I think there's no harm to moving the setState(disconnected) in Shutdown to before close(ShutdownC) because it better represents the state. If that causes something to call Connect then that's fine because w.m is still Locked. πŸ‘† Fixed 89be3d8

  1. My bigger problem with your pattern is the calls to time.Now() every cycle. Channels and timers are cheap and delegated to OS, but calling time.Now is much more expensive, I think. I guess we'll have to bench it 🀷
  2. On using w.Shutdown() instead of ShutdownC: where there's a dependency on a channel like this, I believe it's more unity to use the dependency than call a method, so that you're not unit testing more units than you need to. trafficMonitor depends on the channel, not on Shutdown(), and so where possible I'll use the channel.

Yeah all good points, I will let you decide on how you want to proceed. All valid and at the end of the day, yours works.

shazbert avatar Feb 21 '24 01:02 shazbert

I've used this to sniff the cost of time.Now().Add() versus channels.

BenchmarkAddTime-8      14308972                82.98 ns/op
BenchmarkChannel-8      38685134                30.75 ns/op

Even if we reduce the bench to just doing a time.Now().Add(it) and a t.Reset(it) we end up with 40 vs 35.

Finally I've looked at the defered cleanup. In essence we add 2 lines for the defer, Run a Stop when we don't need to, and and run a Connected and Shutdown check when we don't need to from ShutdownC. Or put another way: Two lines are common, but we waste time in both directions, and we add 2 lines for the defer.

So I'm going to leave it as is πŸ˜„

gbjk avatar Feb 21 '24 04:02 gbjk

Rebased on master and squashed commits.

gbjk avatar Feb 23 '24 04:02 gbjk