python-bittrex-websocket icon indicating copy to clipboard operation
python-bittrex-websocket copied to clipboard

huge memory leak right after reconnect

Open gzivdo opened this issue 6 years ago • 62 comments

The code just like in example only with subscribe_to_exchange_deltas

Output

2018-05-06 18:55:35 - bittrex_websocket.websocket_client - ERROR - Connection is already closed..
2018-05-06 18:55:35 - bittrex_websocket.websocket_client - ERROR - Initiating reconnection procedure
2018-05-06 18:55:35 - bittrex_websocket.websocket_client - INFO - Establishing connection to Bittrex through https://beta.bittrex.com/signalr.
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [QueryExchangeState] for [BTC-ZCL].
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [SubscribeToExchangeDeltas] for [BTC-ZCL].
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [QueryExchangeState] for [BTC-RDD].
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [SubscribeToExchangeDeltas] for [BTC-RDD].
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [QueryExchangeState] for [BTC-TRX].
2018-05-06 18:55:39 - bittrex_websocket.websocket_client - INFO - Successfully subscribed to [SubscribeToExchangeDeltas] for [BTC-TRX].
2018-05-06 18:56:20.775091 15593
2018-05-06 18:57:20.835102 15659
2018-05-06 18:58:21.108694 15691
2018-05-06 18:59:30.381624 15736
2018-05-06 19:00:06 - bittrex_websocket.websocket_client - ERROR - Undocumented error message received with code -1 and payload: 32. Report to https://github.com/slazarov/python-bittrex-websocket
2018-05-06 19:00:06 - bittrex_websocket.websocket_client - ERROR - Initiating reconnection procedure
2018-05-06 19:00:06 - bittrex_websocket.websocket_client - INFO - Establishing connection to Bittrex through https://beta.bittrex.com/signalr.

After first reconnect app starts to eat all available memory and swap.

gzivdo avatar May 06 '18 16:05 gzivdo

Error 32 should refer to socket.error: [Errno 32] Broken pipe and is due to connection drop. If your connection is working fine, then I suspect that the error is coming from Bittrex's side since recently they have been pushing constant updates in order to get the main endpoint stable. Note that we are using the beta endpoint currently in order to avoid bypassing cloudflare. You can try:

ws = MySocket(url='https://socket.bittrex.com/signalr')

Just make sure you have installed cfscrape and node.js.

The reconnection memory leak is due to the fact that the client tries to reconnect and since it's denied immediately it goes into an infinite loop which causes high CPU usage. This is on my side and I will implement a fix. I am thinking of maximum connection retries solution.

slazarov avatar May 06 '18 16:05 slazarov

cpu usage not high at all, but memory usage rising quite fast.

gzivdo avatar May 06 '18 16:05 gzivdo

Doesn't change the fact.

Also when posting make sure you specify what Python version you are using and also provide the code you are executing. In this case you said you are using the example but you are also invoking QueryExchangeState and I have no idea whether you are executing it in a loop.

slazarov avatar May 06 '18 16:05 slazarov

Python 2.7.x

from bittrex_websocket.websocket_client import BittrexSocket
import cPickle, time, datetime

tickers = {
    'ZCL':  (35, 5),
    'RDD': (20000, 2000),
    'TRX': (5000, 2000),
}


class BittrexSocket1(BittrexSocket):
        def __init__(self, tickers, url=None):
            super(BittrexSocket1, self).__init__(url)
            self.tickers = tickers
            self.cnt = 0
            for ticker in self.tickers:
                time.sleep(0.01)
                self.last[ticker] = time.time()
                self.query_exchange_state(['BTC-'+ticker])
                self.subscribe_to_exchange_deltas(['BTC-'+ticker])


        def on_public(self, msg):
            ticker = msg['M'].split('-')[-1]
            self.cnt += 1
            return

def main():
    ws = BittrexSocket1(tickers)
    ws.enable_log()
    try:
        while True:
            sleep(60)
            print datetime.datetime.now(), ws.cnt
    except KeyboardInterrupt:
        pass
    print 'exiting'
    ws.disconnect()
    sleep(10)

if __name__ == "__main__":
    main()

gzivdo avatar May 06 '18 17:05 gzivdo

I will test it now.

1.) How often does the issue happens?

2.) Is there any pattern in the average time between starting the code and the occurrence of the error? E.g every 10mins or so.

slazarov avatar May 06 '18 17:05 slazarov

No, it happens just after network error. Sometimes its 2h, sometimes 6-8h

gzivdo avatar May 06 '18 17:05 gzivdo

I see, I will work on a fix and will push it to github in due time. Thank you for reporting the bug.

slazarov avatar May 06 '18 17:05 slazarov

capture I'm having issues with reconnection too. Mine lasts quite a bit, but at least once a day it breaks. I'm running on a server with 1GB RAM only though. I don't run connections on a loop, it happens once and then I call a function to take care of the data + rest requests from the same library you use. Also, I don't have the other errors, but I think I got a different one last time even though it was from reconnection also. I can take a print next time if it helps. (python 3)

hamkline avatar May 12 '18 22:05 hamkline

Its a temporary fix but do the following: update to the latest version from github and replace line 140-143:

            elif invoke == BittrexMethods.GET_AUTH_CONTENT:
                self.connection.corehub.server.invoke(invoke, payload[0])
                self.invokes.append({'invoke': invoke, 'ticker': payload[0]})
                logger.info('Retrieving authentication challenge.')

with

            elif invoke == BittrexMethods.GET_AUTH_CONTENT:
                self.connection.corehub.server.invoke(invoke, self.credentials['api_key'])
                self.invokes.append({'invoke': invoke, 'ticker': self.credentials['api_key']})
                logger.info('Retrieving authentication challenge.')

slazarov avatar May 12 '18 22:05 slazarov

screenshot_2018-05-13_12-17-40 I'm also having troubles on reconnection.. Seems like it blew through by VPS' available memory and was killed by linux oom-killer.

ayy1337 avatar May 13 '18 02:05 ayy1337

Also, I notice that you have in the websocket_client.py put a variable 'self.threads = []', but when a thread dies or is finished it is not removed from this list, preventing it from being garbage collected (as there are still references to it) Honestly I can't even tell why the list is there at all, since it doesn't seem to be accessed anywhere..

ayy1337 avatar May 13 '18 03:05 ayy1337

screenshot_2018-05-13_13-32-39 Managed to catch it in htop. What are all these threads doing?

ayy1337 avatar May 13 '18 03:05 ayy1337

Hi, self.threads = [] is a leftover from v<1.0, so thanks for noticing it. I am aware of what might be causing the thread spam and will fix in due time. As always thanks for raising an issue but next time provide additional information such as the whole log, code snippet, python version and a few word explanation since its difficult to debug from a picture.

Is node.js installed together with cfscrape?

slazarov avatar May 13 '18 06:05 slazarov

Update the package to v1.0.4 through: pip install git+https://github.com/slazarov/[email protected]

Theoretically this should fix it, waiting for your feedback after testing.

slazarov avatar May 13 '18 11:05 slazarov

Is v1.0.4 only difference in the code you mentioned for me above? Because I still got the error with that. If not let me know and I'll test with the new code and provide feedback.

Just in case I'll post it here. capture It's funny that it only shows those few lines, so if it was trying to reconnect a lot, in a loop or something, it should have more output, right?

hamkline avatar May 13 '18 12:05 hamkline

As I said I can’t comment on pictures without any code.

v1.0.4 is different than the code I posted, you can check the commits.

slazarov avatar May 13 '18 12:05 slazarov

As I said, I subscribe and then stick to a function with a while True that handles the data forever, just like in your example with the __init__ having this: self.ticker_updates_container = {} which haves msg value every time on_update.

I'll test v1.0.4 and provide feedback.

hamkline avatar May 13 '18 14:05 hamkline

As requested, here is a pastebin with full code, is pretty basic honestly.. https://pastebin.com/a5UvBZqU And here is a pastebin with the full output from running the above code: https://pastebin.com/LSN9QDMc

Python version was 3.5, don't believe node.js is installed no. Cfscrape is installed. (this is re: your first reply, I will update to 1.0.4 and do further testing)

ayy1337 avatar May 13 '18 15:05 ayy1337

Hi, I will take a look at your logs. Have you updated to v1.0.4. If not, please do so and test.

Cfscrape requires nodejs to work. It’s not resulted in an error on your side so far because bittrex has disabled cloudflare currently. I recommend installing node.js.

On May 13, 2018 at 6:42 pm, <ayy1337 (mailto:[email protected])> wrote:

As requested, here is a pastebin with full code, is pretty basic honestly.. https://pastebin.com/a5UvBZqU And here is a pastebin with the full output from running the above code: https://pastebin.com/LSN9QDMc

Python version was 3.5, don't believe node.js is installed no. Cfscrape is however.

— You are receiving this because you commented. Reply to this email directly, view it on GitHub (https://github.com/slazarov/python-bittrex-websocket/issues/61#issuecomment-388636054), or mute the thread (https://github.com/notifications/unsubscribe-auth/AbVUGwXHjL00m2il0eHFh8M3Stq6KObjks5tyFRvgaJpZM4T0GNn).

slazarov avatar May 13 '18 15:05 slazarov

screenshot_2018-05-14_02-04-47 My bad, it appears to have had node.js I was just ignorant of the appropriate command (i was trying 'node' and 'npm')

Edit: I will update to 1.0.4 now and do further testing, haven't done so yet Edit2: Have updated to new version, will post again with logs and whatnot when test completes

ayy1337 avatar May 13 '18 16:05 ayy1337

Okay so I updated and ran the same script again with the new version, have the output from my tests. This paste I disconnected the network connection manually for 60 seconds and reconnected it. https://pastebin.com/V0bZHpY9

For the second test I let it run until it ran into an issue on its own, https://pastebin.com/F9y0hLhm

In neither case did it successfully reconnect after the error

ayy1337 avatar May 14 '18 04:05 ayy1337

I got that too, here's a print just in case it helps: capture It was in the same function I mentioned earlier, it's the one that handles self.ticker_updates_container and nothing more. Again, same place under a while True loop that other errors occurred. My program didn't really end with killed this time though, probably the socket disconnected and it kept handling the same data on self.ticker_updates_container till I woke up.

hamkline avatar May 14 '18 11:05 hamkline

Hi, the reason for the error is that exceptions in Python2 have a 'message' attribute whereas exceptions in Python3 have 'args' instead. This is fixed. Do:

pip install git+https://github.com/slazarov/[email protected]

@ayy1337 I have added TimeoutError as a new exception in the reconnection logic. I reckon the exception is from urllib3.exceptions import TimeoutError is the correct one. Please do your tests and update me.

As a whole, I have disabled the auto-reconnection for unknown errors since I prefer to add them when they appear and after I've become acquainted with them.

slazarov avatar May 14 '18 21:05 slazarov

For printing exception info I usually use (in 3.5 etc.) "{} {}".format(type(e).__name__, e) You don't need .args or .message Edit: This works on python 2.7 as well.. screenshot_2018-05-15_11-31-14

P.s. the url git+https://github.com/slazarov/[email protected] doesn't actually work since the update wasn't tagged with a version on git. screenshot_2018-05-15_10-43-08

ayy1337 avatar May 15 '18 00:05 ayy1337

Hi @ayy1337, thanks for the suggestion. Also my bad, I pushed directly to the master so you can update through: pip install git+https://github.com/slazarov/python-bittrex-websocket.git

slazarov avatar May 15 '18 04:05 slazarov

screenshot_2018-05-17_17-11-13 Okay so running the tests again on version 1.0.5, this first pic is from just letting it run until it ran into troubles; the reconnect seems to have worked this time without blowing up the memory which is good, though it failed to resubscribe to the account deltas

screenshot_2018-05-17_17-14-25 This second pic is by forcibly disconnecting the internet connection, again for around 60 seconds before reconnecting.

ayy1337 avatar May 17 '18 07:05 ayy1337

Can you paste the contents of websocket_client.py/_handle_subscribe method?

slazarov avatar May 18 '18 22:05 slazarov

I wasn't sure which package was raising the TimeoutError. Initially I thought it was urllib3.exceptions but probably it's the one from the built-ins. I have pushed the commit to v.1.0.6.

slazarov avatar May 18 '18 23:05 slazarov

Can you paste the contents of websocket_client.py/_handle_subscribe method?

Sure (1.0.5) screenshot_2018-05-19_12-02-35

ayy1337 avatar May 19 '18 02:05 ayy1337

Updated to 1.0.6, tried the forceful disconnection test again and this time it reconnected successfully after the connection was restored :) screenshot_2018-05-19_12-14-10

I will run the test where I let it naturally reconnect now, though this can take some time.

ayy1337 avatar May 19 '18 02:05 ayy1337