galaxy-integration-steam icon indicating copy to clipboard operation
galaxy-integration-steam copied to clipboard

Plugin crash >15k games, plugin log cold

Open C-O-S opened this issue 4 years ago • 36 comments

Describe the bug Plugin crashes shortly after configuring it to sync with Steam

To Reproduce

  • Launch Galaxy
  • Configure Steam Integration with my account that has 15k+ games (more if you factor in the ones Steam no longer counts) ( https://steamcommunity.com/profiles/76561197973009892 )
  • Wait for sync to occur - 2 or 3 minutes,
  • Plugin crashes with no output or sync evident.

Expected behavior Plugin to sync and function as expected.

Attached logs logs.zip

C-O-S avatar Jul 20 '20 14:07 C-O-S

I receive the dreaded plugin has crashed when syncing to steam too

dc232 avatar Jul 20 '20 16:07 dc232

Same issue, seems that its crashing during library synchronization, but no error observed in logs. logs.zip

dandepeched avatar Jul 22 '20 06:07 dandepeched

I added to library all free games with help of https://steamdb.info/freepackages I have 4456 games in library.

kotenok2000 avatar Aug 09 '20 19:08 kotenok2000

Here are some errors in logs that I found just before the crash. Maybe this would help. 1)

2020-08-18 09:38:05,644 - protocol.protobuf_client - INFO - New job on list {'job_name': 'import_game_stats', 'game_id': '781440'}
2020-08-18 09:38:05,644 - protocol.protobuf_client - INFO - Importing game stats for 781440
2020-08-18 09:38:05,644 - protocol.protobuf_client - INFO - Sending message 818 (32 bytes)
2020-08-18 09:38:05,645 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "14", "error": {"code": -32001, "message": "Method aborted"}}
2020-08-18 09:38:05,646 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: finished task 8 (start_achievements_import)
2020-08-18 09:38:05,646 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "15", "error": {"code": -32001, "message": "Method aborted"}}
2020-08-18 09:38:05,646 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: finished task 9 (start_game_times_import)
2020-08-18 09:38:05,646 - protocol.protobuf_client - INFO - Sending log off message
2020-08-18 09:38:05,646 - protocol.protobuf_client - INFO - Sending message 706 (23 bytes)
2020-08-18 09:38:05,647 - protocol.websocket_client - WARNING - Websocket task cancelled CancelledError()
2020-08-18 09:38:05,648 - protocol.websocket_client - INFO - Closing websocket
2020-08-18 09:38:05,648 - protocol.protobuf_client - INFO - Send message success
2020-08-18 09:38:05,844 - galaxy.api.jsonrpc - INFO - Sending notification: method=owned_game_added, params={'owned_game': Game(game_id='1308650', game_title='Axegend:Episode I', dlcs=[], license_info=LicenseInfo(license_type=<LicenseType.SinglePurchase: 'SinglePurchase'>, owner=None))}
2020-08-18 09:38:05,844 - galaxy.api.jsonrpc - DEBUG - Sending 203 bytes of data
**NO MORE LOGS. PLUGIN CRASHED**
2020-08-17 23:14:19,465 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "46", "error": {"code": 600, "message": "Import already in progress"}}
2020-08-17 23:14:19,465 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: finished task 17 (start_game_times_import)
2020-08-17 23:14:19,466 - galaxy.task_manager - DEBUG - Task manager plugin external: creating task 23 (start_game_times_import)
2020-08-17 23:14:19,466 - galaxy.api.jsonrpc - DEBUG - Received 98 bytes of data
2020-08-17 23:14:19,466 - galaxy.api.jsonrpc - INFO - Handling request: id=49, method=start_achievements_import, params={'game_ids': ['1077340']}
2020-08-17 23:14:19,466 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: creating task 20 (start_achievements_import)
2020-08-17 23:14:19,466 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "47", "error": {"code": 600, "message": "Import already in progress"}}
2020-08-17 23:14:19,466 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: finished task 18 (start_achievements_import)
2020-08-17 23:14:19,467 - galaxy.task_manager - DEBUG - Task manager plugin external: creating task 24 (start_achievements_import)
2020-08-17 23:14:19,467 - galaxy.api.jsonrpc - DEBUG - Received 96 bytes of data
2020-08-17 23:14:19,467 - galaxy.api.jsonrpc - INFO - Handling request: id=50, method=start_game_times_import, params={'game_ids': ['1077340']}
2020-08-17 23:14:19,467 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: creating task 21 (start_game_times_import)
2020-08-17 23:14:19,467 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "48", "error": {"code": 600, "message": "Import already in progress"}}
2020-08-17 23:14:19,467 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: finished task 19 (start_game_times_import)
2020-08-17 23:14:19,467 - galaxy.task_manager - DEBUG - Task manager plugin external: creating task 25 (start_game_times_import)
2020-08-17 23:14:19,467 - galaxy.api.jsonrpc - DEBUG - Received 55 bytes of data
2020-08-17 23:14:19,467 - galaxy.api.jsonrpc - INFO - Handling request: id=51, method=ping, params={}
2020-08-17 23:14:19,468 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "51", "result": null}
2020-08-17 23:14:19,468 - protocol.protobuf_client - INFO - Processing packet of 59 bytes
2020-08-17 23:14:19,468 - protocol.protobuf_client - INFO - Processing message 1
2020-08-17 23:14:19,468 - protocol.protobuf_client - INFO - Processing message Multi
2020-08-17 23:14:19,468 - protocol.protobuf_client - INFO - Processing packet of 41 bytes
2020-08-17 23:14:19,468 - protocol.protobuf_client - INFO - Processing message 8904
2020-08-17 23:14:19,469 - protocol.protobuf_client - INFO - Processing message PICSProductInfoResponse
2020-08-17 23:14:19,469 - protocol.protobuf_client - INFO - Finished processing message Multi
2020-08-17 23:14:19,469 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "49", "error": {"code": 600, "message": "Import already in progress"}}
2020-08-17 23:14:19,469 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: finished task 20 (start_achievements_import)
2020-08-17 23:14:19,469 - galaxy.api.jsonrpc - DEBUG - Received 55 bytes of data
2020-08-17 23:14:19,469 - galaxy.api.jsonrpc - INFO - Handling request: id=52, method=ping, params={}
2020-08-17 23:14:19,469 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "52", "result": null}
2020-08-17 23:14:19,470 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "50", "error": {"code": 600, "message": "Import already in progress"}}
2020-08-17 23:14:19,470 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: finished task 21 (start_game_times_import)
2020-08-17 23:14:19,470 - galaxy.api.jsonrpc - DEBUG - Received 55 bytes of data
2020-08-17 23:14:19,470 - galaxy.api.jsonrpc - INFO - Handling request: id=53, method=ping, params={}
2020-08-17 23:14:19,471 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "53", "result": null}
2020-08-17 23:14:19,472 - galaxy.api.jsonrpc - DEBUG - Received 55 bytes of data
2020-08-17 23:14:19,472 - galaxy.api.jsonrpc - INFO - Handling request: id=54, method=ping, params={}
2020-08-17 23:14:19,472 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "54", "result": null}
2020-08-17 23:14:19,472 - protocol.protobuf_client - INFO - Processing packet of 33704 bytes
2020-08-17 23:14:19,472 - protocol.protobuf_client - INFO - Processing message 1
2020-08-17 23:14:19,472 - protocol.protobuf_client - INFO - Processing message Multi
2020-08-17 23:14:19,473 - galaxy.api.jsonrpc - DEBUG - Received 59 bytes of data
2020-08-17 23:14:19,473 - galaxy.api.jsonrpc - INFO - Handling request: id=55, method=shutdown, params={}
2020-08-17 23:14:19,473 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: creating task 22 (shutdown)
2020-08-17 23:14:19,473 - galaxy.api.plugin - INFO - Shutting down
2020-08-17 23:14:19,473 - galaxy.api.plugin - INFO - Closing plugin
2020-08-17 23:14:19,473 - galaxy.api.jsonrpc - INFO - Closing JSON-RPC server - not more messages will be read
2020-08-17 23:14:19,474 - galaxy.task_manager - DEBUG - Task manager plugin internal: creating task 1 (shutdown)
2020-08-17 23:14:19,474 - galaxy.api.plugin - DEBUG - Plugin run loop finished
2020-08-17 23:14:19,474 - galaxy.api.plugin - DEBUG - Waiting for plugin to close
2020-08-17 23:14:19,474 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "31", "error": {"code": -32001, "message": "Method aborted"}}
2020-08-17 23:14:19,474 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: finished task 6 (start_achievements_import)
2020-08-17 23:14:19,475 - root - DEBUG - Sending data: {"jsonrpc": "2.0", "id": "32", "error": {"code": -32001, "message": "Method aborted"}}
2020-08-17 23:14:19,475 - galaxy.task_manager - DEBUG - Task manager jsonrpc server: finished task 7 (start_game_times_import)
2020-08-17 23:14:19,475 - protocol.protobuf_client - INFO - Sending log off message
2020-08-17 23:14:19,475 - protocol.protobuf_client - INFO - Sending message 706 (28 bytes)
2020-08-17 23:14:19,476 - protocol.protobuf_client - INFO - Send message success
2020-08-17 23:14:19,476 - protocol.websocket_client - WARNING - Websocket task cancelled CancelledError()
2020-08-17 23:14:19,476 - galaxy.task_manager - DEBUG - Task manager plugin external: canceled task 1 (Run WebSocketClient)
2020-08-17 23:14:19,712 - galaxy.task_manager - DEBUG - Task manager plugin internal: finished task 0 (tick)
2020-08-17 23:14:19,712 - galaxy.api.jsonrpc - INFO - Sending notification: method=owned_game_added, params={'owned_game': Game(game_id='1347490', game_title='Hakoniwa Sumo', dlcs=[], license_info=LicenseInfo(license_type=<LicenseType.SinglePurchase: 'SinglePurchase'>, owner=None))}
2020-08-17 23:14:19,712 - galaxy.api.jsonrpc - DEBUG - Sending 194 bytes of data
**NO MORE LOGS. PLUGIN CRASHED**

Other crashes looks to be at random place like this one:

2020-08-18 10:28:40,208 - galaxy.api.jsonrpc - INFO - Sending notification: method=push_cache, params={'data': '****'}
2020-08-18 10:28:40,233 - galaxy.api.jsonrpc - DEBUG - Sending 4208779 bytes of data
2020-08-18 10:28:40,243 - galaxy.api.jsonrpc - INFO - Sending notification: method=owned_game_added, params={'owned_game': Game(game_id='296930', game_title='Ascendant', dlcs=[], license_info=LicenseInfo(license_type=<LicenseType.SinglePurchase: 'SinglePurchase'>, owner=None))}
2020-08-18 10:28:40,243 - galaxy.api.jsonrpc - DEBUG - Sending 189 bytes of data
**NO MORE LOGS. PLUGIN CRASHED**

dandepeched avatar Aug 18 '20 07:08 dandepeched

Thanks, please try out solution posted here: https://github.com/FriendsOfGalaxy/galaxy-integration-steam/issues/77#issuecomment-674946922

FriendsOfGalaxy avatar Aug 18 '20 08:08 FriendsOfGalaxy

So, I went into Galaxy and ensured the plugin wasn't configured, then went in and cleared the contents of the steam plugin and dropped the files from "possible.fix.for.parsing.licences" file into it.

Restarted the Galaxy client and went in to configure the integration. The first time it immediately crashed out when trying to configure it, but the second time it worked and I watched the logs start generating as it operated.

It definitely got further this time. It ran for a few hours, rolling the log files as it went, but eventually the plugin crashed.

Remaining logs are attached:

logs.zip

C-O-S avatar Aug 18 '20 16:08 C-O-S

Thanks for logs. I see the issue is similar to what @dandepeched have. You have >17k games right?

https://github.com/FriendsOfGalaxy/galaxy-integration-steam/issues/77#issuecomment-675950270

FriendsOfGalaxy avatar Aug 19 '20 08:08 FriendsOfGalaxy

@FriendsOfGalaxy , I have ~7500 licenses in Steam library. I'm not sure what is the count of DLCs, but my steam stats is showing ~600. So I don't think that total count will go over 10k, definitely not 40k. Memory consumption of Galaxy is ok - ~650MB, but CPU usage is constantly big - 15-20%. I don't know how to get this stats for steam plugin only. Capture

dandepeched avatar Aug 19 '20 09:08 dandepeched

@dandepeched right, my mistake. Too many logs opened simultaneously.

And what about plugin memory? You can inspect by going to task manager -> children processes of Galaxy -> right lick -> go to details -> add column "Command line" -> check in command line arguments which python plugin is for Steam -> read what is under Memory (private working set)

FriendsOfGalaxy avatar Aug 19 '20 09:08 FriendsOfGalaxy

@FriendsOfGalaxy , plugin uses ~60MB. This is 3 times more than any other plugin, but that's probably because of games amount. CPU is not used by plugin in stale phase, only on active sync phase.

dandepeched avatar Aug 19 '20 10:08 dandepeched

~~60MB is a lot. Can you catch screenshot to make sure we're looking on the same? Mine takes 28K (250games).~~ EDIT mine has 54MB, so not much diference

Does it reproduce for you like in OP?

FriendsOfGalaxy avatar Aug 19 '20 11:08 FriendsOfGalaxy

@FriendsOfGalaxy , crash did not occur since the plugin updated to version provided by you and pressing Retry after the first crash. Should I reconnect Steam in order to try to catch the crash? Or any other steps?

Capture Capture2

dandepeched avatar Aug 19 '20 11:08 dandepeched

@C-O-S The logs you attached the last time shows one more bug - the case where plugin times out on waiting for steam to get owned games. I'll put this in my backlog, but won't investigate right now.

Meanwhile I've added one more fix for crashes in #77 but it may help also in your case. Use this build (2 fixes in one so far): steam_2_fixes.zip (windows)

@dandepeched this should help you as well.

To have clean test

  • disconnect plugin from Galaxy
  • close Galaxy
  • replace plugin files
  • launch Galaxy

FriendsOfGalaxy avatar Aug 24 '20 11:08 FriendsOfGalaxy

@FriendsOfGalaxy - I performed clean test, ran it for hours and the plugin did not crash.

That said, it really didn't seem to ingest the information from the Steam account as expected. Maybe 100-ish games imported and then it just ran for hours cycling logs without doing much else. Logs included below for reference:

logs-for-2fixes.zip

C-O-S avatar Aug 24 '20 20:08 C-O-S

@FriendsOfGalaxy - FWIW, my steam library was ~2500 games (more including hidden). It was repeatedly crashing, and more recently just timing ou and losing the authentication/session etc. before. After downloading the above steam_2_fixes.zip it managed to get on its feet at least and update my library/installed in a few mins. Will have to see if it is happy over the course of days and weeks.

ppriest avatar Aug 24 '20 23:08 ppriest

@ppriest I glad it does not crash now. Unfortunately I'll go offline/lose connection until #74 is fixed.

@C-O-S Thanks for letting me know. Looks like games cache have to be reworked due to wrong assumption. We assumed steam send all licenses in one message, but in fact it sends multiple messages so cache invalidation logic should be put one level above that.

I know that I've asked you a lot so far but can you test this one? I fixed the first issue differently (not using another thread to avoid thread-safety issues, just optimize data structure) and commented out part of code responsible for owned games cache invalidation: steam_2_fixes_v2_and_no_cache_invalidation.zip

FriendsOfGalaxy avatar Aug 25 '20 13:08 FriendsOfGalaxy

@FriendsOfGalaxy Disconnect / Clean Plugin Install / Clean Logs - configured and ran the plugin again - it ran for ~11 minutes and the plugin crashed. However, this time it crashed before the logs could fully rotate out - so you should have a complete set of logs from initiation to the crash occurring.

logs.zip

C-O-S avatar Aug 25 '20 14:08 C-O-S

Ah, another bottleneck is on Processing message PICSProductInfoResponse that takes 12-15 seconds for you. So sometimes crash sometimes not (~15 sec is Galaxy crash threshold)

Grab this workaround: steam_2_fixes_v2_and_no_cache_invalidation_async_app_info_hanlder.zip

FriendsOfGalaxy avatar Aug 25 '20 14:08 FriendsOfGalaxy

@FriendsOfGalaxy Same setup process as last time. It ran longer this time but ended with another crash.

logs.zip

C-O-S avatar Aug 25 '20 17:08 C-O-S

@C-O-S thanks. Try this: steam_4_fixes.zip

FriendsOfGalaxy avatar Aug 28 '20 08:08 FriendsOfGalaxy

@FriendsOfGalaxy :: Clean Run, Clean Logs - ended in a plugin crash.
BTW - I don't mind helping you test these, so please feel free to keep sending them my way.

logs-4_fixes.zip

C-O-S avatar Aug 28 '20 13:08 C-O-S

Thanks. I don't understand why it happened - should be fixed. Are you sure no older plugin version is sitting in your "installed" directory?

FriendsOfGalaxy avatar Aug 28 '20 14:08 FriendsOfGalaxy

@FriendsOfGalaxy - Steps I take to test a provided zip file:

  1. Startup GOG Galaxy Client
  2. Settings -> Integrations -> Steam -> Disconnect
  3. Wait for Integration to toggle back to displaying "Connect" as an option for Steam
  4. Shutdown the GOG Galaxy client
  5. Go to C:\Users\(user)\AppData\Local\GOG.com\Galaxy\plugins\installed\steam_(HexData)
  6. Shift-Delete entire contents of the directory
  7. Extract newly provided contents of zip file, underneath output, into this directory.
  8. Go to %programdata%\GOG.com\Galaxy\logs
  9. Shift-Delete entire contents of the directory
  10. Startup GOG Galaxy Client
  11. Settings -> Integrations -> Steam -> Connect
  12. Input Login, PW, and Steam Guard Code

... then wait for resulting behavior and report.

I'll run through the process again with the latest file just to see if we get the same result.

C-O-S avatar Aug 28 '20 14:08 C-O-S

@FriendsOfGalaxy - The second run also resulted in a crash condition. Logs included below.

logs-4-fixes-run2.zip

C-O-S avatar Aug 28 '20 16:08 C-O-S

@C-O-S thanks. One more issue found. But I'm a bit confused why dictionary lookup take so long (sometimes even longer than 30ms). Or maybe this is the logging side-effect? I mean, just the loop below.

games_cache.py

            for appid in license.app_ids:
                if appid not in self._storing_map.apps:
                    logger.warning(f"Tried to retrieve unresolved app: {appid} for license: {license.package_id}!")
                    continue

self._storing_map.apps is dict.

I'm dropping some different potential fixes for this.

~5fix_gen.zip~ 5fix_longer_await.zip 5fix_no_log.zip

EDIT: 5fix_gen_v2.zip

FriendsOfGalaxy avatar Sep 01 '20 10:09 FriendsOfGalaxy

@FriendsOfGalaxy - 5fix_gen_v2.zip - long run, resulting in plugin crash condition. Logs included below:

5fix_gen_v2-logs.zip

I will work my way through the other two releases you posted and provide those results as well soon.

C-O-S avatar Sep 01 '20 14:09 C-O-S

@FriendsOfGalaxy - 5fix_longer_await.zip - long run, resulting in plugin crash condition. Logs included below:

5fix_longer_await_logs.zip

C-O-S avatar Sep 01 '20 16:09 C-O-S

@FriendsOfGalaxy - 5fix_no_log.zip - long run, resulting in plugin crash condition. Logs included below:

5fix_no_log_logs.zip

C-O-S avatar Sep 01 '20 18:09 C-O-S

Thanks.

Notes for myself:

  • 5fix_gen_v2-logs.zip
    • most of warnings lasts around 1ms, but after PICS Product response at 2020-09-01 10:08:41,986 - protocol.protobuf_client - INFO - Processing message PICSProductInfoResponse came from steam, time delta between warnings drastically raises - TODO: check if we do properly lock / do copy to not iterate over evolving data.
    • just after above we expect ping (10:08:42) but it was delayed till 10:08:46 - why? TODO: extend logs for starting new generators fetch to confirm/reject theory of not enough 1 frame with await.sleep(0)
  • 5fix_longer_await_logs.zip - longer wait works - no stuck during getting owned games; but plugin crashed later - with stuck logs ~15 seconds before crash. From GalaxyClient log we can see that plugin operates normally at this time - it sends information about friend presence despite the plugin log is cold (no more entries). TODO: prepare version with alternative logging file handler.
  • 5fix_no_log_logs.zip - plugin log is cold again. It does not crash before - does it mean disabling logging helped? Or just some luck.

@C-O-S could you retest 5fix_longer_await_logs.zip? Just to be sure if the problem remains the same - maybe this time logs won't hang on. Meantime I'm preparing release for what I know already.

FriendsOfGalaxy avatar Sep 04 '20 13:09 FriendsOfGalaxy

@FriendsOfGalaxy - per request, second run of 5fix_longer_await.zip was performed. Long run time, resulting in crash condition. Please see below:

5fix_longer_await_Run2_logs.zip

C-O-S avatar Sep 04 '20 15:09 C-O-S