Performance Bottleneck | Screenshot Enumeration & WebSocket
Hey!
Thanks for open-sourcing such a novel EFT tool. I use it every time I play 😄
Issue
I play on Streets and Shoreline a lot, taking dozens of screenshots to update my map position on tarkov.dev via remote connect. The issue is this: the longer a raid goes on, and the more screenshots I take, the worse performance seems to get.
- each additional screenshot takes longer and longer to get through the `screenshot taken >>> map updated` pipeline
Performance Profiling
I dug into why exactly this happens by adding some performance profiling on an experimental local branch. My north star here was measuring the milliseconds (ms) each stage of the `screenshot taken >>> map updated` pipeline took.
Additional debugging & performance profiling details
I used the built-in DEBUG profiler to track a few different metrics:
- `MapUpdate_WebSocketSend` to understand the transport cost as part of the overall `screenshot taken >>> map updated` pipeline
- `ScreenshotParseTime` to measure how long (in ms) each screenshot took to parse for coordinate/heading/bearing metadata
- `RaidDegradation_*` indicators, i.e. the sequential `TimeToUpdate` cost (in ms), which grows with every sequential screenshot
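Per-stage timing of this kind can be sketched in a few lines of Python. This assumes nothing about the built-in DEBUG profiler's actual API: `StageProfiler`, `stage`, and `degradation` are hypothetical names, and the metric names are reused only as labels. `time.perf_counter` is used because it is a monotonic, high-resolution clock meant for measuring short intervals.

```python
import time
from collections import defaultdict
from contextlib import contextmanager
from typing import DefaultDict, Iterator, List


class StageProfiler:
    """Hypothetical per-stage timer that records each stage's duration in ms."""

    def __init__(self) -> None:
        self.samples: DefaultDict[str, List[float]] = defaultdict(list)

    @contextmanager
    def stage(self, name: str) -> Iterator[None]:
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the timed stage raised.
            elapsed_ms = (time.perf_counter() - start) * 1000.0
            self.samples[name].append(elapsed_ms)

    def degradation(self, name: str) -> float:
        # Last sample minus first: a positive value means later runs were slower.
        runs = self.samples[name]
        return runs[-1] - runs[0] if len(runs) >= 2 else 0.0


profiler = StageProfiler()
for _ in range(3):
    with profiler.stage("ScreenshotParseTime"):
        sum(range(10_000))  # stand-in for parsing one screenshot
print(len(profiler.samples["ScreenshotParseTime"]))  # prints "3"
```

Comparing the first and last `TimeToUpdate` samples this way gives only a crude per-raid degradation number; a per-screenshot series is more informative when plotted.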
Findings
- In `GameWatcher`, on every screenshot event, files on disk are being counted via `*.png`. The cost of this operation grows linearly with the number of files, so it slows down progressively over the lifecycle of a long raid with lots of screenshots.
- In `SocketClient.Send`, the websocket client is created, started, and then disposed for each screenshot event. Even if parsing is fast and payload creation is quick, we pay the 100ms-400ms transport and TLS handshake cost every time, which adds latency.
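The cost difference behind the first finding can be illustrated with a minimal Python sketch, assuming a handler that re-globs the folder on every event as that finding describes. TarkovMonitor itself is C#, and `count_by_enumeration` and `ScreenshotIndex` are hypothetical names. The sketch only shows why a glob costs O(n) in the number of files per event, while updating an in-memory set of filenames (the idea behind proposed fix 1) costs O(1).

```python
import glob
import os
import tempfile


def count_by_enumeration(folder: str) -> int:
    # Re-lists the whole folder on every call: cost grows with the number of files.
    return len(glob.glob(os.path.join(folder, "*.png")))


class ScreenshotIndex:
    """Hypothetical in-memory store of screenshot filenames, updated per event."""

    def __init__(self) -> None:
        self._names: set = set()

    def on_created(self, path: str) -> None:
        # O(1) per event: record only the file the event reported.
        if path.endswith(".png"):
            self._names.add(os.path.basename(path))

    def on_deleted(self, path: str) -> None:
        self._names.discard(os.path.basename(path))

    def count(self) -> int:
        return len(self._names)

    def names(self) -> list:
        # Sorted copy, e.g. for a cleanup UI.
        return sorted(self._names)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as folder:
        index = ScreenshotIndex()
        for i in range(3):
            path = os.path.join(folder, f"shot_{i}.png")
            open(path, "wb").close()
            index.on_created(path)
        print(count_by_enumeration(folder), index.count())  # prints "3 3"
```

Both approaches report the same count; the difference is that the index never touches the other files in the folder.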
Both of these are straightforward to fix, and I've fixed them on my local branch with very noticeable performance gains. Would you be open to me opening some PRs that resolve issues 1 & 2 for you to consider merging?
- `screenshot taken >>> map updated` latency went from an average of ~800ms to ~300ms on Streets
- ~350ms of the time saved came from the websocket persistence implementation itself
Proposed PRs
1.) Directory Enumeration Fix:
- remove the `*.png` pattern / directory enumeration
- implement a low-cost "in-memory" store of screenshot filenames (for UX/cleanup)
2.) WebSocket Persistence During Raids:
- maintain a persistent WebSocket connection from raid start to raid end (event-driven)
- keep the existing payload/schema exactly the same
- if the network is temporarily lost during a raid, attempt to reconnect and resume, or fail gracefully after `n` attempts
- pre-connect / lazy-connect at raid start and disconnect cleanly at raid end
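The reconnect behavior in item 2 could look roughly like this minimal Python sketch. It rests on several assumptions: `connect` is a hypothetical factory that opens the websocket and returns an object with `send()` and `close()`, failures surface as `OSError` (which includes `ConnectionError`), and `RaidSocket`, `max_attempts`, and `base_delay` are illustrative names, not existing TarkovMonitor code. It keeps one connection for the whole raid, reconnects lazily after a failure, and gives up after `max_attempts` tries per message.

```python
import time
from typing import Callable, Optional, Protocol


class Connection(Protocol):
    def send(self, message: str) -> None: ...
    def close(self) -> None: ...


class RaidSocket:
    """Hypothetical raid-scoped connection: opened at raid start, closed at raid end."""

    def __init__(self, connect: Callable[[], Connection], max_attempts: int = 3,
                 base_delay: float = 0.5,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self._connect = connect
        self._max_attempts = max_attempts
        self._base_delay = base_delay
        self._sleep = sleep  # injectable so tests don't actually wait
        self._conn: Optional[Connection] = None

    def start_raid(self) -> None:
        # Pre-connect; a failure here is tolerated and retried on the first send.
        try:
            self._conn = self._connect()
        except OSError:
            self._conn = None

    def send(self, message: str) -> bool:
        for attempt in range(self._max_attempts):
            try:
                if self._conn is None:
                    self._conn = self._connect()  # lazy (re)connect
                self._conn.send(message)
                return True
            except OSError:
                self._drop()
                if attempt + 1 < self._max_attempts:
                    # Exponential backoff between attempts: base, 2*base, 4*base, ...
                    self._sleep(self._base_delay * 2 ** attempt)
        return False  # fail gracefully after max_attempts

    def end_raid(self) -> None:
        self._drop()

    def _drop(self) -> None:
        if self._conn is not None:
            try:
                self._conn.close()
            except OSError:
                pass
            self._conn = None
```

Bounding retries per message and closing at raid end keeps the connection's lifetime tied to a raid rather than to the app, which is one way to limit reconnect traffic when the app is left running for a long time.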
Very open to discussing how to go forward on this!
Cheers, Mike
Feel free to submit PRs. I was going to review them, as this sounds like a great QoL enhancement, but I don't see anything submitted yet.
If something is blocking you from opening a PR or addressing this properly, let me know. I'd be open to helping! 👍🏻
If there's a more efficient way to detect new screenshots, I'd be happy to review a PR for it. However, I'm not sure what you describe makes sense given my (possibly mistaken) understanding of how the screenshot watcher currently functions. It uses a FileSystemWatcher, and as far as I know it's not really accurate to say that "files on disk are being counted via *.png." The FileSystemWatcher monitors a given path for file changes and then fires events depending on whether the changes meet the given criteria. These criteria include the filename and the file change type. The screenshot FileSystemWatcher is set up to fire an event only when .png files are either created or renamed (in my experience, some apps operate by copying a file to the destination and then renaming it). I would expect the FileSystemWatcher to react only to file system changes, so the other files in a given folder are basically irrelevant.

Moreover, it doesn't make sense that performance would degrade only over the course of a raid. If the problem were the FileSystemWatcher having to sort through all the .png files in the folder each time a new screenshot is taken, the symptom one would expect is that performance is bad for everyone who has lots of screenshots and good for people with zero screenshots. For people who already have lots of screenshots in their folder, the "linear" cost of one additional screenshot would actually be negligible.
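The filtering described above (match on filename and change type, react only to the reported event) can be modeled as a pure predicate. This is a minimal Python sketch, not .NET's FileSystemWatcher: `ChangeType`, `FileEvent`, and `is_new_screenshot` are hypothetical names, with the enum loosely mirroring .NET's `WatcherChangeTypes` (Created, Deleted, Changed, Renamed). The work done per event depends only on that event's filename, not on how many .png files already exist in the folder.

```python
import os
from dataclasses import dataclass
from enum import Enum
from fnmatch import fnmatch


class ChangeType(Enum):
    CREATED = "created"
    CHANGED = "changed"
    DELETED = "deleted"
    RENAMED = "renamed"


@dataclass(frozen=True)
class FileEvent:
    change: ChangeType
    path: str  # for RENAMED, the new path


def is_new_screenshot(event: FileEvent, pattern: str = "*.png") -> bool:
    # Only created or renamed files qualify (renamed covers copy-then-rename writers).
    if event.change not in (ChangeType.CREATED, ChangeType.RENAMED):
        return False
    # Match on this event's filename alone; no other file in the folder is examined.
    return fnmatch(os.path.basename(event.path), pattern)
```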
The websocket stuff is more complicated. TarkovMonitor used to (attempt to) maintain a websocket connection and send any needed messages over that connection. However, some people leave TarkovMonitor running for days or weeks, and it becomes challenging to make sure the websocket connection is kept alive without also potentially spamming the websocket server with reconnect requests. Since the main thing TarkovMonitor currently uses the websocket connection for is updating player position, and since that isn't an action that, in theory, will be done often or in quick succession, I decided it would greatly simplify things to create a new connection for each player position update and then dispose of it once the update has been sent. An added delay of 500ms or less isn't really that noticeable to users, and it significantly cut down on the number of support inquiries from users who were experiencing problems with the websocket connection not being maintained.
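For contrast, the connection-per-update design described above amounts to this minimal Python sketch. Here `connect` stands in for whatever opens the websocket and is a hypothetical factory, not a real library call. `contextlib.closing` guarantees `close()` runs even if the send fails.

```python
from contextlib import closing
from typing import Callable, Protocol


class Connection(Protocol):
    def send(self, message: str) -> None: ...
    def close(self) -> None: ...


def send_position_update(connect: Callable[[], Connection], payload: str) -> None:
    # A fresh connection per update: no keep-alive or reconnect state survives
    # between updates, at the cost of a connect/TLS handshake on every call.
    with closing(connect()) as conn:
        conn.send(payload)  # closing() calls conn.close() even if send raises
```

This captures the trade-off in the comment above: each update pays the handshake, but a dead connection can never outlive a single update.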