
Reuse chunks from the same download if duplicated

Open Roman513 opened this issue 1 year ago • 11 comments

Drastically decrease memory usage

With this patch we usually do not need even 1 GiB of shared memory.

Closes #17

Roman513 avatar Dec 23 '24 23:12 Roman513

I've not done a full review of this but I think at the very least it should be an option like reordering (ideally also in the config so you don't have to specify it each time).

derrod avatar Dec 24 '24 19:12 derrod

It should be opt-in, not opt-out, like reordering.

derrod avatar Dec 25 '24 09:12 derrod

I think at the very least it should be an option like reordering (ideally also in the config so you don't have to specify it each time).

I've created an option to disable this behavior. This feature functions more like reusing chunks during patching (which is enabled by default) than reordering. It is simply an adaptation of the same algorithm - just scroll a few lines up in the analysis function to find it.
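
To illustrate the idea, here is a minimal Python sketch (not the actual PR code; the structures and the analyze_reusable_chunks function are hypothetical, simplified stand-ins): record where each chunk first lands on disk, so later occurrences of the same chunk can be read back from that file instead of being held in shared memory.

# Hypothetical, simplified stand-ins for legendary's manifest structures.
from collections import namedtuple

ChunkPart = namedtuple('ChunkPart', ['guid', 'size'])
FileManifest = namedtuple('FileManifest', ['filename', 'chunk_parts'])

def analyze_reusable_chunks(file_manifests):
    """Return {chunk_guid: (filename, offset)} for chunks that occur again
    later in the download and can be re-read from an already-written file.
    Simplified: assumes the first occurrence contains the full chunk."""
    first_seen = {}  # chunk guid -> (filename, byte offset) of first write
    reusable = {}    # chunk guid -> source location for later duplicates
    for fm in file_manifests:        # files in write order
        offset = 0
        for part in fm.chunk_parts:
            if part.guid in first_seen:
                reusable[part.guid] = first_seen[part.guid]
            else:
                first_seen[part.guid] = (fm.filename, offset)
            offset += part.size
    return reusable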

It should be opt-in, not opt-out, like reordering.

Reordering is likely disabled by default because it is very time-consuming with questionable benefits. However, this option is not as resource-intensive. I conducted measurements on an extremely large project (~860 GB), and it took about 1.5 seconds on my machine. For normal projects, it typically takes around 0.0-0.1 seconds. In contrast, reordering calculations take about 4 minutes.

python legendary/cli.py install fb67ca3f5de74da880ab96bdc668118d
[Core] INFO: Trying to re-use existing login session...
[cli] INFO: Preparing download for "ARK DevKit" (fb67ca3f5de74da880ab96bdc668118d)...
...
[DLM] DEBUG: 666373 added files
[DLM] DEBUG: Disk space delta: 863984.23 MiB

# Without enable-reordering

[DLM] DEBUG: Final cache size requirement: 10274.0 MiB.

# With enable-reordering, 'Manifest contains too many files' block was commented out to test this

[DLM] DEBUG: Processing optimizations took 244.0 seconds.
[DLM] DEBUG: Final cache size requirement: 390.0 MiB.

# With read-from-files enabled, temporarily added time calculation to test this

[DLM] DEBUG: Analyzing manifest for re-usable chunks in saved files...
[DLM] DEBUG: Processing read from saved files took 1.5 seconds.
[DLM] DEBUG: Final cache size requirement: 81.0 MiB.

Roman513 avatar Dec 25 '24 10:12 Roman513

The entire system was designed around minimising disk I/O. I'm not going to merge this if it's on by default.

derrod avatar Dec 25 '24 11:12 derrod

The entire system was designed around minimising disk I/O. I'm not going to merge this if it's on by default.

From my perspective, this results in minimal I/O and primarily affects highly duplicated projects. The bigger issue is the enormous memory consumption. Chunks remain in shared memory if we reuse different parts of the same chunk multiple times.

Additionally, I noticed that the actual memory consumption reported by the CLI during downloads is higher than the calculated values. This means that even if there seems to be enough memory based on calculations, the system could still hang. Therefore, keeping parts in memory for such edge cases seems like the wrong approach to me.

As a compromise, I've added an auto-retry mechanism for memory errors. Does this work for you?
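
To be concrete about what I mean by auto-retry, here is a rough sketch (hypothetical names, not the actual PR code): when the shared-memory cache is exhausted, wait for the writer to flush some segments and try again instead of aborting the download.

import logging
import time

log = logging.getLogger('DLM')

def run_with_memory_retry(task, max_retries=3, backoff=1.0):
    """Run a download task; on cache exhaustion, wait for the disk writer
    to flush shared-memory segments and retry instead of failing the run."""
    for attempt in range(1, max_retries + 1):
        try:
            return task()
        except MemoryError:
            log.warning('Shared memory exhausted (attempt %d/%d), waiting '
                        'for cache segments to be flushed...',
                        attempt, max_retries)
            time.sleep(backoff * attempt)  # give the writer time to free segments
    raise MemoryError('Cache still exhausted after %d retries' % max_retries)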

Roman513 avatar Dec 25 '24 13:12 Roman513

From my perspective, this results in minimal I/O and primarily affects highly duplicated projects. The bigger issue is the enormous memory consumption. Chunks remain in shared memory if we reuse different parts of the same chunk multiple times.

It's only an issue for like 2 games where the devs did something stupid. It's not worth compromising for. According to the Steam hardware survey 16 and 32 GB are the most common configurations, so memory isn't really an issue.

Additionally, I noticed that the actual memory consumption reported by the CLI during downloads is higher than the calculated values. This means that even if there seems to be enough memory based on calculations, the system could still hang. Therefore, keeping parts in memory for such edge cases seems like the wrong approach to me.

The size calculated is only the minimum, not a limit. If the configured cache size is larger it'll still be used to the full extent if data isn't flushed to disk quickly enough.

As a compromise, I've added an auto-retry mechanism for memory errors. Does this work for you?

No, but I do want this to be a config option. But I can do that whenever I get around to cleaning up the commits and merging the PR.

derrod avatar Dec 25 '24 13:12 derrod

It's only an issue for like 2 games where the devs did something stupid. It's not worth compromising for. According to the Steam hardware survey 16 and 32 GB are the most common configurations, so memory isn't really an issue.

And disk I/O is such a big problem? I would guess people are more eager to use that enormous amount of memory for something else. If there were really only two games like this, the fallback would almost never trigger, because the default shared memory would be enough. But it is not enough: the problem became so annoying that even the Heroic devs added a fallback that re-runs the legendary CLI with an auto-increased shared memory limit every time a memory error occurs with the default amount, which leads to re-downloading the manifests and is much slower.

https://github.com/Heroic-Games-Launcher/HeroicGamesLauncher/blob/395ffa73dd8e96f976e48092f1ee63ce30cdabd8/src/backend/storeManagers/legendary/games.ts#L637C1-L645C4
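
Heroic's fallback is TypeScript, but a rough Python equivalent of what it does could look like the sketch below (the doubling strategy and the error detection via stderr are illustrative, not their exact logic):

import subprocess

def install_with_memory_fallback(app_name, start_mib=1024, max_mib=16384):
    """Re-run the legendary CLI with a doubled --max-shared-memory limit
    whenever a run fails with a memory error, as Heroic does."""
    mem = start_mib
    while mem <= max_mib:
        result = subprocess.run(
            ['legendary', 'install', app_name, '-y',
             f'--max-shared-memory={mem}'],
            capture_output=True, text=True)
        if result.returncode == 0:
            return True
        if 'MemoryError' not in result.stderr:
            raise RuntimeError(result.stderr)  # some unrelated failure
        mem *= 2  # retry the whole download with twice the shared memory
    return False

Each retry restarts the CLI from scratch, which is exactly why the manifests get re-downloaded and the whole thing ends up much slower than an in-process fallback would be.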

No, but I do want this to be a config option.

A fallback will improve the user experience, because most people use legendary through frontends and never even see such errors. But even CLI users could suffer. Are you sure we want users to figure out which of the three available workarounds (increasing max memory, re-using files, or reordering) they need to apply? If this one is known to work fast, why can't it serve as a fallback in such cases? Also, not having a fallback pushes extra work onto frontend developers, who have to implement workarounds on their side.

Did I convince you about the fallback, or do you still have arguments against it?

If not, please provide requirements for the error messages (to help users choose a workaround), for the config options (per-game/global/naming), and for the CLI option names if you aren't happy with mine.

UPD: I've added a global config option.
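
As an example, the global option in ~/.config/legendary/config.ini could look like the snippet below (read_files is just my assumption, mirroring the CLI flag; happy to rename it to whatever you prefer):

[Legendary]
; assumed option name, mirroring the --read-files CLI flag
read_files = true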

cleaning up the commits

You mean squashing the commits into one? I can do that once we reach a final decision, or GitHub can do it for you on merge. Just let me know; I want this to be finished and not stuck unmerged forever.

Roman513 avatar Dec 25 '24 14:12 Roman513

@derrod Just a reminder. Could you please review the last comment and let me know what is left to get this merged? BTW we got a report about another game with the same issue. Thanks!

Roman513 avatar Jan 02 '25 10:01 Roman513

@derrod Just a reminder. Could you please review the last comment and let me know what is left to get this merged? BTW we got a report about another game with the same issue. Thanks!

Just need to find some time to re-review it, last week's been busy.

derrod avatar Jan 02 '25 10:01 derrod

I tested out this patch today with Legendary and it's working great for me. I was able to change from using --max-shared-memory 24576 to --read-files. My download for STASIS: BONE TOTEM went from 13 GiB of RAM usage down to 1 GiB.
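
For anyone else trying this, the two invocations described above would look roughly like this ('<app name>' is a placeholder):

# before: work around memory errors with a huge shared-memory cache
legendary install <app name> --max-shared-memory 24576

# after: read duplicated chunks back from already-written files
legendary install <app name> --read-files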

LukeShortCloud avatar May 02 '25 17:05 LukeShortCloud

@derrod a reminder to review. It is actually merged into Heroic's fork anyway, but merging here would help other people too.

Roman513 avatar Oct 18 '25 10:10 Roman513