High memory consumption when syncing an Oracle remote repository
Version: "rpm": "3.31.2", "core": "3.84.1"
Describe the bug
When syncing the https://yum.oracle.com/repo/OracleLinux/OL9/developer/x86_64/ repository, memory consumption can reach 10 GB.
To Reproduce
Create a remote pointing to the Oracle repository, associate it with a local repository, and ask Pulp to sync it.
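The reproduction steps can be scripted against the Pulp REST API. This is a minimal sketch using only the Python standard library. The server address, the admin/password credentials, and every helper name (`remote_payload`, `post_json`, `reproduce` and so on) are hypothetical. The endpoint paths and field names (`remotes/rpm/rpm/`, `repositories/rpm/rpm/`, `sync/`, `sync_policy`) are assumed to follow the pulp_rpm REST API, so check them against the API docs for your installed version. Leaving `sync_policy` unset lets the server apply its default.

```python
import base64
import json
import urllib.request

# Hypothetical placeholders: point these at your own Pulp API server.
PULP_BASE = "http://localhost:24817"
API_ROOT = "/pulp/api/v3/"
USER, PASSWORD = "admin", "password"


def remote_payload(name, url):
    """Body for POST {API_ROOT}remotes/rpm/rpm/ (server-default download policy)."""
    return {"name": name, "url": url}


def repository_payload(name, remote_href):
    """Body for POST {API_ROOT}repositories/rpm/rpm/, associating the remote."""
    return {"name": name, "remote": remote_href}


def sync_payload(remote_href, sync_policy=None):
    """Body for POST {repository_href}sync/; sync_policy=None keeps the server default."""
    body = {"remote": remote_href}
    if sync_policy is not None:
        body["sync_policy"] = sync_policy
    return body


def post_json(path, payload):
    """POST a JSON body with basic auth and return the decoded JSON response."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    request = urllib.request.Request(
        PULP_BASE + path,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def reproduce(name, url, sync_policy=None):
    """Create a remote and a repository, start a sync, and return the sync task href."""
    remote = post_json(API_ROOT + "remotes/rpm/rpm/", remote_payload(name, url))
    repo = post_json(
        API_ROOT + "repositories/rpm/rpm/", repository_payload(name, remote["pulp_href"])
    )
    # pulp_href values are absolute API paths, so they can be appended to PULP_BASE.
    task = post_json(repo["pulp_href"] + "sync/", sync_payload(remote["pulp_href"], sync_policy))
    return task["task"]
```

For example, `reproduce("ol9-developer", "https://yum.oracle.com/repo/OracleLinux/OL9/developer/x86_64/", "mirror_complete")` would start a mirror-mode sync (assuming `mirror_complete` is an accepted `sync_policy` value in your version) and return the task href to watch while monitoring worker memory.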
Additional context
I'm attaching a memray flamegraph profile in the hope that it helps resolve the issue.
A separate instance of the same issue, but relating to a Google repository:
Describe the bug
When syncing the https://packages.cloud.google.com/yum/repos/cloud-sdk-el9-x86_64/ repository, memory consumption can reach 6 GB.
To Reproduce
Create a remote pointing to the Google repository, associate it with a repository, and ask Pulp to sync it.
Additional context
I'm attaching a memray flamegraph profile in the hope that it helps resolve the issue.
memray-flamegraph-memray_profile.html
A separate instance of the same issue, but relating to an Elasticsearch repository:
Describe the bug
When syncing the https://artifacts.elastic.co/packages/8.x/yum repository, memory consumption can reach 6 GB.
A workaround that works in most cases: instead of the mirror sync mode, use additive sync in combination with the retain_package_versions option. For example, retain_package_versions=3 keeps only the three newest versions of each package.
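A minimal sketch of applying this workaround through the REST API, using only the Python standard library. The server address, credentials, and helper names (`retention_patch`, `additive_sync`, `apply_workaround` and so on) are hypothetical. The sketch assumes that a repository PATCH is dispatched as a task returning `{"task": ...}`, that tasks expose a `state` field, and that `sync_policy="additive"` is accepted by the sync endpoint. Verify all of these against your pulpcore and pulp_rpm versions. It waits for the retention update to finish before starting the sync, so the sync runs with the new setting.

```python
import base64
import json
import time
import urllib.request

# Hypothetical placeholders: point these at your own Pulp API server.
PULP_BASE = "http://localhost:24817"
USER, PASSWORD = "admin", "password"
# Assumed set of terminal task states.
FINAL_STATES = {"completed", "failed", "canceled", "skipped"}


def retention_patch(versions=3):
    """Body for PATCH {repository_href}: keep the newest `versions` of each package."""
    if versions < 0:
        raise ValueError("retain_package_versions must be >= 0 (0 keeps every version)")
    return {"retain_package_versions": versions}


def additive_sync(remote_href):
    """Body for POST {repository_href}sync/ using additive instead of mirror mode."""
    return {"remote": remote_href, "sync_policy": "additive"}


def call(method, href, payload=None):
    """Send a JSON request with basic auth and return the decoded JSON response."""
    token = base64.b64encode(f"{USER}:{PASSWORD}".encode()).decode()
    data = None if payload is None else json.dumps(payload).encode()
    request = urllib.request.Request(
        PULP_BASE + href,
        data=data,
        headers={"Content-Type": "application/json", "Authorization": f"Basic {token}"},
        method=method,
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_task(task_href, interval=2.0):
    """Poll a task until it reaches a final state, then return that state."""
    while True:
        state = call("GET", task_href)["state"]
        if state in FINAL_STATES:
            return state
        time.sleep(interval)


def apply_workaround(repository_href, remote_href, versions=3):
    """Set retention, wait for that update, then start an additive sync."""
    update = call("PATCH", repository_href, retention_patch(versions))
    if wait_for_task(update["task"]) != "completed":
        raise RuntimeError("updating retain_package_versions did not complete")
    return call("POST", repository_href + "sync/", additive_sync(remote_href))["task"]
```

Setting retention once on the repository is enough: later syncs keep applying it, so `apply_workaround` only needs to run once per repository, with plain additive syncs afterwards.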
A document describing various strategies for reducing sync memory usage: https://hackmd.io/6fwkDMOXRamBz27d5CM-IQ