Implement cache-friendly package URLs
pulp_rpm currently produces package urls that are derived only from the RPM filename, for example:
Packages/a/abseil-cpp-20240116.0-1.azl3.x86_64.rpm.
This works fine until someone comes along and republishes a different build of the package with the same NEVRA (yes, I know that's a stupid idea. Don't @ me.).
Once that happens, any server-side / CDN caches that are holding that RPM are suddenly invalid, and:
- You must purge them.
- There will be a window between the publication of the updated repodata and the cache purge taking effect, during which clients will receive the new metadata but the old package and get checksum-mismatch errors.
Describe the solution you'd like
It seems entirely feasible for pulp_rpm to instead (or in addition) produce package urls that are derived from both the filename and the sha256sum (or at least part of it). If we suppose for example that:
$ sha256sum abseil-cpp-20240116.0-1.azl3.x86_64.rpm
495321a570638743464707c8d6a7e433e8afb0f46083ef7ba2dbf0955bd30eb6 abseil-cpp-20240116.0-1.azl3.x86_64.rpm
then the new url could be
Packages/49/5321a5/abseil-cpp-20240116.0-1.azl3.x86_64.rpm.
The clients don't care; they'll fetch whatever url the repodata tells them to. This way cache invalidation becomes completely unnecessary, because even if someone republished the same NEVRA the urls wouldn't collide, and users would see no errors.
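To make the proposed layout concrete, here is a minimal sketch of how such a path could be derived. The split (first 2 hex characters, then the next 6) is simply read off the example url above; the function names `sha256_of_file` and `hashed_package_path` are made up for illustration and are not existing pulp_rpm APIs.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hashed_package_path(filename, sha256_hex):
    """Build Packages/<first 2 hex chars>/<next 6 hex chars>/<filename>.

    Hypothetical helper: the 2 + 6 split only mirrors the example url
    in this issue and is open to change.
    """
    sha256_hex = sha256_hex.lower()
    if len(sha256_hex) != 64 or any(c not in "0123456789abcdef" for c in sha256_hex):
        raise ValueError("expected a 64-character hex sha256 digest")
    return f"Packages/{sha256_hex[:2]}/{sha256_hex[2:8]}/{filename}"


print(
    hashed_package_path(
        "abseil-cpp-20240116.0-1.azl3.x86_64.rpm",
        "495321a570638743464707c8d6a7e433e8afb0f46083ef7ba2dbf0955bd30eb6",
    )
)
# prints: Packages/49/5321a5/abseil-cpp-20240116.0-1.azl3.x86_64.rpm
```

Two builds with the same filename but different contents get different digests, so they land on different urls and a cache never has to be told that an existing url changed.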
And if pulp_content were able to map both urls back to the package, that would instantly solve backwards-compatibility for the upgrade, as well as provide a more human-friendly path for finding the rpm, as exists today. As long as the metadata references the cache-friendly url, the cache problems are solved from my perspective.
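A rough sketch of the "both urls resolve" idea, under some assumptions: the lookup is modeled as a plain dict from relative path to package digest, the legacy path keeps today's behaviour (the most recently published build wins), and all names here (`legacy_path`, `hashed_path`, `build_path_index`) are hypothetical rather than anything the content app provides. The second digest in the usage example is made up.

```python
def legacy_path(filename):
    """Today's layout: nested by the first letter of the filename."""
    return f"Packages/{filename[0].lower()}/{filename}"


def hashed_path(filename, sha256_hex):
    """Proposed layout: nested by a prefix of the sha256 digest."""
    return f"Packages/{sha256_hex[:2]}/{sha256_hex[2:8]}/{filename}"


def build_path_index(packages):
    """Map every servable relative path to a package digest.

    `packages` is an iterable of (filename, sha256_hex) in publication
    order. Hashed paths never collide across builds; the legacy path is
    simply overwritten by the latest build, as happens today.
    """
    index = {}
    for filename, sha256_hex in packages:
        index[hashed_path(filename, sha256_hex)] = sha256_hex
        index[legacy_path(filename)] = sha256_hex
    return index


name = "abseil-cpp-20240116.0-1.azl3.x86_64.rpm"
old = "495321a570638743464707c8d6a7e433e8afb0f46083ef7ba2dbf0955bd30eb6"
new = "a" * 64  # made-up digest standing in for a rebuilt package

index = build_path_index([(name, old), (name, new)])
for path, digest in index.items():
    print(path, "->", digest[:8])
```

Only the legacy url changes meaning after the republish; a client following the hashed url from the repodata always gets the bytes that repodata described.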
Additional context
This is a feature that we'd be interested in developing and contributing up. I'm not married to the example url provided, and am open to feedback on that and on implementation details.
I'm primarily interested in this feature as a publisher of original content; however, it seems to me that it would also help Pulp differentiate itself in the case where you're syncing / mirroring content from some other upstream, because then even if those bozos do something stupid, everything on your end will still end up working properly.
It seems likely to me that implementing this feature would require some changes in pulpcore, so if I should move the feature request there let me know.
This probably also solves the edge-case of a package with the same NVRA but different Epoch being published in the same repo, though I have not checked if that's really an issue. It would be a rare one in any case.
Another possible reason to republish a package with the same NEVRA is signing / re-signing issues.
Hey @sdherr, we talked about this in the RPM meeting and it makes complete sense. There is already a layout option for repository publications which this feature could slot right into as another layout option - flat, nested-alphabetically, nested-by-hash. So it shouldn't be too difficult to implement.
One potential issue: packages that were synced on-demand are not guaranteed to have a sha256 hash available, so we would need to think about that a little bit.
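As one possible starting point for that discussion, here is a sketch of path selection per layout with a fallback when no sha256 is known. The layout strings are just the names used in the comment above, not necessarily the values pulp_rpm accepts, the exact shape of the flat layout is assumed, and falling back to the alphabetical path is only one option among several (for such packages it keeps today's cache behaviour rather than fixing it).

```python
def package_path(filename, layout, sha256_hex=None):
    """Pick a relative path for a package under a given layout.

    Hypothetical sketch: layout names come from the discussion, and the
    fallback for a missing sha256 is an assumption, not a decided design.
    """
    if layout == "flat":
        # Assumed shape of the flat layout.
        return f"Packages/{filename}"
    if layout == "nested-by-hash" and sha256_hex:
        return f"Packages/{sha256_hex[:2]}/{sha256_hex[2:8]}/{filename}"
    if layout in ("nested-alphabetically", "nested-by-hash"):
        # Also reached when nested-by-hash was requested but no sha256
        # is available, e.g. an on-demand package not yet downloaded.
        return f"Packages/{filename[0].lower()}/{filename}"
    raise ValueError(f"unknown layout: {layout!r}")


name = "abseil-cpp-20240116.0-1.azl3.x86_64.rpm"
digest = "495321a570638743464707c8d6a7e433e8afb0f46083ef7ba2dbf0955bd30eb6"

print(package_path(name, "nested-by-hash", digest))
print(package_path(name, "nested-by-hash"))  # no sha256 known yet
```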