Add concurrency to tag lookup
This addresses performance issues when deleting images in repos with a large number of tags.
https://github.com/distribution/distribution/issues/3525
Signed-off-by: jack-baines [email protected]
Codecov Report
Merging #3589 (76c29bf) into main (b609265) will increase coverage by 0.22%. The diff coverage is n/a.
@@            Coverage Diff             @@
##             main    #3589      +/-   ##
==========================================
+ Coverage   56.33%   56.56%   +0.22%
==========================================
  Files         101      101
  Lines        7313     7354      +41
==========================================
+ Hits         4120     4160      +40
- Misses       2536     2537       +1
  Partials      657      657
Impacted Files | Coverage Δ |
---|---|
...ribution/distribution/registry/storage/tagstore.go | 81.63% <0.00%> (+6.16%) :arrow_up: |
Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
This PR fixes the first of the two problems described in the issue: to detect which tags match the digest being deleted (and therefore need to be untagged), we have to look up every tag in the repo, and previously this was done one tag at a time.
The second part has much lower impact based on manual testing: once we have identified which tags match the digest, we untag them sequentially. My view is that this second change can be done standalone at a later date.
My testing on a relatively small repo showed a 3x improvement: deleting an image in a repo with 500 tags took ~6s, down from ~18s.
One final thing to point out: I have hardcoded the concurrency limit to 10. There may be some advantage to making it configurable, but I wasn't entirely sure how that could be achieved, so if that is deemed necessary I would appreciate a pointer to similar code that gets configuration into something like the tag store.
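
For illustration, here is a minimal sketch of the bounded-concurrency lookup described above. It is not the actual diff: `lookupMatchingTags` and `resolveTag` are hypothetical names standing in for the code in registry/storage/tagstore.go, and digests are plain strings for brevity.

```go
package tagstore

import (
	"context"
	"sync"
)

// maxConcurrency mirrors the hardcoded limit of 10 mentioned above.
const maxConcurrency = 10

// resolveTag stands in for the tag store's per-tag digest resolution.
type resolveTag func(ctx context.Context, tag string) (string, error)

// lookupMatchingTags resolves every tag concurrently, with at most
// maxConcurrency lookups in flight, and returns the tags whose digest
// matches target.
func lookupMatchingTags(ctx context.Context, tags []string, target string, resolve resolveTag) ([]string, error) {
	var (
		mu       sync.Mutex
		wg       sync.WaitGroup
		matches  []string
		firstErr error
	)
	sem := make(chan struct{}, maxConcurrency) // counting semaphore

	for _, tag := range tags {
		tag := tag // capture loop variable for the goroutine
		wg.Add(1)
		sem <- struct{}{} // blocks once maxConcurrency lookups are in flight
		go func() {
			defer wg.Done()
			defer func() { <-sem }()

			dgst, err := resolve(ctx, tag)

			mu.Lock()
			defer mu.Unlock()
			if err != nil {
				if firstErr == nil {
					firstErr = err
				}
				return
			}
			if dgst == target {
				matches = append(matches, tag)
			}
		}()
	}
	wg.Wait()
	return matches, firstErr
}
```

The second, deferred change would then untag the returned matches concurrently as well, instead of one at a time.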
Just wondering, since we are addressing these slow deletes: this code in the S3 delete driver could use a bit of TLC too. If the repo contains a lot of data, we end up growing the s3Objects slice to a huge size; the actual deletes are batched, though that is of little help since we must first walk through the entire contents of the repo.
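
A rough sketch of the alternative, assuming the driver's aws-sdk-go (v1) dependency: issue one DeleteObjects call per ListObjectsV2 page as it arrives, rather than accumulating every key into s3Objects up front, so memory stays bounded by the page size. `deletePrefix` is an illustrative name, not the driver's actual function.

```go
package main

import (
	"github.com/aws/aws-sdk-go/aws"
	"github.com/aws/aws-sdk-go/service/s3"
	"github.com/aws/aws-sdk-go/service/s3/s3iface"
)

// deletePrefix lists objects under prefix and deletes each page as it is
// returned, instead of collecting all keys before deleting anything.
func deletePrefix(client s3iface.S3API, bucket, prefix string) error {
	var innerErr error
	err := client.ListObjectsV2Pages(&s3.ListObjectsV2Input{
		Bucket: aws.String(bucket),
		Prefix: aws.String(prefix),
	}, func(page *s3.ListObjectsV2Output, lastPage bool) bool {
		if len(page.Contents) == 0 {
			return true
		}
		objects := make([]*s3.ObjectIdentifier, 0, len(page.Contents))
		for _, obj := range page.Contents {
			objects = append(objects, &s3.ObjectIdentifier{Key: obj.Key})
		}
		// DeleteObjects accepts up to 1000 keys, which matches the
		// maximum ListObjectsV2 page size.
		_, innerErr = client.DeleteObjects(&s3.DeleteObjectsInput{
			Bucket: aws.String(bucket),
			Delete: &s3.Delete{Objects: objects, Quiet: aws.Bool(true)},
		})
		return innerErr == nil // stop paging on the first delete error
	})
	if innerErr != nil {
		return innerErr
	}
	return err
}
```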
LGTM...
We have 800TB of files in Ceph, which is hard to delete. After patching with this PR, we can delete files at roughly 3TB/min.
Is there any news on this?
Ping @bainsy88
Friendly ping @bainsy88
There is also https://github.com/distribution/distribution/pull/3890, which is more comprehensive.
We need to decide which one to keep focusing on.
If https://github.com/distribution/distribution/pull/3890 is more comprehensive and more (recently) active then I’d say it’s safe to close this PR (also given that the author hasn’t responded to the reviews). As always, feel free to re-open this PR.