feat: support distributed lock implementation in execution status refresh
Problem Description:
Currently, when multiple core instances scan the same set of keys simultaneously, they all end up performing redundant work (refreshing the same executions). This operation can easily overwhelm Redis's CPU resources.
Test Setup:
I conducted a test to demonstrate this issue with the following configuration:
- Environment Configuration:
- Set EXECUTION_STATUS_REFRESH_INTERVAL_SECONDS=10
- Docker Compose Configuration
core:
image: goharbor/harbor-core:dev
# container_name: harbor-core
deploy:
replicas: 5
Test Procedure:
Deployed 5 core replicas using docker compose up -d
Inserted 1,000,000 keys into the Redis container
All 5 instances began scanning execution status via:
iter, err := cache.Default().Scan(ctx, "execution:id:*vendor:*status_outdate")
Test Results:
Before the fix:
Redis container's CPU usage spiked significantly (see screenshot below)
All 5 instances were processing the same keys redundantly
This cause performance degradation
After applying the code changes:
Redis CPU usage was significantly reduced
Instances now coordinate to avoid duplicate work
Performance improved
Conclusion:
The proposed changes effectively eliminate redundant execution status refreshes across multiple core instances, resulting in substantial CPU savings and improved overall system performance.
Thank you for contributing to Harbor!
Comprehensive Summary of your change
Issue being fixed
Fixes #(issue)
Please indicate you've done the following:
- [x] Well Written Title and Summary of the PR
- [ ] Label the PR as needed. "release-note/ignore-for-release, release-note/new-feature, release-note/update, release-note/enhancement, release-note/community, release-note/breaking-change, release-note/docs, release-note/infra, release-note/deprecation"
- [x] Accepted the DCO. Commits without the DCO will delay acceptance.
- [ ] Made sure tests are passing and test coverage is added if needed.
- [ ] Considered the docs impact and opened a new docs issue or PR with docs changes if needed in website repository.
can you provide some info or problem your are trying to solve?
can you provide some info or problem your are trying to solve?
Hi @Vad1mo I updated the PR description, please take a look
@Coldchen99 thanks for the PR, but it seems there's a conflict in the thoughts of the design ,it seems by design it encounrages the concurrent queries to redis.
cc @wy65701436 let's follow up.
@Coldchen99 Thank you for your contribution. I understand your modifications and believe they can reduce CPU usage. However, the previous implementation took this aspect into account. The reason no lock was added was to use a shuffle-like algorithm to allow the execution state to be updated as early as possible. An implementation with locking might delay state updates and fail to achieve good load balancing. Therefore, if we really want to solve this performance issue, I am inclined to modify the current data structure, avoiding the key1:value1 approach and instead using a list or other structures to implement a queue-like effect. This would not only solve the current problem but also achieve good load balancing results. cc @wy65701436