harbor icon indicating copy to clipboard operation
harbor copied to clipboard

feat: support distributed lock implementation in execution status refresh

Open Coldchen99 opened this issue 1 month ago • 4 comments

Problem Description:

Currently, when multiple core instances scan the same set of keys simultaneously, they all end up performing redundant work (refreshing the same executions). This operation can easily overwhelm Redis's CPU resources.

Test Setup:

I conducted a test to demonstrate this issue with the following configuration:

  • Environment Configuration:
    • Set EXECUTION_STATUS_REFRESH_INTERVAL_SECONDS=10
  • Docker Compose Configuration
  core:
    image: goharbor/harbor-core:dev
    # container_name: harbor-core
    deploy:
      replicas: 5

Test Procedure:

Deployed 5 core replicas using docker compose up -d

Inserted 1,000,000 keys into the Redis container

All 5 instances began scanning execution status via: iter, err := cache.Default().Scan(ctx, "execution:id:*vendor:*status_outdate")

Test Results:

Before the fix:

Redis container's CPU usage spiked significantly (see screenshot below)

All 5 instances were processing the same keys redundantly

This cause performance degradation Clipboard_Screenshot_1763644718

After applying the code changes:

Redis CPU usage was significantly reduced

Instances now coordinate to avoid duplicate work

Performance improved Clipboard_Screenshot_1763644750

Conclusion:

The proposed changes effectively eliminate redundant execution status refreshes across multiple core instances, resulting in substantial CPU savings and improved overall system performance.

Thank you for contributing to Harbor!

Comprehensive Summary of your change

Issue being fixed

Fixes #(issue)

Please indicate you've done the following:

  • [x] Well Written Title and Summary of the PR
  • [ ] Label the PR as needed. "release-note/ignore-for-release, release-note/new-feature, release-note/update, release-note/enhancement, release-note/community, release-note/breaking-change, release-note/docs, release-note/infra, release-note/deprecation"
  • [x] Accepted the DCO. Commits without the DCO will delay acceptance.
  • [ ] Made sure tests are passing and test coverage is added if needed.
  • [ ] Considered the docs impact and opened a new docs issue or PR with docs changes if needed in website repository.

Coldchen99 avatar Nov 17 '25 12:11 Coldchen99

can you provide some info or problem your are trying to solve?

Vad1mo avatar Nov 18 '25 13:11 Vad1mo

can you provide some info or problem your are trying to solve?

Hi @Vad1mo I updated the PR description, please take a look

Coldchen99 avatar Nov 21 '25 03:11 Coldchen99

@Coldchen99 thanks for the PR, but it seems there's a conflict in the thoughts of the design ,it seems by design it encounrages the concurrent queries to redis.

cc @wy65701436 let's follow up.

reasonerjt avatar Nov 24 '25 08:11 reasonerjt

@Coldchen99 Thank you for your contribution. I understand your modifications and believe they can reduce CPU usage. However, the previous implementation took this aspect into account. The reason no lock was added was to use a shuffle-like algorithm to allow the execution state to be updated as early as possible. An implementation with locking might delay state updates and fail to achieve good load balancing. Therefore, if we really want to solve this performance issue, I am inclined to modify the current data structure, avoiding the key1:value1 approach and instead using a list or other structures to implement a queue-like effect. This would not only solve the current problem but also achieve good load balancing results. cc @wy65701436

chlins avatar Dec 01 '25 08:12 chlins