
504 Gateway Time-out when replication uses pull mode

Open · lf1029698952 opened this issue 5 years ago · 32 comments

Source Harbor version: 1.7.1. Destination Harbor version: 1.8.0.

When I use pull mode to make regular backups and the source Harbor repo has too many image tags (probably above tens of thousands), I get the following errors:

[screenshots of the 504 Gateway Time-out errors]

Replication rule config: [screenshot]

The other replications are normal. When a replication starts in pull mode, the destination Harbor lists all tags of the source Harbor repos, and the source Harbor API responds too slowly. I will modify the nginx timeout config and retry.

lf1029698952 avatar Jun 13 '19 02:06 lf1029698952

The performance issue of listing tags is a known issue. We're working on figuring out a proper solution to improve it.

cc @ywk253100

steven-zou avatar Jun 14 '19 10:06 steven-zou

@cd1989 Do we have any solution to fix this as I noticed you added it into the 1.9 scope

ywk253100 avatar Aug 19 '19 08:08 ywk253100

@cd1989 Do we have any solution to fix this as I noticed you added it into the 1.9 scope

Not yet, but I want to work on it within the 1.9 scope.

cd1989 avatar Aug 20 '19 03:08 cd1989

@lf1029698952 The timeout error happened between nginx and the core service on the source Harbor; changing the default timeout in nginx should be a workaround.

@cd1989 As the error happened on the 1.7 Harbor, I don't think we have any solution to fix it in 1.9. Moving it out of 1.9.

ywk253100 avatar Aug 20 '19 05:08 ywk253100

I have changed the nginx timeout config to 10 minutes, but the job runs for 10 minutes and then errors out anyway. I suspect the performance of the Harbor API is simply that bad. This is an urgent problem to be solved. Thanks.

lf1029698952 avatar Aug 20 '19 06:08 lf1029698952

@lf1029698952 You need to make the timeout large enough to let the API calls complete.
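
A quick way to size that value is to time the tag-list call against the source Harbor directly. Below is a minimal Go sketch, assuming the Harbor 1.x tag-list endpoint /api/repositories/<project>/<repo>/tags; the host and credentials are placeholders for your own setup.

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Placeholder URL: the Harbor 1.x tag-list endpoint for one repository.
	url := "https://source-harbor.example.com/api/repositories/myproject/myrepo/tags"

	// Generous client-side timeout so we can measure the full server-side duration.
	client := &http.Client{Timeout: 30 * time.Minute}

	req, err := http.NewRequest(http.MethodGet, url, nil)
	if err != nil {
		panic(err)
	}
	req.SetBasicAuth("admin", "ChangeMe") // placeholder credentials

	start := time.Now()
	resp, err := client.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)

	// If this elapsed time is longer than the nginx proxy timeout,
	// the proxy answers 504 before the core service finishes.
	fmt.Printf("status=%d bytes=%d elapsed=%s\n", resp.StatusCode, len(body), time.Since(start))
}

Whatever the largest repository takes here is roughly the floor for the proxy timeout on the source side.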

ywk253100 avatar Aug 21 '19 08:08 ywk253100

A refactor is being designed; after that, the issue with the tag-listing API will disappear.

ywk253100 avatar Aug 21 '19 08:08 ywk253100

Please double check if this issue still exists in 2.x.

This is probably fixed now that we store tag info in the DB instead of the registry.

reasonerjt avatar May 25 '20 04:05 reasonerjt

Per the above comment, @lf1029698952, you can try with the latest Harbor (v2.5); the performance issue should be gone.

wy65701436 avatar Apr 13 '22 09:04 wy65701436

I have just tested it on v2.5 and we still experience 504 errors. The strange thing is that it only occurs on the grafana replication rule. We have about 50 replication rules that pull images from Docker Hub, and we only experience this problem with the grafana images. I used skopeo inspect to find the right location; it returns docker.io/grafana/grafana. Is this problem related?


2022-04-20T14:44:54Z [INFO] [/pkg/reg/adapter/dockerhub/client.go:93]: GET https://hub.docker.com/v2/repositories/grafana/?page=1&page_size=100
2022-04-20T14:45:25Z [ERROR] [/pkg/reg/adapter/dockerhub/adapter.go:410]: list repos error: 504 -- <html><body><h1>504 Gateway Time-out</h1>
The server didn't respond in time.
</body></html>

gwiersma avatar Apr 19 '22 10:04 gwiersma

We have the same problem. We were able to narrow it down to the page size in the Docker Hub adapter:

diff --git a/src/pkg/reg/adapter/dockerhub/adapter.go b/src/pkg/reg/adapter/dockerhub/adapter.go
index 0ff2fca65..6f6e8c7e3 100644
--- a/src/pkg/reg/adapter/dockerhub/adapter.go
+++ b/src/pkg/reg/adapter/dockerhub/adapter.go
@@ -254,7 +254,7 @@ func (a *adapter) FetchArtifacts(filters []*model.Filter) ([]*model.Resource, er
        log.Debugf("got %d namespaces", len(namespaces))
        for _, ns := range namespaces {
                page := 1
-               pageSize := 100
+               pageSize := 50
                n := 0
                for {
                        pageRepos, err := a.getRepos(ns, "", page, pageSize)
@@ -295,7 +295,7 @@ func (a *adapter) FetchArtifacts(filters []*model.Filter) ([]*model.Resource, er

                        var tags []string
                        page := 1
-                       pageSize := 100
+                       pageSize := 50
                        for {
                                pageTags, err := a.getTags(repo.Namespace, repo.Name, page, pageSize)
                                if err != nil {

The 504 comes from the Docker Hub API and occurs if the query takes longer than 30 seconds.
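
For reference, that cutoff can be reproduced outside Harbor by hitting the same Docker Hub endpoint that appears in the replication log with different page_size values and a 30-second client deadline. A rough Go sketch; the grafana namespace and the page sizes are just examples:

package main

import (
	"fmt"
	"io"
	"net/http"
	"time"
)

func main() {
	// Mirror the ~30 second cutoff observed on the Docker Hub side.
	client := &http.Client{Timeout: 30 * time.Second}

	for _, pageSize := range []int{100, 50, 25} {
		url := fmt.Sprintf("https://hub.docker.com/v2/repositories/grafana/?page=1&page_size=%d", pageSize)
		start := time.Now()
		resp, err := client.Get(url)
		if err != nil {
			fmt.Printf("page_size=%d failed after %s: %v\n", pageSize, time.Since(start), err)
			continue
		}
		body, _ := io.ReadAll(resp.Body)
		resp.Body.Close()
		fmt.Printf("page_size=%d status=%d bytes=%d elapsed=%s\n",
			pageSize, resp.StatusCode, len(body), time.Since(start))
	}
}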

NemesisRE avatar May 09 '22 12:05 NemesisRE

We are facing the same timeout issue with some Docker Hub replication rules (grafana/loki). Is there any workaround?

HammerNL89 avatar Jun 07 '22 15:06 HammerNL89

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar Jul 08 '22 09:07 github-actions[bot]

We just faced the same issue on a Harbor->Harbor replication:

failed to fetch artifacts: failed to list artifacts of repository 'dev/jenkins/network-defense-manager': http error: code 504, message <html> <head><title>504 Gateway Time-out</title></head> <body> <center><h1>504 Gateway Time-out</h1></center> <hr><center>nginx</center> </body> </html>

This also caused an incident in the target Harbor, as it was brought down by a big increase in DB locks (the two arrows mark the start of two attempts to trigger replication on the source Harbor instance): [DB locks graph screenshot]

aitorpazos avatar Aug 05 '22 12:08 aitorpazos

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar Oct 05 '22 09:10 github-actions[bot]

Hi bot, this is not resolved AFAIK.

aitorpazos avatar Oct 08 '22 14:10 aitorpazos

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar Dec 09 '22 09:12 github-actions[bot]

Hi bot, this is not resolved AFAIK.

aitorpazos avatar Dec 30 '22 19:12 aitorpazos

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar Mar 01 '23 09:03 github-actions[bot]

Hi bot, this is not resolved AFAIK.

aitorpazos avatar Mar 01 '23 14:03 aitorpazos

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar May 02 '23 09:05 github-actions[bot]

Not resolved AFAIK

aitorpazos avatar May 24 '23 12:05 aitorpazos

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar Jul 25 '23 09:07 github-actions[bot]

Not resolved AFAIK

aitorpazos avatar Jul 25 '23 09:07 aitorpazos

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar Sep 24 '23 09:09 github-actions[bot]

Not resolved AFAIK

aitorpazos avatar Sep 25 '23 16:09 aitorpazos

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar Nov 26 '23 09:11 github-actions[bot]

Not resolved AFAIK

aitorpazos avatar Nov 30 '23 10:11 aitorpazos

This issue is being marked stale due to a period of inactivity. If this issue is still relevant, please comment or remove the stale label. Otherwise, this issue will close in 30 days.

github-actions[bot] avatar Jan 31 '24 09:01 github-actions[bot]

Not resolved AFAIK

aitorpazos avatar Jan 31 '24 11:01 aitorpazos