
[bug] Conan fails to return an error if the server returns HTTP/502 (Bad Gateway) or HTTP/503 (Service Unavailable)

Open Yurmusp opened this issue 4 years ago • 11 comments

My Conan server frequently returns a 502 (Bad Gateway) error, and sometimes a 503 as well. I think it's due to low memory. However, in this case the conan command fails to return a non-zero exit code.

To eventually succeed, I run it several times in a loop, like this:

for i in $(seq 1 5); do
    conan upload $PKG_NAME_FULL --all -r=psl-conan -c --retry 5 --retry-wait 15 && break
    sleep 15 && false
done

This set of commands should end with a failed status if a 502 or 503 error happens. However, in the log I'm getting this:

+for i in $(seq 1 5)
+conan upload openssh/8.4p1 --all -r=psl-conan -c --retry 5 --retry-wait 15
Uploading to remote 'psl-conan':
Uploading openssh/8.4p1 to remote 'psl-conan'
Compressing recipe...
Recipe is up to date, upload skipped
Uploading package 1/1: 141d689e77b3d2f372bf3bcf3398ce7b577a789f to 'psl-conan'
Compressing package...
Uploading conan_package.tgz -> openssh/8.4p1:141d
Waiting 15 seconds to retry...
Waiting 15 seconds to retry...
Uploading conaninfo.txt -> openssh/8.4p1:141d
Uploading conanmanifest.txt -> openssh/8.4p1:141d
ERROR: 502 Server Error: Bad Gateway for url: https://artifactory-cpp.dev.prismsl.net/api/conan/conan/v2/conans/openssh/8.4p1///revisions/79b6d3b2792a65b1e4d5a8f05804fe64/packages/141d689e77b3d2f372bf3bcf3398ce7b577a789f/revisions/bbd3ab2ef2e4d373cf5e43081acb86dd/files/conan_package.tgz
ERROR: 503 Server Error: Service Unavailable for url: https://artifactory-cpp.dev.prismsl.net/api/conan/conan/v2/conans/openssh/8.4p1///revisions/79b6d3b2792a65b1e4d5a8f05804fe64/packages/141d689e77b3d2f372bf3bcf3398ce7b577a789f/revisions/bbd3ab2ef2e4d373cf5e43081acb86dd/files/conan_package.tgz
+break
The command "./upload.sh" exited with 0.

Environment Details (include every applicable attribute)

  • Operating System+version: Ubuntu 18.04
  • Compiler+version: GCC, but Not applicable
  • Conan version: 1.29.2
  • Python version: 3.6.9
  • Conan server: Artifactory Community Edition

Steps to reproduce (Include if Applicable)

See script above.

Logs (Executed commands with output) (Include/Attach if Applicable)

See above.

I think the handling of these errors (502/503) is not done properly for some reason. The executable should return a non-zero code if such an error happens, but it returns zero.

Yurmusp avatar Jan 05 '21 04:01 Yurmusp

Hi @Yurmusp, Thanks a lot for reporting.

  • For the Artifactory errors, have you tried to change the JVM parameters by setting JAVA_OPTIONS in $JFROG_HOME/artifactory/var/etc/system.yaml? If it's memory related, maybe increasing the maximum heap size would help.
  • For the return code, I have not been able to reproduce the issue; for me the return code seems correct. Are you running upload.sh in a CI environment like Jenkins or Travis? Does running the conan command outside that environment produce the same return code?
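
As an illustrative sketch only (the key names and values here are assumptions; verify them against the documentation for your Artifactory version), raising the JVM heap in $JFROG_HOME/artifactory/var/etc/system.yaml might look like:

```yaml
# Hypothetical system.yaml fragment -- check your Artifactory version's
# documentation for the exact key names before using.
shared:
    extraJavaOpts: "-Xms1g -Xmx4g"   # raise the maximum heap if the 502/503s are memory related
```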

czoido avatar Jan 05 '21 13:01 czoido

Hi Carlos,

  1. I need to increase the physical instance's RAM, so it's a bit more work, and that's why it's not done yet. However, it's not the point of this issue; I think the server side is out of scope, beyond the fact that it is failing in its own way. I'm not complaining about the server side, but about the client side.
  2. The return code is correct when everything is successful. However, when a web request returns an error, that error is not propagated to the process exit code. It's a bug in the logic: maybe not all HTTP error codes are handled, or an exception is swallowed, or something similar.

The bug is definitely in the Conan client code, and it's frustrating that it is not working properly.

I experience this behavior both in Travis (where the CI build runs) and in my Vagrant VM when I build locally on my laptop. This bug is quite consistent if the server side fails consistently; I would say it is deterministic. I did not notice unstable behavior: it reliably fails on the client side when the server side fails. I've run into it many times.

In all cases everything is deployed by the same script in Ubuntu 18.04 VM.


Yurmusp avatar Jan 05 '21 21:01 Yurmusp

Hi @Yurmusp

Sorry, it was not fully clear. Can you confirm that when you execute this command directly in the command line, without the script, you don't get an error exit code?

$ conan upload $PKG_NAME_FULL --all -r=psl-conan -c

This is what we cannot reproduce: for us, Conan is returning a non-zero exit code and is working correctly.

From the output you copied above, it seems it is your script which is not exiting with an error code: The command "./upload.sh" exited with 0. This could be because the shell script is not forwarding the error code from Conan to the caller of the upload.sh script.
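
To illustrate the point about forwarding exit codes (a generic sketch, not Conan code; the child process here is a hypothetical stand-in for a failing conan upload invocation):

```python
import subprocess
import sys

def run_upload(cmd):
    """Run the given command and return its exit status unchanged."""
    return subprocess.run(cmd).returncode

# Simulate a failing upload with a child process that exits with status 3
# (a stand-in for `conan upload ...` failing).
status = run_upload([sys.executable, "-c", "import sys; sys.exit(3)"])

# A wrapper script must propagate this status to its own caller
# (e.g. `exit $?` in shell, or `sys.exit(status)` here); otherwise the
# caller sees exit code 0 even though the upload failed.
```

If the wrapper instead ends with an unrelated command that succeeds, the wrapper's own exit code will be 0 regardless of the upload result.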

memsharded avatar Jan 05 '21 21:01 memsharded

Hi James,

At the end of the conan command line, please observe this part: ' && break '

This part only executes if conan was successful, i.e. exited with exit code 0. In the original log above, with shell tracing enabled, you can clearly see that the 'break' part of the command is executed. It is shown as +break.

'break' shall NOT execute if conan failed. When conan fails, the loop body runs sleep 15 && false, and when that happens on the last iteration it makes the whole script fail. This worked in multiple scenarios, but not in the scenario I'm describing.

Another peculiarity here is that the recipe is not uploaded (it's already up to date), but the package does start to upload. See the log for details. Maybe this part complicates things and makes it fail.

Yuriy

Yurmusp avatar Jan 05 '21 21:01 Yurmusp

Ok, thanks for the explanation, let us have another look then.

memsharded avatar Jan 05 '21 21:01 memsharded

Hmm,

Maybe it's because of the retry, and one of the retries succeeds... So it prints the error when it gets it, but finally succeeds. However, it's hard to tell from the order of the output, which is likely not realtime, as errors might go to stderr.

Yurmusp avatar Jan 05 '21 21:01 Yurmusp

The last scenario is harder to check.

Yurmusp avatar Jan 05 '21 21:01 Yurmusp

I have been able to reproduce the behavior above:

def test_upload_error_503():
    # https://github.com/conan-io/conan/issues/8287
    from requests.exceptions import HTTPError

    class Connection503Error(TestRequester):
        counter = 0

        def put(self, *args, **kwargs):
            self.counter += 1
            if self.counter == 3:
                # Fail the third PUT with a 503 to simulate a flaky server
                raise HTTPError(503, "This is an error msg!!!")
            return super(Connection503Error, self).put(*args, **kwargs)

    client = TestClient(requester_class=Connection503Error, default_server_user=True)
    client.save({"conanfile.py": GenConanfile()})
    client.run('user -p password -r default user')

    client.run('create . lib/1.0@user/channel')
    client.run('upload * -c --all -r default --retry-wait=0')
    assert "ERROR: [Errno 503] This is an error msg!!!" in client.out

What happens is that one of the retries fails, but the next one succeeds, so the whole command is considered successful and exits with return code 0.

You are right that it prints some "ERROR" messages even though the final result is not an error. This is a known issue, but it can't easily be changed at the moment (Conan 1.X) without breaking things; it will definitely be changed in Conan 2.0 (some work has already been done to solve this).

If this is the case, it is not a bug; it is behaving correctly, and it is a UX issue to improve by sending those messages to stderr instead (already planned for Conan 2.0).
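
The retry behavior described above can be sketched generically (this is not Conan's actual implementation, just an illustration of why an "ERROR: ..." line can appear in a run that still exits with code 0):

```python
import sys

attempts = {"count": 0}

def upload():
    """Fail the first attempt with a 503-style error, succeed on the second."""
    attempts["count"] += 1
    if attempts["count"] < 2:
        raise RuntimeError("503 Server Error: Service Unavailable")
    return "done"

def upload_with_retries(fn, retries=5):
    last_error = None
    for _ in range(retries):
        try:
            return fn()
        except RuntimeError as error:
            # The error is printed immediately, even though a later retry
            # may succeed -- so "ERROR: ..." shows up in a successful run.
            print("ERROR:", error, file=sys.stderr)
            last_error = error
    raise last_error

result = upload_with_retries(upload)  # succeeds overall despite one printed error
```

Only if all retries are exhausted does the error propagate and produce a non-zero exit code.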

memsharded avatar Jan 05 '21 22:01 memsharded

Ok. Thanks. Sorry for bothering. Did not understand it right away.


Yurmusp avatar Jan 06 '21 00:01 Yurmusp

I ran into a similar issue (periodic 502 or 503 HTTP errors in CI jobs). The Artifactory docs indicate that it handles at most 200 active HTTP connections by default:

Artifactory runs on Tomcat which uses the Apache HTTP client to handle HTTP requests. Tomcat maintains a pool of threads and allocates a single thread to handle each incoming request. As the pool of threads gets depleted, Tomcat will create more (up to the maximum specified by the maxThreads attribute) to handle additional requests that come in. If the number of requests exceeds the maxThreads value, requests will be queued (up to the maximum specified by the acceptCount attribute). Should this parameter be too high, requests waiting for a free thread may experience delays until threads become available to handle them. Should the acceptCount limit also be reached, Tomcat will throw “connection refused” errors for any additional requests until the required resources are eventually released. In turn, any new request, uploads, downloads, updates, etc., which could not acquire a thread, will fail. To avoid these performance hits and errors, you can increase the values of these attributes in the Tomcat’s server.xml file. The default value for maxThreads is 200, while acceptCount is set to 100.
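
For illustration only (the attribute values below are arbitrary; consult the Tomcat and Artifactory documentation for values appropriate to your load), raising these limits in Tomcat's server.xml might look like:

```xml
<!-- Hypothetical server.xml fragment: raise the connector's thread pool
     and backlog limits above the defaults (maxThreads=200, acceptCount=100) -->
<Connector port="8081" protocol="HTTP/1.1"
           maxThreads="500"
           acceptCount="300"/>
```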

Using the monitoring method described in https://jfrog.com/blog/monitoring-and-optimizing-artifactory-performance/, I can see that conan search -r creates 12 concurrent HTTP connections, while a single conan install generates over 300(!) connections:

Sat Jul 23 05:08:00 UTC 2022  HTTP Connections: 5
Sat Jul 23 05:08:05 UTC 2022  HTTP Connections: 48
Sat Jul 23 05:08:10 UTC 2022  HTTP Connections: 71
Sat Jul 23 05:08:15 UTC 2022  HTTP Connections: 114
Sat Jul 23 05:08:20 UTC 2022  HTTP Connections: 178
Sat Jul 23 05:08:25 UTC 2022  HTTP Connections: 192
Sat Jul 23 05:08:30 UTC 2022  HTTP Connections: 308
Sat Jul 23 05:08:35 UTC 2022  HTTP Connections: 376
Sat Jul 23 05:08:40 UTC 2022  HTTP Connections: 381
Sat Jul 23 05:08:45 UTC 2022  HTTP Connections: 381
Sat Jul 23 05:08:50 UTC 2022  HTTP Connections: 381
Sat Jul 23 05:08:55 UTC 2022  HTTP Connections: 381
Sat Jul 23 05:09:00 UTC 2022  HTTP Connections: 380
Sat Jul 23 05:09:05 UTC 2022  HTTP Connections: 342
Sat Jul 23 05:09:10 UTC 2022  HTTP Connections: 309
Sat Jul 23 05:09:15 UTC 2022  HTTP Connections: 268
Sat Jul 23 05:09:20 UTC 2022  HTTP Connections: 228
Sat Jul 23 05:09:25 UTC 2022  HTTP Connections: 204
Sat Jul 23 05:09:30 UTC 2022  HTTP Connections: 110
Sat Jul 23 05:09:35 UTC 2022  HTTP Connections: 6
Sat Jul 23 05:09:40 UTC 2022  HTTP Connections: 1

I have general.parallel_download=8 in my Conan configuration, but 380 connections seems too much.

db4 avatar Jul 23 '22 05:07 db4

Hi @db4

I think this high number of connections deserves its own issue. Would you like to create a new issue with the above, please? It would be useful to detail how you obtained those traces, and also to test the same but without general.parallel_download. Also, detail the Conan version, just in case, and any other detail that could be relevant, such as a non-standard installation, both on the client and on the server. Thanks!

memsharded avatar Jul 25 '22 15:07 memsharded

@memsharded I think I've tracked it down. It's not the Conan client but nginx, which I use as an HTTP redirector to the actual Artifactory instance. For the record, I've configured it the following way:

upstream artifactory_backend {
    server artifactory1.my_internal_domain:8082;
}

server {
    server_name artifactory.my_internal_domain;
    client_max_body_size 1G;
    location / {
        resolver dns.my_internal_domain ipv6=off;
        proxy_buffering off;
        proxy_request_buffering off;
        proxy_pass http://artifactory_backend;
        proxy_set_header        Host    $host;
        keepalive_timeout 0;
    }
}

nginx seems to generate a lot of concurrent HTTP connections...
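
One thing that may be worth checking (an assumption on my part, not a verified fix for this setup): without a keepalive directive in the upstream block, and with keepalive_timeout 0, nginx opens a fresh upstream connection per request. The standard nginx directives for reusing upstream connections look like this:

```
upstream artifactory_backend {
    server artifactory1.my_internal_domain:8082;
    keepalive 32;                        # keep idle upstream connections for reuse
}

server {
    location / {
        proxy_http_version 1.1;          # upstream keepalive requires HTTP/1.1
        proxy_set_header Connection "";  # clear the Connection header from the client
        proxy_pass http://artifactory_backend;
    }
}
```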

db4 avatar Sep 29 '22 07:09 db4

Thanks for the feedback @db4, and for following up. I am not an nginx expert, so it's difficult for me to help there.

A couple of possibilities:

  • Maybe @davidsanfal knows a bit more than me, can you please have a quick look?
  • If you use an Artifactory that includes support, I'd submit a ticket to that support, because this sounds mostly like an advanced Artifactory setup issue.

memsharded avatar Sep 29 '22 08:09 memsharded

@memsharded The Recommended Nginx Reverse Proxy Configuration for Artifactory 7 seems to work for me without issues. I wish I had looked into the JFrog knowledge base before asking questions here...

db4 avatar Oct 07 '22 13:10 db4

As the above issue has been solved in 2.0 (messages are directed to stderr by default), and the recent issue seems to be solved with Artifactory recommended configuration, I think this issue can be closed, thanks!

memsharded avatar Oct 07 '22 14:10 memsharded