emqx-auth-http icon indicating copy to clipboard operation
emqx-auth-http copied to clipboard

server_unavaliable error

Open berayb opened this issue 3 years ago • 12 comments

Hi, the endpoint works and was working well already, upgraded EMQX to 4.2.9 and receiving the following error. However, the endpoint is accessible with the POST method.

2021-03-30 19:47:50.071 [error] <<"7e2f3776-61ac-49a6-b435-f870501aea74">>@176.40.36.64:14819 [Auth http] Request auth path: /mqtt/auth, error: timeout 2021-03-30 19:47:50.071 [warning] <<"7e2f3776-61ac-49a6-b435-f870501aea74">>@176.40.36.64:14819 [Channel] Client 7e2f3776-61ac-49a6-b435-f870501aea74 (Username: '7e2f3776-61ac-49a6-b435-f870501aea74') login failed for server_unavailable

berayb avatar Mar 30 '21 20:03 berayb

@berayb It seems that it has timed out waiting for the server to return a response.

tigercl avatar Apr 01 '21 01:04 tigercl

@tigercl, there is also a problem with different URLs for the rule engine. For example, logs say webhook.site is unreachable, but the different website is working.

We were running with no problem, but this is happening after the upgrade, and reverting to an older version also does not help.

Maybe possibility to ban some URLs for some reason?

berayb avatar Apr 03 '21 22:04 berayb

I have emqx running on docker and I get the same error every time the auth service is updated. As if emqx continues to try to talk to the old ip.

afzaal-ameer avatar Apr 09 '21 08:04 afzaal-ameer

@berayb Sorry for late reply. Can you provide a packet capture record?

tigercl avatar Apr 12 '21 03:04 tigercl

@afzaal-ameer Are you configuring a domain name? How do you judge that the old IP is connected?

tigercl avatar Apr 12 '21 03:04 tigercl

Hi @tigercl, we are now on Enterprise but the same problem happens. We validate by making a curl request and everything is working well. However, the broker does not even make a request therefore timeout happens.

[error] <<"XXX">>@8.45.42.165:64595 [Auth http] Request auth path: /mqtt/auth, error: timeout

path does not contain "https://domain" but only /mqtt/auth, is that correct?

UPDATE: Found a way to debug:

If we use the https://webhook.site for a test purpose, receiving the same error,

If we use the https://requestbin.com, the request performs well.

So, what would cause this, a block from Cloudflare maybe?

berayb avatar Apr 18 '21 03:04 berayb

@berayb If you find that the Broker did not make a request, then this timeout should be connect timeout.

tigercl avatar Apr 20 '21 06:04 tigercl

@tigercl, after digging into problem, we notice that problem occurs if Cloudflare proxy is enabled. This is problematic because many subdomain is using Cloudflare proxies. Could you please look at that? Thanks.

berayb avatar Apr 20 '21 06:04 berayb

@afzaal-ameer Are you configuring a domain name? How do you judge that the old IP is connected?

It's talking to another docker service on the same machine(no cloudflare involved). Docker services talk to each other using names. And similar to the experience of @berayb , I can login to the emqx container and curl to the auth service successfully. As a workaround, I've setup nginx with a static ip as a reverse proxy between emqx and the auth service and it works.

afzaal-ameer avatar Apr 21 '21 08:04 afzaal-ameer

I’ve encountered the same issue. My EMQX is throwing “service unavailable” error when authentication by HTTP after some idle time.

After a lot of time digging into this issue, I finally found the culprit and also found the way to work around it.

My setup:

  • EMQX 4.2.10 deployed on docker and connected to swarm overlay network
  • HTTP Auth Service deployed on docker swarm, exposed to EMQX using swarm VIP (not DNSrr)
  • Both EMQX and Auth Service are on the same host since my swarm cluster have only 1 node.

Reproduce:

  • Connect some client to EMQX: authenticate using HTTP auth successfully.
  • Wait for 15 minutes (without any action,…)
  • Connect other client to EMQX: failed to authenticate due to service unavailable.

Reason:

  • Auth HTTP plugin (Gun library to be exact) will create a pool of connection to Auth Service BEFORE use. It only creates connection without sending any data.
  • Docker swarm VIP mode using ipvs will expire TCP connection after 15 minutes (900s). After this time, if an auth request is made, a Reset flag will be sent back to caller (EMQX) to close old connection and establish another connection. But Reset flag make our auth request failed.

Discussion about this behavior:

I’ve tried all solutions in comment area (tweaking sysctls tcp keepalive options,...) but nothing works because we don’t use builtin tcp keepalive mechanism.

I believe not only docker swarm has this limitation but also other application/web server does. Here is some way to work around this issue:

myHttpServer.on('connection', (socket) => {
    socket.setTimeout(300000); // 5 minutes
    socket.on('timeout', () => {
        // Close idle socket
        socket.end();
    });
});
  • If you cannot change your Auth Service, consider place a proxy like Nginx before it to manage idle connection
  • Use DNSrr mode instead of swarm VIP to avoid 15 minutes limit

Closing idle connection after 5 minutes works for me. No service unavailable error anymore.

minhpq331 avatar Apr 27 '21 17:04 minhpq331

@tigercl do you have any update regarding the issue? I faced the similar error. I ran EMQX and HTTP auth backend in docker containers. EMQX version is 4.2.12.

([email protected])1> 2021-05-24 18:25:00.001 [warning] The process of gun exited due to {shutdown,ehostunreach}
([email protected])1> 2021-05-24 18:25:09.217 [error] <<"test.cloudnode">>@172.18.0.5:36306 [Auth http] Request auth path: /mqtt/auth, error: {shutdown,ehostunreach}
([email protected])1> 2021-05-24 18:25:09.218 [warning] <<"test.cloudnode">>@172.18.0.5:36306 [Channel] Client test.cloudnode (Username: 'undefined') login failed for server_unavailable
([email protected])1> 2021-05-24 18:25:09.218 [debug] <<"test.cloudnode">>@172.18.0.5:36306 [MQTT] SEND CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=3)
([email protected])1> 2021-05-24 18:25:09.218 [debug] <<"test.cloudnode">>@172.18.0.5:36306 [MQTT] Terminated due to {shutdown,server_unavailable}
([email protected])1> 2021-05-24 18:25:12.289 [warning] The process of gun exited due to {shutdown,ehostunreach}
([email protected])1> 2021-05-24 18:25:14.535 [debug] 172.18.0.5:36342 [MQTT] RECV <<16,80,0,4,77,81,84,84,4,44,1,44,0,17,49,48,55,48,76,84,67,46,99,108,111,117,100,110,111,100,101,0,46,36,83,89,83,47,98,114,111,107,101,114,47,99,111,110,110,101,99,116,105,111,110,47,49,48,55,48,76,84,67,46,99,108,111,117,100,110,111,100,101,47,115,116,97,116,101,0,1,48>>
([email protected])1> 2021-05-24 18:25:14.535 [debug] 172.18.0.5:36342 [MQTT] RECV CONNECT(Q0, R0, D0, ClientId=test.cloudnode, ProtoName=MQTT, ProtoVsn=4, CleanStart=false, KeepAlive=300, Username=undefined, Password=undefined, Will(Q1, R1, Topic=$SYS/broker/connection/test.cloudnode/state, Payload=<<"0">>))

I don't even see the requests coming to the http auth container, but this is 100% available. It worked properly with EMQX version 4.1.2.

sgrigorev avatar May 24 '21 18:05 sgrigorev

I changed my auth backend to listen to https instead of http and the error disappear.

sgrigorev avatar Jun 07 '21 06:06 sgrigorev