emqx-auth-http
emqx-auth-http copied to clipboard
server_unavaliable error
Hi, the endpoint works and was working well already, upgraded EMQX to 4.2.9 and receiving the following error. However, the endpoint is accessible with the POST method.
2021-03-30 19:47:50.071 [error] <<"7e2f3776-61ac-49a6-b435-f870501aea74">>@176.40.36.64:14819 [Auth http] Request auth path: /mqtt/auth, error: timeout 2021-03-30 19:47:50.071 [warning] <<"7e2f3776-61ac-49a6-b435-f870501aea74">>@176.40.36.64:14819 [Channel] Client 7e2f3776-61ac-49a6-b435-f870501aea74 (Username: '7e2f3776-61ac-49a6-b435-f870501aea74') login failed for server_unavailable
@berayb It seems that it has timed out waiting for the server to return a response.
@tigercl, there is also a problem with different URLs for the rule engine. For example, logs say webhook.site is unreachable, but the different website is working.
We were running with no problem, but this is happening after the upgrade, and reverting to an older version also does not help.
Maybe possibility to ban some URLs for some reason?
I have emqx running on docker and I get the same error every time the auth service is updated. As if emqx continues to try to talk to the old ip.
@berayb Sorry for late reply. Can you provide a packet capture record?
@afzaal-ameer Are you configuring a domain name? How do you judge that the old IP is connected?
Hi @tigercl, we are now on Enterprise but the same problem happens. We validate by making a curl request and everything is working well. However, the broker does not even make a request therefore timeout happens.
[error] <<"XXX">>@8.45.42.165:64595 [Auth http] Request auth path: /mqtt/auth, error: timeout
path does not contain "https://domain" but only /mqtt/auth, is that correct?
UPDATE: Found a way to debug:
If we use the https://webhook.site for a test purpose, receiving the same error,
If we use the https://requestbin.com, the request performs well.
So, what would cause this, a block from Cloudflare maybe?
@berayb If you find that the Broker did not make a request, then this timeout should be connect timeout.
@tigercl, after digging into problem, we notice that problem occurs if Cloudflare proxy is enabled. This is problematic because many subdomain is using Cloudflare proxies. Could you please look at that? Thanks.
@afzaal-ameer Are you configuring a domain name? How do you judge that the old IP is connected?
It's talking to another docker service on the same machine(no cloudflare involved). Docker services talk to each other using names. And similar to the experience of @berayb , I can login to the emqx container and curl to the auth service successfully. As a workaround, I've setup nginx with a static ip as a reverse proxy between emqx and the auth service and it works.
I’ve encountered the same issue. My EMQX is throwing “service unavailable” error when authentication by HTTP after some idle time.
After a lot of time digging into this issue, I finally found the culprit and also found the way to work around it.
My setup:
- EMQX 4.2.10 deployed on docker and connected to swarm overlay network
- HTTP Auth Service deployed on docker swarm, exposed to EMQX using swarm VIP (not DNSrr)
- Both EMQX and Auth Service are on the same host since my swarm cluster have only 1 node.
Reproduce:
- Connect some client to EMQX: authenticate using HTTP auth successfully.
- Wait for 15 minutes (without any action,…)
- Connect other client to EMQX: failed to authenticate due to service unavailable.
Reason:
- Auth HTTP plugin (Gun library to be exact) will create a pool of connection to Auth Service BEFORE use. It only creates connection without sending any data.
- Docker swarm VIP mode using ipvs will expire TCP connection after 15 minutes (900s). After this time, if an auth request is made, a Reset flag will be sent back to caller (EMQX) to close old connection and establish another connection. But Reset flag make our auth request failed.
Discussion about this behavior:
I’ve tried all solutions in comment area (tweaking sysctls tcp keepalive options,...) but nothing works because we don’t use builtin tcp keepalive mechanism.
I believe not only docker swarm has this limitation but also other application/web server does. Here is some way to work around this issue:
- Properly set “Keep alive HTTP header” timeout and max to close connection AFTER request https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Keep-Alive
- Close idle connections (which open BEFORE request but doesn't have any activity) in your Auth Service. Here is my example using nodejs
myHttpServer.on('connection', (socket) => {
socket.setTimeout(300000); // 5 minutes
socket.on('timeout', () => {
// Close idle socket
socket.end();
});
});
- If you cannot change your Auth Service, consider place a proxy like Nginx before it to manage idle connection
- Use DNSrr mode instead of swarm VIP to avoid 15 minutes limit
Closing idle connection after 5 minutes works for me. No service unavailable error anymore.
@tigercl do you have any update regarding the issue? I faced the similar error. I ran EMQX and HTTP auth backend in docker containers. EMQX version is 4.2.12.
([email protected])1> 2021-05-24 18:25:00.001 [warning] The process of gun exited due to {shutdown,ehostunreach}
([email protected])1> 2021-05-24 18:25:09.217 [error] <<"test.cloudnode">>@172.18.0.5:36306 [Auth http] Request auth path: /mqtt/auth, error: {shutdown,ehostunreach}
([email protected])1> 2021-05-24 18:25:09.218 [warning] <<"test.cloudnode">>@172.18.0.5:36306 [Channel] Client test.cloudnode (Username: 'undefined') login failed for server_unavailable
([email protected])1> 2021-05-24 18:25:09.218 [debug] <<"test.cloudnode">>@172.18.0.5:36306 [MQTT] SEND CONNACK(Q0, R0, D0, AckFlags=0, ReasonCode=3)
([email protected])1> 2021-05-24 18:25:09.218 [debug] <<"test.cloudnode">>@172.18.0.5:36306 [MQTT] Terminated due to {shutdown,server_unavailable}
([email protected])1> 2021-05-24 18:25:12.289 [warning] The process of gun exited due to {shutdown,ehostunreach}
([email protected])1> 2021-05-24 18:25:14.535 [debug] 172.18.0.5:36342 [MQTT] RECV <<16,80,0,4,77,81,84,84,4,44,1,44,0,17,49,48,55,48,76,84,67,46,99,108,111,117,100,110,111,100,101,0,46,36,83,89,83,47,98,114,111,107,101,114,47,99,111,110,110,101,99,116,105,111,110,47,49,48,55,48,76,84,67,46,99,108,111,117,100,110,111,100,101,47,115,116,97,116,101,0,1,48>>
([email protected])1> 2021-05-24 18:25:14.535 [debug] 172.18.0.5:36342 [MQTT] RECV CONNECT(Q0, R0, D0, ClientId=test.cloudnode, ProtoName=MQTT, ProtoVsn=4, CleanStart=false, KeepAlive=300, Username=undefined, Password=undefined, Will(Q1, R1, Topic=$SYS/broker/connection/test.cloudnode/state, Payload=<<"0">>))
I don't even see the requests coming to the http auth container, but this is 100% available. It worked properly with EMQX version 4.1.2.
I changed my auth backend to listen to https instead of http and the error disappear.