ContainerNursery icon indicating copy to clipboard operation
ContainerNursery copied to clipboard

Not proxying properly once the container is active [ARM/*arr]

Open kn-f opened this issue 2 years ago • 5 comments

Hello,

When a *arr (Prowlarr, Sonarr....) container is "revived" for the second time, Container Nursery does not redirect to the application even if it's up an running properly. The debug message shows that the application returns a 401 - Unauthorized to Container Nursery and as such the proxy does not recognize the application as "up and running".

In my opinion (open to discussion) as long as Conainer Nursery receives a "valid" HTTP response from the web server (from 100 to 499?) the proxy should recognize the application as "up and running" and thus pass the control to the application itself.

I did a small test, changing this line in the ProxyHosts file: if (res.status === 200 || (res.status >= 300 && res.status <= 399)) { to if (res.status === 200 || (res.status >= 300 && res.status <= 499)) {

and it works like a charm.

An option would be to make this range configurable by the user either at ContainerNursery level or at individual container level in case widening the range is not in line with the design of the application.

kn-f avatar Sep 05 '22 19:09 kn-f

This is a tough one, I was never 100% happy with the current implementation of this. The aim was to make sure the redirect only happens after the application is truly ready (so the user doesn't get a Gateway Timeout or something similar and has to reload by hand after a few seconds), which in most cases should be 200 or a redirect 300-399. I'm open to adding more status codes to the list, so we should maybe gather a list of ranges/codes which should (if the application is designed correctly) correspond with a running & ready application.

After looking through the HTTP status codes list I came up with the following:

  • 200
  • 201
  • 300-399
  • 401

Other suggestions?

ItsEcholot avatar Sep 06 '22 06:09 ItsEcholot

Based on the HTTP status codes, I'd advise to:

  • 1xx: Accept, these are informational responses implying that the container is up and running
  • 2xx: Accept, these are successful responses, as such we're pretty sure the container is up and running
  • 3xx: Accept, these are redirections, possibly ContainerNursery is contacting the "wrong" endpoint but the container should be up and running
  • 4xx: Accept, these are client errors. Maybe Container Nuersery is contacting the wrong endpoint or the session that the client is supposed to be open has been forcibly closed by shutting down the container
  • 5xx: Reject, these are server side errors implying that either the container is not fully active or that there's an error in the container

Said that, maybe the project could keep the original list of valid status codes with the possibility to specify an endpoint in the config file to check if the container is properly up and running. i.e. Prowlarr has an endpoint to check the server health: ​/api​/v1​/health

kn-f avatar Sep 07 '22 06:09 kn-f

I'm hesitant to add the 4xx range since, you're right, this should be a client error. But some on launch building onepage applications might return 404 while the index.html isn't built yet (as an example dashy comes to mind), since the nginx/apache webserver is already running. Also 2xx codes like 204 (No Content) should probably not lead to success?

So I think the following: 100-199, 200, 201, 300-399, 401

ItsEcholot avatar Sep 07 '22 08:09 ItsEcholot

Regarding the health endpoint, maybe we should be thinking about adding docker healthcheck support. This way we don't need to build our own solution and increase compatibility as well.

ItsEcholot avatar Sep 07 '22 12:09 ItsEcholot

I'm hesitant to add the 4xx range since, you're right, this should be a client error. But some on launch building onepage applications might return 404 while the index.html isn't built yet (as an example dashy comes to mind), since the nginx/apache webserver is already running. Also 2xx codes like 204 (No Content) should probably not lead to success?

So I think the following: 100-199, 200, 201, 300-399, 401

I wasn't considering this so definitively extending to all the 4XX codes is a "no go". With respect to the next comment, not all docker containers implement health checks as well as not all applications implement API to check the health of the service. Whatever of the two solution is chosen there will be a drawback. Personally I'd vote for using the HEALTHCHECK functionality in docker.

As a quick fix I'd currently keep the codes you mentioned before and continue the discussion/feature request to see which is the best long term approach.

kn-f avatar Sep 07 '22 13:09 kn-f

I am also affected by the following check.

https://github.com/ItsEcholot/ContainerNursery/blob/97b329c6d2f3886967dec2093a1369556104f69c/src/ProxyHost.ts#L154

In my case the check fails because some headers are missing for the request here:

https://github.com/ItsEcholot/ContainerNursery/blob/97b329c6d2f3886967dec2093a1369556104f69c/src/ProxyHost.ts#L144

Would it be possible to expose a boolean flag to skip this additional verification and just proxy the request, like useHttpProbeCheck set to false? Or maybe another integer called treatHealthyAfterNSeconds or something similar that assumes the container is healthy if after n seconds no errors occurred.

nopper avatar Jan 02 '23 17:01 nopper

@nopper Would this be fixed in your case if we added a config to choose between GET and HEAD requests?

ItsEcholot avatar Jul 21 '23 11:07 ItsEcholot

I think it might, but not sure I'll have the time to check this to be honest. I moved away from containernursery and now I simply use grafana monitoring to identify and kill idle containers. For restarting I have a catch-all route in traefik that triggers a script for restarting the interested container.

nopper avatar Jul 21 '23 12:07 nopper

Thanks for the feedback. In that case I will close here for the time being. Please feel free to reopen / create a new one if needed.

ItsEcholot avatar Jul 21 '23 12:07 ItsEcholot