ContainerNursery
Not proxying properly once the container is active [ARM/*arr]
Hello,
When a *arr (Prowlarr, Sonarr...) container is "revived" for the second time, Container Nursery does not redirect to the application even if it is up and running properly. The debug message shows that the application returns a 401 Unauthorized to Container Nursery, and as such the proxy does not recognize the application as "up and running".
In my opinion (open to discussion), as long as Container Nursery receives a "valid" HTTP response from the web server (from 100 to 499?), the proxy should recognize the application as "up and running" and thus pass control to the application itself.
I did a small test, changing this line in `ProxyHost.ts`:

```ts
if (res.status === 200 || (res.status >= 300 && res.status <= 399)) {
```

to

```ts
if (res.status === 200 || (res.status >= 300 && res.status <= 499)) {
```

and it works like a charm.
An option would be to make this range configurable by the user either at ContainerNursery level or at individual container level in case widening the range is not in line with the design of the application.
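To illustrate the configurable-range idea, here is a minimal sketch of what a user-supplied status range could look like. The `StatusRange` shape, `validStatusRanges` concept, and all names are hypothetical, not part of ContainerNursery:

```ts
// Hypothetical per-host option: ranges of HTTP status codes that count as "ready".
// Names are made up for illustration; ContainerNursery has no such option today.
type StatusRange = { from: number; to: number };

// Mirrors the current hard-coded behaviour: 200 or a 3xx redirect.
const defaultRanges: StatusRange[] = [
  { from: 200, to: 200 },
  { from: 300, to: 399 },
];

function isReadyStatus(status: number, ranges: StatusRange[] = defaultRanges): boolean {
  return ranges.some((r) => status >= r.from && status <= r.to);
}

// A widened, user-supplied range would accept the 401 case from this issue:
const widened: StatusRange[] = [{ from: 200, to: 200 }, { from: 300, to: 499 }];
console.log(isReadyStatus(401, widened)); // true
console.log(isReadyStatus(401));          // false (current default behaviour)
```

This would let each host opt in to a wider range without changing the default for everyone.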
This is a tough one; I was never 100% happy with the current implementation of this. The aim was to make sure the redirect only happens after the application is truly ready (so the user doesn't get a Gateway Timeout or something similar and has to reload by hand after a few seconds), which in most cases should be `200` or a redirect (`300-399`).
I'm open to adding more status codes to the list, so we should maybe gather a list of ranges/codes which should (if the application is designed correctly) correspond with a running & ready application.
After looking through the HTTP status codes list I came up with the following:
- 200
- 201
- 300-399
- 401
Other suggestions?
Based on the HTTP status codes, I'd advise the following:
- 1xx: Accept. These are informational responses, implying that the container is up and running.
- 2xx: Accept. These are successful responses, so we can be fairly sure the container is up and running.
- 3xx: Accept. These are redirections; ContainerNursery may be contacting the "wrong" endpoint, but the container should be up and running.
- 4xx: Accept. These are client errors. Maybe Container Nursery is contacting the wrong endpoint, or the session the client is supposed to have open was forcibly closed when the container was shut down.
- 5xx: Reject. These are server-side errors, implying that either the container is not fully active or that there's an error in the container.
That said, maybe the project could keep the original list of valid status codes, with the possibility to specify an endpoint in the config file to check whether the container is properly up and running.
For example, Prowlarr has an endpoint to check server health: `/api/v1/health`.
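A per-host health endpoint could be wired up by building the probe URL from the existing proxy target. A small sketch, where the `healthEndpoint` config key is hypothetical:

```ts
// Sketch: derive the readiness-probe URL from a hypothetical per-host
// `healthEndpoint` setting, falling back to the root path that the
// proxy checks today. Names are illustrative only.
function probeUrl(containerHost: string, port: number, healthEndpoint?: string): string {
  const path = healthEndpoint ?? '/';
  return `http://${containerHost}:${port}${path.startsWith('/') ? path : `/${path}`}`;
}

console.log(probeUrl('prowlarr', 9696, '/api/v1/health'));
// http://prowlarr:9696/api/v1/health
console.log(probeUrl('sonarr', 8989));
// http://sonarr:8989/
```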
I'm hesitant to add the `4xx` range since, you're right, this should be a client error. But some single-page applications that build on launch might return `404` while the `index.html` isn't built yet (dashy comes to mind as an example), since the nginx/apache webserver is already running. Also, 2xx codes like `204` (No Content) should probably not lead to success? So I think the following: `100-199`, `200`, `201`, `300-399`, `401`.
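The proposed set could be captured in a small predicate. This is a sketch of the idea, not the actual ContainerNursery code:

```ts
// Readiness check with the status codes proposed above:
// 100-199, 200, 201, 300-399 and 401. Sketch only.
function indicatesReady(status: number): boolean {
  return (
    (status >= 100 && status <= 199) ||
    status === 200 ||
    status === 201 ||
    (status >= 300 && status <= 399) ||
    status === 401
  );
}

console.log(indicatesReady(401)); // true  (the *arr case from this issue)
console.log(indicatesReady(204)); // false (No Content deliberately excluded)
console.log(indicatesReady(502)); // false (gateway not ready yet)
```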
Regarding the health endpoint, maybe we should be thinking about adding docker healthcheck support. This way we don't need to build our own solution and increase compatibility as well.
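With Docker healthcheck support, the proxy could read the health state from the container-inspect payload instead of probing HTTP itself. A sketch of interpreting that payload (the `State.Health.Status` field matches what `docker inspect` reports; actually fetching it, e.g. via a Docker client library, is left out):

```ts
// Docker reports health under State.Health.Status: "starting" | "healthy" | "unhealthy".
// Containers without a HEALTHCHECK instruction have no Health object at all.
interface InspectState {
  Running: boolean;
  Health?: { Status: 'starting' | 'healthy' | 'unhealthy' };
}

function isContainerReady(state: InspectState): boolean {
  if (!state.Running) return false;
  // No HEALTHCHECK defined: fall back to "running is good enough"
  // (or to the HTTP probe, in ContainerNursery's case).
  if (!state.Health) return true;
  return state.Health.Status === 'healthy';
}

console.log(isContainerReady({ Running: true, Health: { Status: 'healthy' } }));  // true
console.log(isContainerReady({ Running: true, Health: { Status: 'starting' } })); // false
console.log(isContainerReady({ Running: true }));                                 // true
```

The missing-`Health` case matters because, as noted below, not all containers define a HEALTHCHECK.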
I wasn't considering this, so extending to all the 4xx codes is definitely a "no go". With respect to the previous comment: not all Docker containers implement health checks, and not all applications implement an API to check the health of the service. Whichever of the two solutions is chosen, there will be a drawback. Personally I'd vote for using the HEALTHCHECK functionality in Docker.
As a quick fix I'd keep the codes you mentioned before for now, and continue the discussion/feature request to see which is the best long-term approach.
I am also affected by the following check.
https://github.com/ItsEcholot/ContainerNursery/blob/97b329c6d2f3886967dec2093a1369556104f69c/src/ProxyHost.ts#L154
In my case the check fails because some headers are missing for the request here:
https://github.com/ItsEcholot/ContainerNursery/blob/97b329c6d2f3886967dec2093a1369556104f69c/src/ProxyHost.ts#L144
Would it be possible to expose a boolean flag to skip this additional verification and just proxy the request, like `useHttpProbeCheck` set to `false`? Or maybe another integer called `treatHealthyAfterNSeconds` or something similar that assumes the container is healthy if no errors occurred after `n` seconds.
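The timer-based fallback described above could look something like this (all names are hypothetical, taken from the suggestion itself):

```ts
// Sketch of the suggested `treatHealthyAfterNSeconds` fallback:
// after n seconds without a proxy error, assume the container is healthy.
function assumeHealthy(
  startedAtMs: number,
  nowMs: number,
  treatHealthyAfterNSeconds: number,
  errorSeen: boolean
): boolean {
  if (errorSeen) return false;
  return (nowMs - startedAtMs) / 1000 >= treatHealthyAfterNSeconds;
}

const started = 0;
console.log(assumeHealthy(started, started + 5_000, 10, false));  // false: only 5s elapsed
console.log(assumeHealthy(started, started + 12_000, 10, false)); // true: 12s, no errors
console.log(assumeHealthy(started, started + 12_000, 10, true));  // false: an error occurred
```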
@nopper Would this be fixed in your case if we added a config to choose between `GET` and `HEAD` requests?
I think it might, but I'm not sure I'll have the time to check this, to be honest. I moved away from ContainerNursery and now simply use Grafana monitoring to identify and kill idle containers. For restarting, I have a catch-all route in Traefik that triggers a script that restarts the container in question.
Thanks for the feedback. In that case I will close this for the time being. Please feel free to reopen / create a new issue if needed.