flower icon indicating copy to clipboard operation
flower copied to clipboard

Investigate no comms in docker

Open Tomasz-Kluczkowski opened this issue 3 years ago • 5 comments

Add guard in Flower's boot sequence to wait for broker. This prevents situation where Flower boots in incorrect state and is never able to communicate with the broker even if it is available later on.

I do appreciate the fact that I did not solve the real issue of Flower not able to call the broker's url despite that fact that it is callable but:

  • this issue is only a problem if we boot without a broker. If broker disappears after Flower successfully booted with a broker present and then comes back - Flower operates as usual and can access the broker again.
  • if we protect Flower from booting without the broker, the issue is then resolved.

A smarter solution (for which I am afraid I do not have time) would be to understand why Flower/Tornado cannot call a simple url using async http client after Flower booted without the broker. I tried but lost my patience :). Tornado is not something I deal with in work life but I am opened to enlightenment :)

I also changed the Dockerfile and removed ENTRYPOINT since we no longer have flower as a standalone command and also we need to be able fully override the command that Flower's image starts with to be able to specify all the broker's and so on.

Small fixes to some docs where I noticed the old way of doing things was still used.

@mher I have tested this code locally and by replacing the image that dockerfile uses with an image based on this branch in my fork of Flower and running docker-compose up with various options - I used rabbitmq, redis and checked booting Flower with them running and without - the new code makes Flower wait until broker is running or exits if max_retries is exceeded.

Image showing the new behaviour: Screenshot from 2021-06-07 21-21-32

Related to issue: #1112

Tomasz-Kluczkowski avatar Jun 07 '21 19:06 Tomasz-Kluczkowski

@mher I have made a tiny change inspired by other usage of celery connections - basically when checking presence of the broker at startup of Flower we can reuse connection from the pool of connections on the celery_app object instead of creating one. Not a big issue since this is done only once but a good practice I think.

This whole set of changes is now active in our production and staging kubernetes clusters and working fine.

Please let me know if you are happy with this PR :).

I was asked for more Prometheus metrics plus I see a fix needed for the queuing time one so I will open another PR for it.

I also wanted to ask here if you would be OK with adding some project wide standards in another PR:

  • linting
  • code formatting (black). Here I have a question if you would be ok with making the line length 120 chars as we are in 21st century :P and 79-80 seems so last century setting :).
  • git pre-commit hooks to enforce the above
  • step in CI pipeline to enforce the above

Finally if you are aware of anything that needs fixing please let me know and I will try to help.

Tomasz-Kluczkowski avatar Jun 19 '21 11:06 Tomasz-Kluczkowski

@Tomasz-Kluczkowski I'm not @mher but I guess nobody is against what you mentioned these days and I would even go further and add:

  • static checking (mypy?)
  • typing annotations

If you're considering black I would go and do it from day 0 with a length of 120 to avoid later noise

nicolaerosia avatar Jun 28 '21 17:06 nicolaerosia

@Tomasz-Kluczkowski can you move docker updates into a separate pull request?

mher avatar Jul 05 '21 17:07 mher

Of course, good idea!

Tomasz-Kluczkowski avatar Jul 05 '21 18:07 Tomasz-Kluczkowski

@mher Done in #1121

What are your thoughts on the bug I was trying to fix on this PR? I am using this fix in our prod and it is fine although I would prefer a more elegant solution so maybe you have ideas?

Tomasz-Kluczkowski avatar Jul 05 '21 18:07 Tomasz-Kluczkowski