airflow icon indicating copy to clipboard operation
airflow copied to clipboard

Add celery broker configuration example using Amazon SQS

Open tsugumi-sys opened this issue 3 years ago • 11 comments

Added because there is no clear configuration example using Amazon SQS as Celery broker #22863.

Note: Dictionary cannot be used in airflow.cfg because configparser cannot parse it ?. So you cannot set celery_broker_transport_options like this.

[celery_broker_transport_options]
region = ap-northeast-1
predefined_queues = {
    "my_q": {
        "url": "your_sqs_host",
        "access_key_id": "***",
        "secret_access_key": "***",
    }
}

Close #22863


^ Add meaningful description above

Read the Pull Request Guidelines for more information. In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed. In case of a new dependency, check compliance with the ASF 3rd Party License Policy. In case of backwards incompatible changes please leave a note in a newsfragement file, named {pr_number}.significant.rst, in newsfragments.

tsugumi-sys avatar May 30 '22 09:05 tsugumi-sys

Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst) Here are some useful points:

  • Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
  • In case of a new feature add useful documentation (in docstrings or in docs/ directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it.
  • Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
  • Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
  • Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
  • Be sure to read the Airflow Coding style. Apache Airflow is a community-driven project and together we are making it better 🚀. In case of doubts contact the developers at: Mailing List: [email protected] Slack: https://s.apache.org/airflow-slack

boring-cyborg[bot] avatar May 30 '22 09:05 boring-cyborg[bot]

We had reports that predefined_queues isn't working properly with SQS https://github.com/apache/airflow/issues/11225 (though it's on older Airflow versions) Can you confirm it's working as it should?

eladkal avatar May 30 '22 10:05 eladkal

We had reports that predefined_queues isn't working properly with SQS #11225 (though it's on older Airflow versions) Can you confirm it's working as it should?

I also checked predefined_queues option and it didn't work in my local environment. This option is not used in my commit.

tsugumi-sys avatar May 30 '22 11:05 tsugumi-sys

The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease.

github-actions[bot] avatar May 31 '22 02:05 github-actions[bot]

Hmm. I thought a bit about it. And following the dicsusion in https://github.com/apache/airflow/pull/25121 - I think we should not add "offficial documenation" to features that our users tested, but we have no automated tests for and do not know ourselves what might be potential risks, problems etc. I think this falls in the same camp.

@BasPH @jedcunningham @feluelle - WDYT?

I think @tsugumi-sys that a better approach would be that you (for example) write a short blog post about using SQS as a broker, and describe your experiences it - and then we can even add it to "Airlfow" publication on Medium and even link it from the Ecosystem page of Airflow. https://airflow.apache.org/ecosystem/

I think adding it to the official docs in experimental phase when we do not have any tests (nor intention to add them) is IMHO something we should not do - even if we add "Experimental" - because "experimental" means that we are planning to "run" some experiment and turn it into a "full feature" (or drop) when we finish the experiment. This is what happens with:

  • MsSQL support as meta-datada DB
  • ARM64 support for Docker Image.

In both cases we already have tests in CI and we actively plan to act and gather observations/. For example I think we might be inclined to drop MsSQL if the Survey results will not show significant uptake in MsSQL usage after it's been officially "experimentally" supported in 2.3 https://airflow.apache.org/blog/airflow-survey-2022/ shows a very little usage so far.

Adding information that SQS "works" in the official docs is some kind of statement we make to our users and they might have certain expectations if they follow it.

I think I'd be only inclined to add "experimental" SQS support if we have some tests in our CI making sure that it works (BTW. We have such - simple but regularly running tests for RabbitMQ and Redis).

Others - WDYT?

potiuk avatar Jul 25 '22 14:07 potiuk

I think it's perfectly reasonable to not want it in the docs if we can't back up the claim with testing.

ferruzzi avatar Jul 25 '22 19:07 ferruzzi

I feel like the situation here is slightly different as it is officially supported by Celery, and presumably properly tested there. I'm not sure we want to exhaustively test our dependencies. That said, this is a pretty critical piece of the puzzle 🤷

jedcunningham avatar Jul 26 '22 18:07 jedcunningham

I feel like the situation here is slightly different as it is officially supported by Celery, and presumably properly tested there. I'm not sure we want to exhaustively test our dependencies. That said, this is a pretty critical piece of the puzzle shrug

Actually that might change my view. I did not realize SQS is on-par with RabbitMQ/Redis for Celery. On the other Hand MAYBE this is a GREAT opportunity to add a test or two using Moto ?

https://qxf2.com/blog/mocking-out-aws-sqs-using-moto/

Then we would be completely covered and it shoudl be no problem to make it official. @ferruzzi @o-nikolas @vincbeck ? WDYT>

potiuk avatar Jul 28 '22 17:07 potiuk

FYI. @jedcunningham - those are the kind of problems we might expect https://github.com/apache/airflow/discussions/25486 . I feel rather uncomfortable not having any tests around it. I am not sure even how we would approach diagnosing and fixing similar issues if we do not have any tests like that. There might be many reasons for this kind of failures (celery/sqs incompatibilities, our way of integrating with celery etc.

potiuk avatar Aug 02 '22 19:08 potiuk

We had reports that predefined_queues isn't working properly with SQS #11225 (though it's on older Airflow versions) Can you confirm it's working as it should?

I also checked predefined_queues option and it didn't work in my local environment. This option is not used in my commit.

Predefined queues, or any other broker transport options can be configured by a plugin to get around the string vs object discrepancy.

o-nikolas avatar Aug 10 '22 20:08 o-nikolas

I feel like the situation here is slightly different as it is officially supported by Celery, and presumably properly tested there. I'm not sure we want to exhaustively test our dependencies. That said, this is a pretty critical piece of the puzzle shrug

Actually that might change my view. I did not realize SQS is on-par with RabbitMQ/Redis for Celery. On the other Hand MAYBE this is a GREAT opportunity to add a test or two using Moto ?

https://qxf2.com/blog/mocking-out-aws-sqs-using-moto/

Then we would be completely covered and it shoudl be no problem to make it official. @ferruzzi @o-nikolas @vincbeck ? WDYT>

I have a love/hate relationship with using SQS as a celery broker :) (maybe this is for a blog post...)

As a tl;dr: it works, but has a lot of gotchas, and I wouldn't necessarily recommend it generally to airflow users.

SQS is serverless, scales great, and is relatively fast. But it doesn't actually have the same level of support as RabbitMQ and Redis have as a Celery broker. Namely the Monitor and Remote Control APIs are not supported, so while it is stable, it has far less support. Also SQS has the visibility timeout architecturally built into it, which means you can't have tasks that run longer than that timeout without that causing issues (max value for that timeout is 12 hours).

I still support adding automated testing for SQS though, I think that's a worthwhile task.

o-nikolas avatar Aug 10 '22 22:08 o-nikolas

That comment from @o-nikolas make me think we are NOT ready to make it as an official part of our documentation. But as a blog post was mentioned several times - if either of you would like to write it, I think we can publish it in https://medium.com/apache-airflow

potiuk avatar Aug 21 '22 22:08 potiuk

I think we can close it now. We can always reopen if we think it is worthwile, but I'd bee rather sceptical on supporting SQS knowing the caveats.

potiuk avatar Aug 21 '22 22:08 potiuk

FYI: I think that was a good idea to not "pretend" we support SQS when we have no tests for it. An example of interesting problem with it is here for example: https://github.com/apache/airflow/discussions/26521?converting=1#discussioncomment-3690260

potiuk avatar Sep 20 '22 13:09 potiuk