airflow
airflow copied to clipboard
Add celery broker configuration example using Amazon SQS
Added because there is no clear configuration example using Amazon SQS as Celery broker #22863.
Note: Dictionary cannot be used in airflow.cfg because configparser cannot parse it ?. So you cannot set celery_broker_transport_options like this.
[celery_broker_transport_options]
region = ap-northeast-1
predefined_queues = {
"my_q": {
"url": "your_sqs_host",
"access_key_id": "***",
"secret_access_key": "***",
}
}
Close #22863
^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragement file, named {pr_number}.significant.rst, in newsfragments.
Congratulations on your first Pull Request and welcome to the Apache Airflow community! If you have any issues or are unsure about any anything please check our Contribution Guide (https://github.com/apache/airflow/blob/main/CONTRIBUTING.rst) Here are some useful points:
- Pay attention to the quality of your code (flake8, mypy and type annotations). Our pre-commits will help you with that.
- In case of a new feature add useful documentation (in docstrings or in
docs/directory). Adding a new operator? Check this short guide Consider adding an example DAG that shows how users should use it. - Consider using Breeze environment for testing locally, it’s a heavy docker but it ships with a working Airflow and a lot of integrations.
- Be patient and persistent. It might take some time to get a review or get the final approval from Committers.
- Please follow ASF Code of Conduct for all communication including (but not limited to) comments on Pull Requests, Mailing list and Slack.
- Be sure to read the Airflow Coding style. Apache Airflow is a community-driven project and together we are making it better 🚀. In case of doubts contact the developers at: Mailing List: [email protected] Slack: https://s.apache.org/airflow-slack
We had reports that predefined_queues isn't working properly with SQS https://github.com/apache/airflow/issues/11225 (though it's on older Airflow versions)
Can you confirm it's working as it should?
We had reports that
predefined_queuesisn't working properly with SQS #11225 (though it's on older Airflow versions) Can you confirm it's working as it should?
I also checked predefined_queues option and it didn't work in my local environment. This option is not used in my commit.
The PR is likely ready to be merged. No tests are needed as no important environment files, nor python files were modified by it. However, committers might decide that full test matrix is needed and add the 'full tests needed' label. Then you should rebase it to the latest main or amend the last commit of the PR, and push it with --force-with-lease.
Hmm. I thought a bit about it. And following the dicsusion in https://github.com/apache/airflow/pull/25121 - I think we should not add "offficial documenation" to features that our users tested, but we have no automated tests for and do not know ourselves what might be potential risks, problems etc. I think this falls in the same camp.
@BasPH @jedcunningham @feluelle - WDYT?
I think @tsugumi-sys that a better approach would be that you (for example) write a short blog post about using SQS as a broker, and describe your experiences it - and then we can even add it to "Airlfow" publication on Medium and even link it from the Ecosystem page of Airflow. https://airflow.apache.org/ecosystem/
I think adding it to the official docs in experimental phase when we do not have any tests (nor intention to add them) is IMHO something we should not do - even if we add "Experimental" - because "experimental" means that we are planning to "run" some experiment and turn it into a "full feature" (or drop) when we finish the experiment. This is what happens with:
- MsSQL support as meta-datada DB
- ARM64 support for Docker Image.
In both cases we already have tests in CI and we actively plan to act and gather observations/. For example I think we might be inclined to drop MsSQL if the Survey results will not show significant uptake in MsSQL usage after it's been officially "experimentally" supported in 2.3 https://airflow.apache.org/blog/airflow-survey-2022/ shows a very little usage so far.
Adding information that SQS "works" in the official docs is some kind of statement we make to our users and they might have certain expectations if they follow it.
I think I'd be only inclined to add "experimental" SQS support if we have some tests in our CI making sure that it works (BTW. We have such - simple but regularly running tests for RabbitMQ and Redis).
Others - WDYT?
I think it's perfectly reasonable to not want it in the docs if we can't back up the claim with testing.
I feel like the situation here is slightly different as it is officially supported by Celery, and presumably properly tested there. I'm not sure we want to exhaustively test our dependencies. That said, this is a pretty critical piece of the puzzle 🤷
I feel like the situation here is slightly different as it is officially supported by Celery, and presumably properly tested there. I'm not sure we want to exhaustively test our dependencies. That said, this is a pretty critical piece of the puzzle shrug
Actually that might change my view. I did not realize SQS is on-par with RabbitMQ/Redis for Celery. On the other Hand MAYBE this is a GREAT opportunity to add a test or two using Moto ?
https://qxf2.com/blog/mocking-out-aws-sqs-using-moto/
Then we would be completely covered and it shoudl be no problem to make it official. @ferruzzi @o-nikolas @vincbeck ? WDYT>
FYI. @jedcunningham - those are the kind of problems we might expect https://github.com/apache/airflow/discussions/25486 . I feel rather uncomfortable not having any tests around it. I am not sure even how we would approach diagnosing and fixing similar issues if we do not have any tests like that. There might be many reasons for this kind of failures (celery/sqs incompatibilities, our way of integrating with celery etc.
We had reports that
predefined_queuesisn't working properly with SQS #11225 (though it's on older Airflow versions) Can you confirm it's working as it should?I also checked
predefined_queuesoption and it didn't work in my local environment. This option is not used in my commit.
Predefined queues, or any other broker transport options can be configured by a plugin to get around the string vs object discrepancy.
I feel like the situation here is slightly different as it is officially supported by Celery, and presumably properly tested there. I'm not sure we want to exhaustively test our dependencies. That said, this is a pretty critical piece of the puzzle shrug
Actually that might change my view. I did not realize SQS is on-par with RabbitMQ/Redis for Celery. On the other Hand MAYBE this is a GREAT opportunity to add a test or two using Moto ?
https://qxf2.com/blog/mocking-out-aws-sqs-using-moto/
Then we would be completely covered and it shoudl be no problem to make it official. @ferruzzi @o-nikolas @vincbeck ? WDYT>
I have a love/hate relationship with using SQS as a celery broker :) (maybe this is for a blog post...)
As a tl;dr: it works, but has a lot of gotchas, and I wouldn't necessarily recommend it generally to airflow users.
SQS is serverless, scales great, and is relatively fast. But it doesn't actually have the same level of support as RabbitMQ and Redis have as a Celery broker. Namely the Monitor and Remote Control APIs are not supported, so while it is stable, it has far less support. Also SQS has the visibility timeout architecturally built into it, which means you can't have tasks that run longer than that timeout without that causing issues (max value for that timeout is 12 hours).
I still support adding automated testing for SQS though, I think that's a worthwhile task.
That comment from @o-nikolas make me think we are NOT ready to make it as an official part of our documentation. But as a blog post was mentioned several times - if either of you would like to write it, I think we can publish it in https://medium.com/apache-airflow
I think we can close it now. We can always reopen if we think it is worthwile, but I'd bee rather sceptical on supporting SQS knowing the caveats.
FYI: I think that was a good idea to not "pretend" we support SQS when we have no tests for it. An example of interesting problem with it is here for example: https://github.com/apache/airflow/discussions/26521?converting=1#discussioncomment-3690260