Unstable deployment on local Kubernetes on MacOS M1 chip

Open rsotogar opened this issue 2 years ago • 11 comments

Hi!

When I deploy the Airflow chart (latest version, 8.5.2) locally on my M1 Mac, some pods restart constantly (especially the scheduler and the webserver, which only stay up for a couple of minutes). No matter how many replica pods I create to ensure availability, I can't get a stable deployment. Could this be related to the images running poorly on the Mac M1? I used to ssh into Linux instances and deploy Airflow using Docker Compose (w/o Kubernetes) and never had an issue with it. The problems started when I attempted a local deployment using Docker/Kubernetes on the Mac M1.

  • K8S version: 1.21.3
  • Airflow Helm chart: 8.5.2

rsotogar avatar Dec 31 '21 12:12 rsotogar

I am facing the same issue with my Mac, I am on the Intel chip.

haripkrish avatar Dec 31 '21 18:12 haripkrish

I am facing the same issue with my Mac, I am on the Intel chip.

Any idea why this happens? I thought the entire idea behind containers is that they are agnostic to OS...

rsotogar avatar Dec 31 '21 21:12 rsotogar

@rsotogar @hariprasadkrishnamurthyDH I think you may misunderstand how Docker containers work: they are only "portable" between machines with the same CPU architecture.

Note, the M1 MacBooks use an arm64 CPU architecture, but the airflow container images are compiled for amd64, so to run on arm64 they use emulation, which can be very slow and buggy!
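
To make the architecture mismatch concrete, here is a minimal Python sketch that maps a machine name to a Docker platform string and reports whether an image would have to run under emulation. The function names `docker_platform` and `needs_emulation` are hypothetical, and the alias table is an assumption that covers only the two architectures discussed in this thread.

```python
import platform

# Map values reported by platform.machine() to Docker platform names.
# Assumption: only the two architectures discussed in this thread are covered.
_ARCH_ALIASES = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "arm64": "linux/arm64",
    "aarch64": "linux/arm64",
}


def docker_platform(machine=None):
    """Return the Docker platform string for a machine name (default: this host)."""
    name = (machine or platform.machine()).lower()
    if name not in _ARCH_ALIASES:
        raise ValueError(f"unrecognised architecture: {name}")
    return _ARCH_ALIASES[name]


def needs_emulation(image_platforms, machine=None):
    """True when none of the image's published platforms matches the machine."""
    return docker_platform(machine) not in image_platforms


# An image published only for amd64, checked against an arm64 machine.
print(needs_emulation(["linux/amd64"], machine="arm64"))  # True
```

Calling `needs_emulation` without `machine` checks the current host instead. A Python interpreter that is itself running under emulation may report the emulated architecture, so treat the result as a hint rather than a guarantee.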


Right now, when you run the chart on an M1 MacBook, it won't work unless you set pgbouncer.enabled to false, because the ghcr.io/airflow-helm/pgbouncer image seems to not run correctly under emulation (see issue https://github.com/airflow-helm/charts/issues/494 to track this).

Note, even with pgbouncer.enabled set to false, the apache/airflow images will run incredibly slowly, as they will run under Docker's amd64 emulation, which uses QEMU. See the upstream airflow issue https://github.com/apache/airflow/issues/15635 to track the progress of cross-compiling apache/airflow for arm64.
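
As a rough illustration of the override mentioned above, the following Python sketch flattens a nested values dictionary into Helm `--set` arguments. The helper `helm_set_args` is hypothetical (it is not part of Helm or of either chart), and it does not escape commas or dots inside keys or values.

```python
def helm_set_args(values, prefix=""):
    """Flatten a nested dict into a list of `--set key=value` arguments."""
    args = []
    for key, value in values.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            # Recurse so nested keys become dotted paths.
            args.extend(helm_set_args(value, path))
        else:
            # Render booleans as lowercase true/false, as in YAML values.
            text = str(value).lower() if isinstance(value, bool) else str(value)
            args.extend(["--set", f"{path}={text}"])
    return args


overrides = {"pgbouncer": {"enabled": False}}
print(" ".join(helm_set_args(overrides)))  # --set pgbouncer.enabled=false
```

The printed arguments can be appended to whatever `helm install` or `helm upgrade` command you already use for the chart.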

thesuperzapper avatar Jan 10 '22 02:01 thesuperzapper

Yep - see the comment in https://github.com/apache/airflow/issues/15635#issuecomment-1020003975. In Airflow we are getting closer to supporting M1 images (I even bought a new MacBook with this effort in mind :D). I think the last "serious" obstacle with Beam/Numpy will be removed at the beginning of February. I have seriously revamped and improved (and simplified!) our Docker building for Airflow to use BuildKit, which makes it easier to build multi-platform images, so this effort is well in progress.

In the meantime - I do sympathise with you @rsotogar. macOS on M1 is not yet REALLY ready for a Dockerized experience. The super-powerful MacBook Pro M1 Max I have is barely usable with Docker. The macOS Docker experience was already crippled by super-slow filesystem sharing, and when you add emulation on top it slows down to a crawl.

Just to explain where it comes from.

The problem is mainly that many of the dependencies are scrambling to get proper "architecture" support. It is a good thing - long term - that Apple switched to ARM (more choice is a good thing), and it pretty much forced all the developers to pay attention.

ARM support has to - unfortunately - bubble up: from OS support for ARM (already there for Linux for quite some time, thanks to Android), then low-level libraries in C/C++/Rust, then low-level Python libraries (like Numpy), and finally applications and tools like Airflow (or, for example, MySQL or MSSQL, which have to provide clients and images that are also ARM-based). All this is a huge effort by all parties involved - not so much to make the change, but mainly to make sure that it works and that they have the right continuous integration in place to support it, so that they can release the software with confidence after passing all the tests.

I think ~ mid 2022 will be the time when all this effort will be nearly complete, and those who are not there yet will have to catch up or they will lag behind (this is the main reason I deferred buying an M1 MacBook: I knew it was very far from being ready when the M1 was first released, and I hated the touchbar and the lack of HDMI. MagSafe is great BTW and I am glad it is back).

So :crossed_fingers: that all our dependencies will be ready soon (or at least the crucial ones that will allow Airflow to reliably and reproducibly build an ARM image and add a CI harness for it).

potiuk avatar Jan 24 '22 11:01 potiuk

@thesuperzapper Should it work with pgbouncer.enabled being false? It doesn't work for me even though that option is false by default. I'm using official airflow helm chart of version 1.4.0.

sgc109 avatar Feb 07 '22 16:02 sgc109

@sgc109 this repo/issue is for the "user-community" airflow helm chart, so I can't speak for the official chart.

But as far as I know, the apache/airflow docker images aren't compiled for ARM yet, and both charts use those images.

thesuperzapper avatar Feb 08 '22 03:02 thesuperzapper

@thesuperzapper Oh, I forgot we're in the user-community helm chart issue because I got here through a link from the official one. Thank you for the reply!

sgc109 avatar Feb 08 '22 03:02 sgc109

Yeah. The last blocker (I believe) for the ARM image, beam 2.36.0, has been released today, so :crossed_fingers: for the ARM image soon'ish

potiuk avatar Feb 08 '22 09:02 potiuk

The ghcr.io/airflow-helm/pgbouncer docker image is now being cross-compiled for both linux/amd64 and linux/arm64 architectures as of tag 1.17.0-patch.0 (See PR https://github.com/airflow-helm/charts/pull/551)

As of chart version 8.6.0 I have made the default PgBouncer image ghcr.io/airflow-helm/pgbouncer:1.17.0-patch.0, which means the only non ARM-native image we are using is apache/airflow:2.2.5-python3.8.

The status of getting a native linux/arm64 image for apache/airflow is being discussed on the airflow issue tracker here https://github.com/apache/airflow/issues/15635.

thesuperzapper avatar Apr 19 '22 23:04 thesuperzapper

Correct. The image (not identical - it misses mysql and mssql) for ARM is likely to be published in experimental form for 2.3.0 - pending discussion here: https://lists.apache.org/thread/pqhks390dkso9x668gbnvjq6k6wv8h9h

potiuk avatar Apr 20 '22 13:04 potiuk

Airflow 2.3.0 has been shipped with ARM support. Note that it is experimental and supports only Postgres for now, as mysql and mssql do not yet have support for ARM.

It is not fully tested in Airflow CI, hence the experimental support status - but over time we will likely make it a first-class citizen and start running tests for the ARM image too. I do not expect many problems caused by the different CPU architecture to come from Airflow itself - they might come from the underlying images, but Airflow has no binary/compiled code, so as long as the Python interpreter works fine on ARM, Airflow should work too.

ARM CI development images were in active use for several months by a number of contributors and there was not a single problem I am aware of resulting from the OS architecture.
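
Going only by the statement above that ARM support first shipped (experimentally) with Airflow 2.3.0, a small Python sketch can tell whether a given image tag predates it. The names `parse_airflow_version`, `FIRST_ARM_RELEASE` and `has_arm_image` are hypothetical, and the parser assumes tags of the form `X.Y.Z` or `X.Y.Z-suffix`, such as the `2.2.5-python3.8` tag mentioned earlier in this thread.

```python
def parse_airflow_version(tag):
    """Extract (major, minor, patch) from a tag such as '2.2.5-python3.8'."""
    version = tag.split("-", 1)[0]
    major, minor, patch = (int(part) for part in version.split("."))
    return (major, minor, patch)


# First release reported in this thread to ship an (experimental) ARM image.
FIRST_ARM_RELEASE = (2, 3, 0)


def has_arm_image(tag):
    """True when the tag's version is at or after the first ARM release."""
    return parse_airflow_version(tag) >= FIRST_ARM_RELEASE


print(has_arm_image("2.2.5-python3.8"))  # False
print(has_arm_image("2.3.0"))            # True
```

This only encodes the version cutoff; it does not query a registry, so it cannot confirm that a particular tag was actually published for linux/arm64.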

More details and history: https://github.com/apache/airflow/issues/15635

potiuk avatar May 01 '22 14:05 potiuk