Azure AD Auth fails on ARM64

Open george-zubrienko opened this issue 1 year ago • 16 comments

Environment

Amazon Linux 2023, ARM64 arch host, container based on python3.11-slim-bookworm

Delta-rs version:

0.17.2

Binding:

python

Environment:

  • Cloud provider: AWS
  • OS: AL2023
  • Other: Python3.11

Bug

What happened:

When trying to load the table using service principal credentials (AZURE_CLIENT_ID etc.), I get this:

OSError: Generic MicrosoftAzure error: Error performing token request: Error after 10 retries in 2.050817107s, max_retries:10, retry_timeout:180s, source:error sending request for url (https://login.microsoftonline.com/.../oauth2/v2.0/token): error trying to connect: failed to get random bytes

Stack trace:

[image: screenshot of the stack trace]

What you expected to happen:

The table reads successfully, as it did with version 0.8.1 on the same host/container.

How to reproduce it:

Run a table read on an ARM64/AL2023 VM with Azure auth against an az://... table path.
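
A minimal sketch of such a read, for anyone who wants to reproduce this. Only AZURE_CLIENT_ID is named in this report; the other environment variable names, the helper names, and the use of `storage_options` instead of whatever the real loading code does are assumptions for illustration.

```python
import os


def azure_storage_options(env=os.environ):
    """Collect Azure service principal settings from environment variables.

    Only AZURE_CLIENT_ID appears in the report; the other three variable
    names are assumptions made for this sketch.
    """
    names = (
        "AZURE_STORAGE_ACCOUNT_NAME",
        "AZURE_TENANT_ID",
        "AZURE_CLIENT_ID",
        "AZURE_CLIENT_SECRET",
    )
    missing = [name for name in names if not env.get(name)]
    if missing:
        raise KeyError(f"missing environment variables: {', '.join(missing)}")
    # The lowercased names are passed on as storage option keys.
    return {name.lower(): env[name] for name in names}


def read_table(table_uri, env=os.environ):
    """Open the table and read it; the token request is made while loading."""
    # Imported lazily so the helper above can be used without deltalake installed.
    from deltalake import DeltaTable

    table = DeltaTable(table_uri, storage_options=azure_storage_options(env))
    return table.to_pyarrow_table()
```

Calling `read_table("az://...")` on the affected ARM64 host should raise the OSError above if the setup matches; on amd64 it should return the table.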

More details: N/A

george-zubrienko avatar May 02 '24 15:05 george-zubrienko

@george-zubrienko are you able to test the connection from non-arm environments?

ion-elgreco avatar May 02 '24 15:05 ion-elgreco

@ion-elgreco yes, if I change the machine type to amd64 it works fine. Fun part: people on Mac M2s do not have this issue.

george-zubrienko avatar May 02 '24 15:05 george-zubrienko

I'm building a new image rn with 0.17.3, fyi this was the wheel used by ARM64 build: deltalake-0.17.2-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

Same result with 0.17.3. I assume the error is coming from Rust code, not python as it is reported as OSError

george-zubrienko avatar May 02 '24 15:05 george-zubrienko

Could you do a bisect, to see which release this problem started occurring for you?

ion-elgreco avatar May 02 '24 15:05 ion-elgreco

We just upgraded from 0.8.1 to 0.17.* and I got the error. I can try to narrow it down a little. In case this adds anything, we are loading data like this: https://github.com/SneaksAndData/adapta/blob/main/adapta/storage/delta_lake/_functions.py#L39-L90

george-zubrienko avatar May 02 '24 15:05 george-zubrienko

@ion-elgreco it works up to 0.16.1, 0.16.2 breaks it
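
For reference, a bisect like this can be sketched as a small binary search. This is only an illustration: `first_bad_version` and the `is_good` callback are hypothetical names, and `is_good` would have to install the given release and try the table read.

```python
def first_bad_version(versions, is_good):
    """Return the first version for which is_good() is False.

    Assumes versions are ordered oldest to newest, the first one is known
    good, the last one is known bad, and the behaviour flips exactly once.
    """
    lo, hi = 0, len(versions) - 1  # versions[lo] is good, versions[hi] is bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid  # the breakage was introduced after this release
        else:
            hi = mid  # the breakage is in this release or an earlier one
    return versions[hi]
```

The list passed in should contain every release between the last known good and the first known bad version, so that the returned release is the one that introduced the failure.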

george-zubrienko avatar May 02 '24 16:05 george-zubrienko

@george-zubrienko hmm strange, nothing implies there is a change that could have caused this in that release.

Maybe it's the Rust version it gets compiled with. Are you able to compile with some older version and check that?

ion-elgreco avatar May 02 '24 16:05 ion-elgreco

Aren't 0.16.2 and 0.16.1 compiled with the same version? I can try to force this package to compile from source rather than use the wheel.

george-zubrienko avatar May 02 '24 16:05 george-zubrienko

@ion-elgreco

So, using this compiler version:

# rustc --version
rustc 1.78.0 (9b00956e5 2024-04-29)

Running this:

# pip install --upgrade deltalake==0.17.2 --no-binary :all:
Collecting deltalake==0.17.2
  Using cached deltalake-0.17.2.tar.gz (4.8 MB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Requirement already satisfied: pyarrow>=8 in /usr/local/lib/python3.11/site-packages (from deltalake==0.17.2) (16.0.0)
Requirement already satisfied: pyarrow-hotfix in /usr/local/lib/python3.11/site-packages (from deltalake==0.17.2) (0.6)
Requirement already satisfied: numpy>=1.16.6 in /usr/local/lib/python3.11/site-packages (from pyarrow>=8->deltalake==0.17.2) (1.26.4)
Building wheels for collected packages: deltalake
  Building wheel for deltalake (pyproject.toml) ... done
  Created wheel for deltalake: filename=deltalake-0.17.2-cp38-abi3-linux_aarch64.whl size=27450351 sha256=1c47967d8c6cd74a7414607321671e8c5fb093b6767035bdffc5562c95014f33
  Stored in directory: /root/.cache/pip/wheels/cf/6f/ad/d4179379730a3e8649079d3050ccba5bdf1ccad191074b0027
Successfully built deltalake
Installing collected packages: deltalake
  Attempting uninstall: deltalake
    Found existing installation: deltalake 0.17.3
    Uninstalling deltalake-0.17.3:
      Successfully uninstalled deltalake-0.17.3
Successfully installed deltalake-0.17.2

Everything works. So it is a problem with the wheel?

Kernel: Linux 6.1.84-99.169.amzn2023.aarch64 #1 SMP Mon Apr 8 19:19:24 UTC 2024 aarch64 GNU/Linux
OS: Debian 12 (bookworm)
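
To tell which build is actually installed in a given container, the wheel tags recorded in the package metadata can be inspected. A rough sketch, assuming the installer kept the WHEEL metadata file; the function names are made up for illustration.

```python
from importlib import metadata


def wheel_tags(wheel_text):
    """Extract the 'Tag:' entries from the contents of a WHEEL metadata file."""
    tags = []
    for line in wheel_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "Tag":
            tags.append(value.strip())
    return tags


def is_manylinux_build(tags):
    """True when any tag names a manylinux platform, as the published wheel does."""
    return any("manylinux" in tag for tag in tags)


def installed_wheel_tags(package="deltalake"):
    """Read the tags recorded for an installed package, or [] if unavailable."""
    try:
        text = metadata.distribution(package).read_text("WHEEL")
    except metadata.PackageNotFoundError:
        return []
    return wheel_tags(text or "")
```

Going by the filenames above, the published wheel should report manylinux tags, while the local build should report `cp38-abi3-linux_aarch64`.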

george-zubrienko avatar May 02 '24 16:05 george-zubrienko

@george-zubrienko ok that's helpful!

I think it's because of the rust compiler.

  • v0.16.1 was compiled with Rust 1.76.0
  • v0.16.2 through v0.17.3 were compiled with Rust 1.77.x

You seem to get it working when it's compiled with 1.78.0

ion-elgreco avatar May 02 '24 16:05 ion-elgreco

That is the case! Any chance we can get compiler bumped for 0.17.4? 🙏

george-zubrienko avatar May 02 '24 17:05 george-zubrienko

Probably next release. Rust 1.78 got released today and our release just missed that version

ion-elgreco avatar May 02 '24 17:05 ion-elgreco

I'm also facing the same issue with AWS STS, using the latest deltalake version.

gaurav7261 avatar May 07 '24 12:05 gaurav7261

@ion-elgreco please tag me here once you release the new version; right now we are reverting back to amd64.

gaurav7261 avatar May 07 '24 12:05 gaurav7261

I would be surprised if this was the compiler. It is more likely a dependency for which we have a loose version range specifier. Either way, I figure the next release should take care of this for ya :smile:

rtyler avatar May 07 '24 13:05 rtyler

@rtyler we are facing the same with AWS as well

gaurav7261 avatar May 09 '24 21:05 gaurav7261

@george-zubrienko can you check against v0.18.0 please?

ion-elgreco avatar Jun 07 '24 11:06 ion-elgreco

Will check on Monday!

george-zubrienko avatar Jun 07 '24 21:06 george-zubrienko

Bit delayed: deploying this now. In case I don't get to run the validation today, I will ping tomorrow.

george-zubrienko avatar Jun 10 '24 20:06 george-zubrienko

Side note: it seems macOS environments now have issues importing deltalake:

[image: screenshot of the import error]

george-zubrienko avatar Jun 11 '24 16:06 george-zubrienko

This is already reported, see the issues board. Someone also has a workaround to still get it installed.

ion-elgreco avatar Jun 11 '24 16:06 ion-elgreco

@ion-elgreco I just tried with deltalake 0.18.0 on ARM machine and I'm getting the same error (library installed from wheel, I can try to build from source tomorrow if you want to confirm it still works in that case)

george-zubrienko avatar Jun 11 '24 19:06 george-zubrienko

@george-zubrienko you might give 0.18.1 a shot, which includes the latest object store version; not sure if it's going to have any impact though

ion-elgreco avatar Jun 12 '24 21:06 ion-elgreco

Will try, but... isn't this a bit strange? I've checked the Mac issue as well and I see people also resolve it with the no-binary flag to pip. So at this point there are two issues that are resolved by simply recompiling the Rust library, which to me at least suggests that this is not related to the code. Maybe something changed in your release process?

george-zubrienko avatar Jun 14 '24 06:06 george-zubrienko

Yeah, the Mac issue got fixed by just bumping the OS version of the runners

ion-elgreco avatar Jun 14 '24 06:06 ion-elgreco

@ion-elgreco we just tested with 0.18.1 for both Azure and AWS, the error is gone

george-zubrienko avatar Jul 02 '24 14:07 george-zubrienko

I believe this can be closed, as we rolled out 0.18.1 to prod, which runs mostly ARM machines, and I do not see any failures :) Thanks a lot!

george-zubrienko avatar Jul 09 '24 17:07 george-zubrienko

@george-zubrienko thanks for the update!

ion-elgreco avatar Jul 09 '24 21:07 ion-elgreco