
GH-39968: [Python][FS][Azure] Minimal Python bindings for `AzureFileSystem`

Open Tom-Newton opened this issue 6 months ago • 35 comments

Rationale for this change

We want to use the new AzureFileSystem in pyarrow.

What changes are included in this PR?

  • Add minimal Python bindings for AzureFileSystem. This includes just enough to run the Python tests against Azurite, plus default credential auth so the filesystem is usable in practice once this PR merges.
  • Adding additional configuration options and remaining authentication options can be done as a follow up.
  • I tried to follow the existing Python bindings for GCS and S3.
  • Explicitly set ARROW_AZURE=OFF rather than relying on defaults. The defaults are different for builds vs tests so this was causing tests to be enabled while Azure was disabled during the build.

Are these changes tested?

Enabled the Python filesystem tests for the new filesystem. I had to skip Azure in a couple of the tests, though, because they do not yet work on the C++ side. I created GitHub issues to resolve these, https://github.com/apache/arrow/issues/40025 and https://github.com/apache/arrow/issues/40026, and added TODO comments referencing them where relevant.

Are there any user-facing changes?

pyarrow users can now use the native AzureFileSystem, which should give much better reliability and performance than adlfs-based options.
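As a rough illustration of what this looks like from the user side, here is a minimal sketch. It assumes the binding is exposed as `pyarrow.fs.AzureFileSystem` and accepts `account_name` and an optional `account_key`; the helper names `azure_filesystem_available` and `open_azure_filesystem` are hypothetical and not part of this PR. When no key is given, the sketch relies on the default credential auth mentioned above.

```python
def azure_filesystem_available():
    """Return True if this pyarrow build exposes AzureFileSystem."""
    try:
        # Raises ImportError if pyarrow is missing or was built without Azure.
        from pyarrow.fs import AzureFileSystem  # noqa: F401
    except ImportError:
        return False
    return True


def open_azure_filesystem(account_name, account_key=None):
    """Hypothetical helper: build an AzureFileSystem, or return None.

    Assumes the constructor takes `account_name` and an optional
    `account_key`; without a key, default credential auth is used.
    """
    if not azure_filesystem_available():
        return None
    from pyarrow.fs import AzureFileSystem

    if account_key is None:
        return AzureFileSystem(account_name=account_name)
    return AzureFileSystem(account_name=account_name, account_key=account_key)
```

The returned object would then be used like any other pyarrow filesystem, for example by passing it as the `filesystem` argument when reading a dataset. Returning `None` instead of raising is only a convenience for this sketch; real code may prefer to let the `ImportError` propagate.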

  • Closes: #39968
  • GitHub Issue: #39968

Tom-Newton, Feb 09 '24 21:02