arrow icon indicating copy to clipboard operation
arrow copied to clipboard

GH-43097: [C++] Implement `PathFromUri` support for Azure file system

Open OliLay opened this issue 1 year ago • 1 comments

Rationale for this change

See #43097.

What changes are included in this PR?

Implements AzureFS::PathFromUri using existing URI parsing and path extraction inside the AzureOptions.

Are these changes tested?

Yes, added a unit test.

Are there any user-facing changes?

No, but calling PathFromUri will now work instead of throwing due to no implementation provided.

  • GitHub Issue: #43097

OliLay avatar Jul 01 '24 15:07 OliLay

:warning: GitHub issue #43097 has been automatically assigned in GitHub to PR creator.

github-actions[bot] avatar Jul 01 '24 15:07 github-actions[bot]

It seems that other filesystem implementations use internal::PathFromUriHelper(). Can we use it in Azure filesystem too?

I tried that at first, but the issue is that for Azure we have to support different URI schemes where the authority is handled differently. An example: abfss://storageacc.blob.core.windows.net/container/some/path vs. abfss://acc:pw@container/some/path where both are accepted URIs by arrow and should yield the same path container/some/path.

So just using internal::PathFromUriHelper() (either with prepending the authority or with not prepending it) would break parsing one of these URIs. So we'll have to check which kind of URI we have and decide based on that - I saw this is already implemented in AzureOptions::FromUri (and this also returns the path), so I just used this.

OliLay avatar Jul 19 '24 06:07 OliLay

@kou Can you give it another look? I think it should be good to go.

OliLay avatar Aug 13 '24 08:08 OliLay

After merging your PR, Conbench analyzed the 4 benchmarking runs that have been run so far on merge-commit 88e8140ad7902435b5d1ac29205dda7517f2cc79.

There were no benchmark performance regressions. 🎉

The full Conbench report has more details. It also includes information about 2 possible false positives for unstable benchmarks that are known to sometimes produce them.