
s3proxy support for azure datalake storage gen2

Open: mubaraque-ali opened this issue 9 months ago • 1 comment

Hi all, we are using Apache Spark with S3Proxy to read and write data in Azure Data Lake Storage Gen2 through the s3a:// API. Reading through the S3Proxy documentation, we can see that S3Proxy supports Azure Blob Storage for Apache Spark. However, Azure has one more flavor of storage that is optimised for big data, Azure Data Lake Storage Gen2, and the S3Proxy documentation has no information about it. Can anyone please help with details on how to configure S3Proxy to use Azure Data Lake Storage Gen2?

s3proxy.properties file:

s3proxy.endpoint=http://0.0.0.0:8080
s3proxy.authorization=aws-v2-or-v4
s3proxy.identity=local-identity
s3proxy.credential=local-credential
jclouds.provider=azureblob-sdk
jclouds.azureblob.auth=azureKey
jclouds.endpoint=https://testsa.blob.core.windows.net
jclouds.identity=testsa
jclouds.credential=
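Each property has to sit on its own line for the file to parse as intended. As a quick sanity check, a minimal sketch using only `java.util.Properties` could report which of the keys listed in the file above are missing or blank. The class name `S3ProxyConfigCheck`, the method `missingOrEmpty`, and the `<storage-account-key>` placeholder are hypothetical names chosen for illustration and are not part of S3Proxy.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper for illustration; not part of S3Proxy.
public class S3ProxyConfigCheck {
    // Keys copied from the s3proxy.properties file in this post.
    static final String[] EXPECTED_KEYS = {
        "s3proxy.endpoint",
        "s3proxy.authorization",
        "s3proxy.identity",
        "s3proxy.credential",
        "jclouds.provider",
        "jclouds.azureblob.auth",
        "jclouds.endpoint",
        "jclouds.identity",
        "jclouds.credential"
    };

    /** Returns the expected keys that are absent or have a blank value. */
    public static List<String> missingOrEmpty(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // A StringReader should not fail, but load() declares IOException.
            throw new UncheckedIOException(e);
        }
        List<String> problems = new ArrayList<>();
        for (String key : EXPECTED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                problems.add(key);
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // "<storage-account-key>" is a placeholder, not a real credential.
        String config = String.join("\n",
            "s3proxy.endpoint=http://0.0.0.0:8080",
            "s3proxy.authorization=aws-v2-or-v4",
            "s3proxy.identity=local-identity",
            "s3proxy.credential=local-credential",
            "jclouds.provider=azureblob-sdk",
            "jclouds.azureblob.auth=azureKey",
            "jclouds.endpoint=https://testsa.blob.core.windows.net",
            "jclouds.identity=testsa",
            "jclouds.credential=<storage-account-key>");
        System.out.println("Missing or empty keys: " + missingOrEmpty(config));
    }
}
```

If all settings end up on a single line, `Properties` reads everything after the first `=` as the value of `s3proxy.endpoint`, so the check would flag the other eight keys as missing.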

  1. data.csv is available in the ADLS Gen2 container and Spark is able to read the data:

     val df = spark.read.csv("s3a://data/s3proxy/data.csv")
     df.show()

  2. But the write operation to ADLS Gen2 with the command below fails with an error:

     df.write.format("csv").option("header","true").save("s3a://data/s3proxy/data_new.csv")

error:

[s3proxy] W 03-10 11:42:48.599 S3Proxy-Jetty-54 o.g.s.o.e.j.server.HttpChannel:793 |::] handleException /data/s3proxy/data_new.csv/_temporary/0/ java.io.IOException: com.azure.storage.blob.models.BlobStorageException: Status code 400, " RequestId:bb61fc5e-501e-005c-4cb1-91bbca000000 <Error><Code>InvalidUri</Code><Message>The requested URI does not represent any resource on the server. Time: 2025-03-10T11:42:48.6029001Z</Message></Error>

Regards, Ali

mubaraque-ali, Mar 10 '25 10:03

Azure Datalake uses a different API than Azure Blob: https://learn.microsoft.com/en-us/azure/storage/blobs/data-lake-storage-directory-file-acl-java?tabs=azure-ad

S3Proxy does not support this. Adding this would be straightforward if you use the azureblob-sdk as a template: https://github.com/gaul/s3proxy/tree/master/src/main/java/org/gaul/s3proxy/azureblob

gaul, Mar 10 '25 15:03