
Can we trim the storage scheme after parsing?

Open Xuanwo opened this issue 6 months ago • 6 comments

One question I have is whether we really need to keep the original abfss scheme. I don't recall any situation where we need to use it, perhaps only when creating a table? I wonder if we could simply remove it after parsing the input path.

Originally posted by @Xuanwo in https://github.com/apache/iceberg-rust/issues/1368#issuecomment-2944696445

Xuanwo, Jun 05 '25 14:06
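To make the proposal concrete, here is a minimal sketch of what "removing the scheme after parsing" could look like. It assumes the usual Azure URL shape `abfs[s]://<filesystem>@<account host>/<path>` (and the `wasb[s]` equivalent). `TrimmedLocation` and `trim_scheme` are hypothetical names made up for illustration, not part of the iceberg-rust API.

```rust
/// Hypothetical type: an Azure storage location with the scheme dropped.
#[derive(Debug, PartialEq)]
struct TrimmedLocation {
    filesystem: String,
    host: String,
    path: String,
}

/// Hypothetical helper: parses `<scheme>://<filesystem>@<host>/<path>`
/// and keeps everything except the scheme.
fn trim_scheme(input: &str) -> Result<TrimmedLocation, String> {
    let (scheme, rest) = input
        .split_once("://")
        .ok_or_else(|| format!("missing scheme in {}", input))?;

    // Only the four Azure schemes discussed in this thread are accepted.
    match scheme {
        "abfs" | "abfss" | "wasb" | "wasbs" => {}
        other => return Err(format!("unsupported scheme: {}", other)),
    }

    // Split the authority (`filesystem@host`) from the object path.
    let (authority, path) = match rest.split_once('/') {
        Some((authority, path)) => (authority, path),
        None => (rest, ""),
    };
    let (filesystem, host) = authority
        .split_once('@')
        .ok_or_else(|| format!("missing filesystem in {}", input))?;

    Ok(TrimmedLocation {
        filesystem: filesystem.to_string(),
        host: host.to_string(),
        path: format!("/{}", path),
    })
}
```

With this sketch, `abfss://fs@host/a` and `abfs://fs@host/a` parse to the same value, which is exactly the information that would be lost by trimming.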

Right now, it's only used to validate the passed path's scheme against the scheme of the endpoint that might have been used to configure the FileIO. We don't really distinguish between wasb[s] and abfs[s], only between their TLS and non-TLS versions.

If we say that a user should also be able to use a TLS endpoint via abfss:// even though they might have passed a plain text http://account.dfs.core.windows.net to the FileIO, then we can drop it!

DerGut, Jun 05 '25 15:06
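The check described in the comment above could be sketched roughly as follows: the path's scheme and the endpoint's scheme are each reduced to "TLS or plain text" and then compared, so `wasbs` and `abfss` are treated alike. The function names here are hypothetical and this is not the actual iceberg-rust code, only an illustration of the idea.

```rust
/// Maps a scheme to whether it implies TLS. Returns `None` for schemes
/// this sketch does not know about.
fn uses_tls(scheme: &str) -> Option<bool> {
    match scheme {
        "abfss" | "wasbs" | "https" => Some(true),
        "abfs" | "wasb" | "http" => Some(false),
        _ => None,
    }
}

/// Returns the part of a URL before `://`, if any.
fn scheme_of(url: &str) -> Option<&str> {
    url.split_once("://").map(|(scheme, _)| scheme)
}

/// Hypothetical validation: the path and the configured endpoint must
/// agree on TLS vs. plain text. wasb[s] and abfs[s] are not distinguished.
fn validate_path_against_endpoint(path: &str, endpoint: &str) -> Result<(), String> {
    let path_tls = scheme_of(path)
        .and_then(uses_tls)
        .ok_or_else(|| format!("unrecognized scheme in path: {}", path))?;
    let endpoint_tls = scheme_of(endpoint)
        .and_then(uses_tls)
        .ok_or_else(|| format!("unrecognized scheme in endpoint: {}", endpoint))?;

    if path_tls == endpoint_tls {
        Ok(())
    } else {
        Err(format!(
            "TLS mismatch between path {} and endpoint {}",
            path, endpoint
        ))
    }
}
```

Dropping the scheme after parsing would amount to removing this comparison, so an `abfss://` path would be accepted even when the FileIO was configured with an `http://` endpoint.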

I see a little bit of value in forcing all requests to use TLS even if some path may specify the plain text variant. Especially when users use SAS token-based auth.

DerGut, Jun 05 '25 16:06

> I see a little bit of value in forcing all requests to use TLS even if some path may specify the plain text variant. Especially when users use SAS token-based auth.

Aha 😆, the biggest blockers are users of MinIO and Azurite.

Xuanwo, Jun 05 '25 16:06

> the biggest blockers are users of MinIO and Azurite

Yeah, as far as I can see, that's the main value of allowing plain text at all. But IIUC neither Azurite nor MinIO supports the ADLS APIs 🤔

Browsing the Azure console, I can only force TLS; I can't configure the storage account to allow only plain text. So local testing may be the only reason to support plain text.

I guess it still makes sense to support it in case someone builds their own local emulation...

DerGut, Jun 05 '25 17:06

> One question I have is whether we really need to keep the original abfss scheme

Wouldn't we need to keep this to support existing tables that have files with this scheme?

mrcnc, Jun 05 '25 20:06

I think it's valuable to keep it to validate against input, for example when the FileIO was created with hdfs while the user is trying to create an "s3" file with it. It may no longer be a problem once we have finished https://github.com/apache/iceberg-rust/issues/1314

liurenjie1024, Jun 06 '25 09:06