adlfs

Consistent handling of paths containing protocols

Open Tom-Newton opened this issue 1 year ago • 0 comments

For my use case, I generally have a full path that includes the protocol, such as abfss://<container-name>@<storage-account-name>.dfs.core.windows.net/<path-within-container>.
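To make the path format concrete, here is a small sketch that splits such a URL into its parts using only the standard library. The function name `split_abfss_url` and the example container, account, and file names are made up for illustration; this is not adlfs code.

```python
from urllib.parse import urlsplit


def split_abfss_url(url: str) -> tuple[str, str, str, str]:
    """Split a fully qualified URL into (protocol, container, account_host, path).

    Hypothetical helper for illustration only. If the URL has no "@",
    the whole network location is returned as the container.
    """
    parts = urlsplit(url)
    # The container name sits before "@" in the network location.
    container, _, account_host = parts.netloc.partition("@")
    return parts.scheme, container, account_host, parts.path.lstrip("/")


protocol, container, account_host, path = split_abfss_url(
    "abfss://mycontainer@myaccount.dfs.core.windows.net/data/part-0.parquet"
)

# The protocol-free form "<container-name>/<path-within-container>".
stripped = f"{container}/{path}"
print(protocol, container, account_host, path, stripped, sep="\n")
```

Note that `urlsplit` treats `?` and `#` as query and fragment separators, so this particular helper is only suitable for plain paths, not glob patterns.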

I've found that different methods of AzureBlobFileSystem seem to handle the protocol information in different ways. Some examples:

  1. ls: Doesn't work when provided a full path including the protocol.
  2. glob: Strips the protocol information, then returns paths of the form <container-name>/<path-within-container>. I would find it much more helpful if it returned paths in the same format I provided, including the protocol.
  3. rm: Works as I would expect. It strips off the protocol information in order to operate, but because it doesn't return a path it doesn't have the issue that glob has.

The reason I like to use fully qualified paths with the protocol information is that it allows interacting with a local file system or blob storage with exactly the same code. The only thing I need to change is the path that I provide.

I will probably implement some kind of wrapper around AzureBlobFileSystem as a workaround myself, but I think it would be best to resolve this at the source.

My opinion on how it should ideally work:

  1. All methods should accept full paths with the protocol information.
  2. Methods that return paths, e.g. glob and ls, should return them in the same format that was received.

Tom-Newton, Jan 09 '23 15:01