
Is there a way to map to pyarrow.hdfs.connect?

Open Jeffwan opened this issue 2 years ago • 4 comments

Let's say we have a file system called bfs://, which is an equivalent implementation of HDFS. We can use it through pyarrow as follows.

import pyarrow as pa
from pyarrow import fs
bfs = pa.hdfs.connect('bfs://service-endpoint')             # ---> works
# bfs, _ = fs.FileSystem.from_uri('bfs://service-endpoint') # ---> doesn't work
local, _ = fs.FileSystem.from_uri('file:///')
with local.open_input_stream('/tmp/license.txt') as src:
    with bfs.open('/tmp-test.txt', mode='wb') as dest:
        dest.upload(src)

Could I know whether fsspec can support our protocol?

Nov 12 '22 08:11 Jeffwan

Could you please explain how you'd like your alternative implementation to be handled by fsspec? My understanding is that you would like the "bfs" protocol to be registered with fsspec, and have it create the arrow fs, wrap it, and return an fsspec instance. Is this right?

Nov 12 '22 18:11 martindurant

@martindurant Yes. That's exactly what I want. Do you have any suggestions for such custom protocols? Should we do it downstream or upstream?

Nov 14 '22 05:11 Jeffwan

Could you please try:

import fsspec.implementations.arrow

fsspec.implementations.arrow.HadoopFileSystem("bfs://service-endpoint")

to see if this does the right thing?

If so, the following change

--- a/fsspec/implementations/arrow.py
+++ b/fsspec/implementations/arrow.py
@@ -260,6 +260,8 @@ class HadoopFileSystem(ArrowFSWrapper):
         out = {}
         if ops.get("host", None):
             out["host"] = ops["host"]
+            if ops.get("protocol"):
+                out["host"] = f"{ops.get('protocol')}://{out['host']}"
         if ops.get("username", None):
             out["user"] = ops["username"]

should allow for

import fsspec
import fsspec.implementations.arrow

fsspec.register_implementation("bfs", fsspec.implementations.arrow.HadoopFileSystem)
with fsspec.open("bfs://service-endpoint/tmp/license.txt") as f:
    use_somehow(f)  # placeholder for whatever consumes the file
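
To make the effect of the patch concrete, here is a rough standalone sketch of the URL-to-kwargs step with the change applied. It is not the fsspec code itself: `hdfs_kwargs_from_url` is a hypothetical helper, and `urllib.parse.urlsplit` is only a simplified stand-in for fsspec's own URL parsing. It just shows that the scheme should stay attached to the host, so that `bfs://` reaches the Arrow HDFS client intact.

```python
from urllib.parse import urlsplit


def hdfs_kwargs_from_url(url):
    """Hypothetical stand-in for the patched URL-to-kwargs step."""
    parts = urlsplit(url)
    # Simplified substitute for fsspec's URL parsing (an assumption,
    # not the library's real parser).
    ops = {
        "protocol": parts.scheme,
        "host": parts.hostname,
        "port": parts.port,
        "username": parts.username,
    }
    out = {}
    if ops.get("host"):
        out["host"] = ops["host"]
        # The proposed change: keep the scheme on the host so a custom
        # protocol such as bfs:// is passed through unchanged.
        if ops.get("protocol"):
            out["host"] = f"{ops['protocol']}://{out['host']}"
    if ops.get("username"):
        out["user"] = ops["username"]
    if ops.get("port"):
        out["port"] = ops["port"]
    return out


print(hdfs_kwargs_from_url("bfs://service-endpoint/tmp/license.txt"))
# prints {'host': 'bfs://service-endpoint'}
```

Without the two added lines, the host would come out as plain `service-endpoint`, and the information that this is a `bfs://` endpoint would be lost before the Arrow file system is created.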

Nov 14 '22 16:11 martindurant

@Jeffwan, did my proposed solution work for you?

Dec 08 '22 15:12 martindurant