filesystem_spec
Is there a way to map to pyarrow.hdfs.connect?
Let's say we have a file system called bfs://, which is an equivalent implementation of HDFS. We can support pyarrow operations in the following way:
import pyarrow as pa
from pyarrow import fs
bfs = pa.hdfs.connect('bfs://service-endpoint')  # works
# bfs, _ = fs.FileSystem.from_uri('bfs://service-endpoint')  # doesn't work
local, _ = fs.FileSystem.from_uri('file:///')
# Copy a local file into bfs through the legacy HDFS client
with local.open_input_stream('/tmp/license.txt') as src:
    with bfs.open('/tmp-test.txt', mode='wb') as dest:
        dest.upload(src)
Is it possible for fsspec to support our protocol?
Could you please explain how you'd like your alternative implementation to be handled by fsspec? My understanding is that you would like the "bfs" protocol to be registered with fsspec, so that it creates the arrow filesystem, wraps it, and returns an fsspec instance. Is that right?
@martindurant Yes, that's exactly what I want. Do you have any suggestions for such custom protocols? Should we do it downstream or upstream?
Could you please try:
fsspec.implementations.arrow.HadoopFileSystem("bfs://service-endpoint")
to see if it does the right thing?
If so, the following change
--- a/fsspec/implementations/arrow.py
+++ b/fsspec/implementations/arrow.py
@@ -260,6 +260,8 @@ class HadoopFileSystem(ArrowFSWrapper):
         out = {}
         if ops.get("host", None):
             out["host"] = ops["host"]
+        if ops.get("protocol"):
+            out["host"] = f"{ops.get('protocol')}://{out['host']}"
         if ops.get("username", None):
             out["user"] = ops["username"]
should allow for
import fsspec
import fsspec.implementations.arrow

fsspec.register_implementation("bfs", fsspec.implementations.arrow.HadoopFileSystem)
with fsspec.open("bfs://service-endpoint/tmp/license.txt") as f:
    use_somehow(f)  # placeholder for your own processing
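To check what the patched `_get_kwargs_from_urls` would hand to pyarrow for a bfs:// URL, here is a standalone approximation of its host and username handling. It uses `urllib.parse.urlsplit` in place of fsspec's `infer_storage_options`, `kwargs_from_url` is a hypothetical name, and forwarding the port is an assumption not shown in the diff.

```python
from urllib.parse import urlsplit


def kwargs_from_url(url: str) -> dict:
    """Hypothetical approximation of the patched HadoopFileSystem._get_kwargs_from_urls."""
    parts = urlsplit(url)
    out = {}
    if parts.netloc:
        # Drop any "user@" prefix and ":port" suffix, keeping the bare host name.
        host = parts.netloc.rsplit("@", 1)[-1].rsplit(":", 1)[0]
        # The proposed change: prefix the scheme so pyarrow receives "bfs://host".
        out["host"] = f"{parts.scheme}://{host}" if parts.scheme else host
    if parts.username:
        out["user"] = parts.username
    # Assumption: an explicit port is forwarded as well.
    if parts.port:
        out["port"] = parts.port
    return out


print(kwargs_from_url("bfs://service-endpoint/tmp/license.txt"))
# {'host': 'bfs://service-endpoint'}
```

Unlike the diff, this sketch only adds the scheme when a host was actually parsed, which avoids a `KeyError` on `out['host']` for URLs such as `bfs:///tmp/license.txt` that carry a protocol but no host.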
@Jeffwan, did my proposed solution work for you?