filesystem_spec
"RuntimeError: Loop is not running" when combining ZipFileSystem and DropboxDriveFileSystem
We (myself and @annieleal) are trying to read a zip file from dropbox (see https://github.com/annieleal/rces-final-project/issues/1 for more details).
We have not been able to get it to work. This is what we have tried:
import dropbox
import dropboxdrivefs as dbx
import fsspec
from fsspec.implementations.zip import ZipFileSystem
token = 'u19t2wI3eH4AAAAAAAAAAdplJXSWJIRd1Rp8sAl1MIcxTxz4fkJ8xP_y8dqS7sdv'
path = '/RN-20211117121141_CF2D9BA9D51D3B7CE0538486ABC0F5FD/idp2021/GEOTRACES_IDP2021_v1_seawater_netcdf.zip'
dfs = dbx.DropboxDriveFileSystem(token=token)
dbox_file = dfs.open(path)
zfs = ZipFileSystem(dbox_file)
# does the same thing - in one pass
zfs = ZipFileSystem(path, target_protocol='dropbox', target_options={'token': token})
This gives the following error:
INFO:Request to files/get_temporary_link
INFO:Request to files/get_metadata
---------------------------------------------------------------------------
RuntimeError Traceback (most recent call last)
/tmp/ipykernel_353/2838041811.py in <module>
10 dfs = dbx.DropboxDriveFileSystem(token=token)
11 dbox_file = dfs.open(path)
---> 12 zfs = ZipFileSystem(dbox_file)
~/.local/lib/python3.8/site-packages/fsspec/spec.py in __call__(cls, *args, **kwargs)
72 return cls._cache[token]
73 else:
---> 74 obj = super().__call__(*args, **kwargs)
75 # Setting _fs_token here causes some static linters to complain.
76 obj._fs_token_ = token
~/.local/lib/python3.8/site-packages/fsspec/implementations/zip.py in __init__(self, fo, mode, target_protocol, target_options, block_size, **kwargs)
55 fo = files[0]
56 self.fo = fo.__enter__() # the whole instance is a context
---> 57 self.zip = zipfile.ZipFile(self.fo)
58 self.block_size = block_size
59 self.dir_cache = None
/srv/conda/envs/notebook/lib/python3.8/zipfile.py in __init__(self, file, mode, compression, allowZip64, compresslevel, strict_timestamps)
1267 try:
1268 if mode == 'r':
-> 1269 self._RealGetContents()
1270 elif mode in ('w', 'x'):
1271 # set the modified flag so central directory gets written
/srv/conda/envs/notebook/lib/python3.8/zipfile.py in _RealGetContents(self)
1330 fp = self.fp
1331 try:
-> 1332 endrec = _EndRecData(fp)
1333 except OSError:
1334 raise BadZipFile("File is not a zip file")
/srv/conda/envs/notebook/lib/python3.8/zipfile.py in _EndRecData(fpin)
272 except OSError:
273 return None
--> 274 data = fpin.read()
275 if (len(data) == sizeEndCentDir and
276 data[0:4] == stringEndArchive and
~/.local/lib/python3.8/site-packages/fsspec/implementations/http.py in read(self, length)
496 else:
497 length = min(self.size - self.loc, length)
--> 498 return super().read(length)
499
500 async def async_fetch_all(self):
~/.local/lib/python3.8/site-packages/fsspec/spec.py in read(self, length)
1467 # don't even bother calling fetch
1468 return b""
-> 1469 out = self.cache._fetch(self.loc, self.loc + length)
1470 self.loc += len(out)
1471 return out
~/.local/lib/python3.8/site-packages/fsspec/caching.py in _fetch(self, start, end)
374 ):
375 # First read, or extending both before and after
--> 376 self.cache = self.fetcher(start, bend)
377 self.start = start
378 elif start < self.start:
~/.local/lib/python3.8/site-packages/fsspec/asyn.py in wrapper(*args, **kwargs)
85 def wrapper(*args, **kwargs):
86 self = obj or args[0]
---> 87 return sync(self.loop, func, *args, **kwargs)
88
89 return wrapper
~/.local/lib/python3.8/site-packages/fsspec/asyn.py in sync(loop, func, timeout, *args, **kwargs)
43 # and we will wait for it
44 if loop is None or loop.is_closed():
---> 45 raise RuntimeError("Loop is not running")
46 try:
47 loop0 = grl()
RuntimeError: Loop is not running
Any ideas?
I'm not sure if this is a dropboxfs-specific problem or a broader integration issue, which is why I have posted it here first.
cc @marinechaput
At a guess, it seems that the latest dropbox fs predates async in fsspec. Looking now.
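For context, the check that raises in the traceback can be imitated with the standard library alone. This is a simplified stand-in, not fsspec's actual code: the `sync` helper and the `fetch` coroutine below are hypothetical, and only the `loop is None or loop.is_closed()` test is taken from the traceback. It shows that a sync-over-async wrapper needs a live event loop running in another thread, which a file object built without one would presumably not have.

```python
import asyncio
import threading


def sync(loop, coro_func, *args):
    """Run an async function from blocking code (illustrative stand-in only)."""
    # Same guard as shown in the traceback: no usable loop means no way to run.
    if loop is None or loop.is_closed():
        raise RuntimeError("Loop is not running")
    future = asyncio.run_coroutine_threadsafe(coro_func(*args), loop)
    return future.result(timeout=5)


async def fetch(start, end):
    # Hypothetical async range fetcher; returns dummy bytes instead of doing I/O.
    return bytes(range(start, end))


# Case 1: the file object was never given an event loop.
try:
    sync(None, fetch, 0, 4)
    error_message = None
except RuntimeError as exc:
    error_message = str(exc)

# Case 2: a loop is running in a background thread, so the call succeeds.
loop = asyncio.new_event_loop()
thread = threading.Thread(target=loop.run_forever, daemon=True)
thread.start()
result = sync(loop, fetch, 0, 4)

# Shut the background loop down cleanly.
loop.call_soon_threadsafe(loop.stop)
thread.join()
loop.close()
```

If this picture is right, the options are either to give the file a properly initialised async session and loop, or to avoid the async file class entirely.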
Yes, I see that dropboxfs creates an HTTPFile instance and passes it a requests session rather than an initialised aiohttp one. It would be easy enough to change the superclass of dropboxfs's file implementation to a requests-based file class, like fsspec.implementations.webhdfs.WebHDFile.
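To illustrate what a purely synchronous file class has to provide for zipfile, here is a minimal sketch using only the standard library. It is not the code from dropboxdrivefs or fsspec: `RangeReader` and `fetch_range` are hypothetical names, and the fetcher slices an in-memory archive where a real one would make a blocking HTTP range request.

```python
import io
import zipfile


class RangeReader(io.RawIOBase):
    """Read-only, seekable file object backed by a blocking range fetcher.

    `fetch_range(start, end)` is a hypothetical callable returning the bytes
    in [start, end) of the remote object.
    """

    def __init__(self, fetch_range, size):
        super().__init__()
        self._fetch = fetch_range
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            pos = offset
        elif whence == io.SEEK_CUR:
            pos = self._pos + offset
        elif whence == io.SEEK_END:
            pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if pos < 0:
            raise ValueError("negative seek position")
        self._pos = pos
        return pos

    def readinto(self, b):
        # Clamp the request to the end of the object; 0 signals EOF.
        end = min(self._pos + len(b), self._size)
        if end <= self._pos:
            return 0
        data = self._fetch(self._pos, end)
        n = len(data)
        b[:n] = data
        self._pos += n
        return n


# Build a small archive in memory to stand in for the remote zip file.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("hello.txt", "hello from the archive")
payload = buf.getvalue()

calls = []


def fetch_range(start, end):
    # Stand-in for a blocking HTTP range request.
    calls.append((start, end))
    return payload[start:end]


remote = io.BufferedReader(RangeReader(fetch_range, len(payload)))
with zipfile.ZipFile(remote) as zf:
    names = zf.namelist()
    text = zf.read("hello.txt").decode()
```

With requests, `fetch_range` could plausibly be written as `requests.get(url, headers={"Range": f"bytes={start}-{end - 1}"}).content`, assuming the server honours range requests. No event loop is involved at any point, which is why this style of file works where the async-backed one fails.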
Quick attempt: https://github.com/MarineChap/dropboxdrivefs/pull/5
I don't have a dropbox account, and the tests only do simplistic mocking, so please see the PR as a starting point. If @MarineChap is not available, we could consider moving the project to the fsspec org and updating the testing, if one of the docker dropbox implementations suffices. It could even become an async implementation.
Thanks for the quick PR! I just tried with pip install dropbox git+https://github.com/martindurant/dropboxdrivefs.git@file_superclass and got a different error:
ValidationError: 'https://uc2ddb73545d99e18726a9a85999.dl.dropboxusercontent.com/cd/0/get/BbVqnSXnSSGyldovXy4PRQtfkl4JDI24-iKFfyWDnUxoXt6J2iOy11Wj1NdTqmfzRpTNDA4_7NiX5oGDO831o05TyDsNSpTeGJY_hbUVDV-GjoI70IwFb4b9jBlzMUENHHumiKbA3b_qLNsC6e-glO41/file' did not match pattern '(/(.|[\r\n])*|id:.*)|(rev:[0-9a-f]{9,})|(ns:[0-9]+(/.*)?)'
No dropbox account required for this example. The code above is copy-pasteable.
Ah, you provided a token, so I can test this - sorry I didn't catch that.
That URL does seem to be valid, looking.
Hello, I am looking at the PR now. It seems okay to me; it's just not clear what happened with read mode, but we can discuss that on the PR. I have not looked at this code in a long time, so I will need to remind myself what we were doing here.
By the way, if you want to move the project to the fsspec org, go ahead. It would make more sense.