
"RuntimeError: Loop is not running" when combining ZipFileSystem and DropboxDriveFileSystem

Open rabernat opened this issue 3 years ago • 8 comments

We (myself and @annieleal) are trying to read a zip file from dropbox (see https://github.com/annieleal/rces-final-project/issues/1 for more details).

We have not been able to get it to work. This is what we have tried:

import dropbox
import dropboxdrivefs as dbx
import fsspec
from fsspec.implementations.zip import ZipFileSystem

token = 'u19t2wI3eH4AAAAAAAAAAdplJXSWJIRd1Rp8sAl1MIcxTxz4fkJ8xP_y8dqS7sdv'
path = '/RN-20211117121141_CF2D9BA9D51D3B7CE0538486ABC0F5FD/idp2021/GEOTRACES_IDP2021_v1_seawater_netcdf.zip'

dfs = dbx.DropboxDriveFileSystem(token=token)
dbox_file = dfs.open(path)
zfs = ZipFileSystem(dbox_file)

# does the same thing - in one pass
zfs = ZipFileSystem(path, target_protocol='dropbox', target_options={'token': token})

This gives the following error:

INFO:Request to files/get_temporary_link
INFO:Request to files/get_metadata
---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
/tmp/ipykernel_353/2838041811.py in <module>
     10 dfs = dbx.DropboxDriveFileSystem(token=token)
     11 dbox_file = dfs.open(path)
---> 12 zfs = ZipFileSystem(dbox_file)

~/.local/lib/python3.8/site-packages/fsspec/spec.py in __call__(cls, *args, **kwargs)
     72             return cls._cache[token]
     73         else:
---> 74             obj = super().__call__(*args, **kwargs)
     75             # Setting _fs_token here causes some static linters to complain.
     76             obj._fs_token_ = token

~/.local/lib/python3.8/site-packages/fsspec/implementations/zip.py in __init__(self, fo, mode, target_protocol, target_options, block_size, **kwargs)
     55             fo = files[0]
     56         self.fo = fo.__enter__()  # the whole instance is a context
---> 57         self.zip = zipfile.ZipFile(self.fo)
     58         self.block_size = block_size
     59         self.dir_cache = None

/srv/conda/envs/notebook/lib/python3.8/zipfile.py in __init__(self, file, mode, compression, allowZip64, compresslevel, strict_timestamps)
   1267         try:
   1268             if mode == 'r':
-> 1269                 self._RealGetContents()
   1270             elif mode in ('w', 'x'):
   1271                 # set the modified flag so central directory gets written

/srv/conda/envs/notebook/lib/python3.8/zipfile.py in _RealGetContents(self)
   1330         fp = self.fp
   1331         try:
-> 1332             endrec = _EndRecData(fp)
   1333         except OSError:
   1334             raise BadZipFile("File is not a zip file")

/srv/conda/envs/notebook/lib/python3.8/zipfile.py in _EndRecData(fpin)
    272     except OSError:
    273         return None
--> 274     data = fpin.read()
    275     if (len(data) == sizeEndCentDir and
    276         data[0:4] == stringEndArchive and

~/.local/lib/python3.8/site-packages/fsspec/implementations/http.py in read(self, length)
    496         else:
    497             length = min(self.size - self.loc, length)
--> 498         return super().read(length)
    499 
    500     async def async_fetch_all(self):

~/.local/lib/python3.8/site-packages/fsspec/spec.py in read(self, length)
   1467             # don't even bother calling fetch
   1468             return b""
-> 1469         out = self.cache._fetch(self.loc, self.loc + length)
   1470         self.loc += len(out)
   1471         return out

~/.local/lib/python3.8/site-packages/fsspec/caching.py in _fetch(self, start, end)
    374         ):
    375             # First read, or extending both before and after
--> 376             self.cache = self.fetcher(start, bend)
    377             self.start = start
    378         elif start < self.start:

~/.local/lib/python3.8/site-packages/fsspec/asyn.py in wrapper(*args, **kwargs)
     85     def wrapper(*args, **kwargs):
     86         self = obj or args[0]
---> 87         return sync(self.loop, func, *args, **kwargs)
     88 
     89     return wrapper

~/.local/lib/python3.8/site-packages/fsspec/asyn.py in sync(loop, func, timeout, *args, **kwargs)
     43     # and we will wait for it
     44     if loop is None or loop.is_closed():
---> 45         raise RuntimeError("Loop is not running")
     46     try:
     47         loop0 = grl()

RuntimeError: Loop is not running

Any ideas?

I'm not sure if this is a dropboxfs-specific problem or a broader integration issue, which is why I have posted it here first.

rabernat avatar Dec 06 '21 17:12 rabernat

cc @marinechaput

martindurant avatar Dec 06 '21 17:12 martindurant

At a guess, it seems that the latest dropbox fs predates async in fsspec. Looking now.
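To illustrate the failure mode in the traceback, here is a minimal standalone sketch. The `sync` function below is a simplified stand-in that only reproduces the guard visible in the traceback (`loop is None or loop.is_closed()`); it is not fsspec's actual implementation, and `AsyncBackedFile` and `try_read` are hypothetical names made up for this example.

```python
import asyncio


def sync(loop, func, *args, **kwargs):
    """Simplified stand-in for the guard shown in the traceback."""
    if loop is None or loop.is_closed():
        raise RuntimeError("Loop is not running")
    # Illustration only: run the coroutine to completion on the given loop.
    return loop.run_until_complete(func(*args, **kwargs))


class AsyncBackedFile:
    """Hypothetical file object whose reads are coroutines run on self.loop."""

    def __init__(self, loop=None):
        self.loop = loop

    async def _fetch(self, start, end):
        # Pretend to fetch bytes from a remote store.
        return b"x" * (end - start)

    def read(self, length):
        return sync(self.loop, self._fetch, 0, length)


def try_read(file_obj):
    """Return the bytes read, or the error message if the read fails."""
    try:
        return file_obj.read(4)
    except RuntimeError as exc:
        return str(exc)


# A file built without an event loop fails on the first read.
no_loop_result = try_read(AsyncBackedFile(loop=None))

# The same file with a live loop reads normally.
loop = asyncio.new_event_loop()
with_loop_result = try_read(AsyncBackedFile(loop=loop))
loop.close()

print(no_loop_result)
print(with_loop_result)
```

The point is that a file class written for the async machinery cannot read anything unless it was handed a usable event loop when it was constructed.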

martindurant avatar Dec 06 '21 17:12 martindurant

Yes, I see that dropboxfs creates an HTTPFile instance and passes it a requests session rather than an initialised aiohttp one. It would be easy enough to change dropbox's file implementation upstream to use a requests-based file class, like fsspec.implementations.webhdfs.WebHDFile.
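As a rough sketch of what a purely synchronous, range-based file class could look like: the example below uses only the standard library, and `RangeFile` and `fetch_range` are hypothetical names, not part of fsspec or dropboxdrivefs. Here `fetch_range` slices an in-memory archive; in a real implementation it would presumably issue a blocking HTTP request with a `Range` header.

```python
import io
import zipfile


class RangeFile(io.RawIOBase):
    """Hypothetical read-only file that fetches byte ranges synchronously."""

    def __init__(self, fetch_range, size):
        super().__init__()
        self._fetch_range = fetch_range  # callable(start, end) -> bytes
        self._size = size
        self._pos = 0

    def readable(self):
        return True

    def seekable(self):
        return True

    def tell(self):
        return self._pos

    def seek(self, offset, whence=io.SEEK_SET):
        if whence == io.SEEK_SET:
            new_pos = offset
        elif whence == io.SEEK_CUR:
            new_pos = self._pos + offset
        elif whence == io.SEEK_END:
            new_pos = self._size + offset
        else:
            raise ValueError(f"invalid whence: {whence}")
        if new_pos < 0:
            raise ValueError("negative seek position")
        self._pos = new_pos
        return self._pos

    def readinto(self, buffer):
        # Fetch only the requested range; no event loop is involved.
        end = min(self._pos + len(buffer), self._size)
        if end <= self._pos:
            return 0
        data = self._fetch_range(self._pos, end)
        buffer[: len(data)] = data
        self._pos += len(data)
        return len(data)


# Build a small zip archive in memory to stand in for the remote file.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("hello.txt", "hello from the archive")
payload = archive.getvalue()


def fetch_range(start, end):
    """Stand-in for a blocking ranged request against a remote URL."""
    return payload[start:end]


with zipfile.ZipFile(RangeFile(fetch_range, len(payload))) as zf:
    names = zf.namelist()
    content = zf.read("hello.txt")

print(names, content)
```

Because `zipfile` only needs `seek`, `tell` and `read`, any blocking file object of this shape can be handed to it directly.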

martindurant avatar Dec 06 '21 17:12 martindurant

Quick attempt: https://github.com/MarineChap/dropboxdrivefs/pull/5

martindurant avatar Dec 06 '21 17:12 martindurant

I don't have a dropbox account, and the tests only do simplistic mocking, so please see the PR as a starting point. If @MarineChap is not available, we could consider moving the project to the fsspec org and updating the testing, if one of the docker dropbox implementations suffices. It could even become an async implementation.

martindurant avatar Dec 06 '21 17:12 martindurant

Thanks for the quick PR! I just tried with pip install dropbox git+https://github.com/martindurant/dropboxdrivefs.git@file_superclass and got a different error:

ValidationError: 'https://uc2ddb73545d99e18726a9a85999.dl.dropboxusercontent.com/cd/0/get/BbVqnSXnSSGyldovXy4PRQtfkl4JDI24-iKFfyWDnUxoXt6J2iOy11Wj1NdTqmfzRpTNDA4_7NiX5oGDO831o05TyDsNSpTeGJY_hbUVDV-GjoI70IwFb4b9jBlzMUENHHumiKbA3b_qLNsC6e-glO41/file' did not match pattern '(/(.|[\r\n])*|id:.*)|(rev:[0-9a-f]{9,})|(ns:[0-9]+(/.*)?)'
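The pattern in this message only accepts strings that start with `/`, `id:`, `rev:` or `ns:`, which suggests that the temporary download URL was passed where a Dropbox path was expected. A quick check with the standard `re` module follows. Assumptions: the validator requires the whole string to match (modelled here with `re.fullmatch`), and the function name and sample values are hypothetical.

```python
import re

# Pattern copied from the ValidationError message above.
DROPBOX_PATH_PATTERN = r"(/(.|[\r\n])*|id:.*)|(rev:[0-9a-f]{9,})|(ns:[0-9]+(/.*)?)"


def looks_like_dropbox_path(value):
    """Assumption: the whole string must match the pattern."""
    return re.fullmatch(DROPBOX_PATH_PATTERN, value) is not None


# Hypothetical sample values for illustration.
dropbox_path = "/idp2021/GEOTRACES_IDP2021_v1_seawater_netcdf.zip"
temporary_link = "https://example.dropboxusercontent.com/file"

path_ok = looks_like_dropbox_path(dropbox_path)
link_ok = looks_like_dropbox_path(temporary_link)
print(path_ok, link_ok)
```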

No dropbox account required for this example. The code above is copy-pasteable.

rabernat avatar Dec 06 '21 17:12 rabernat

Ah, you provided a token, so I can test this - sorry I didn't catch that.

That URL does seem to be valid, looking.

martindurant avatar Dec 06 '21 17:12 martindurant

Hello, I am looking at the PR now. It seems okay to me; it is just not clear what happened with read mode, but we can discuss that on the PR. I have not looked at this code in a long time, so I will need to remind myself what we were doing here.

By the way, if you want to move the project to the fsspec org, go ahead. It would make more sense.

MarineChap avatar Dec 06 '21 18:12 MarineChap