gcsfs
Getting 'gs' KeyError when reading a CSV from GCS using gcsfs
Hi, I upgraded gcsfs and now I get the error shown below.
My code is pretty simple:
```
data = dd.read_csv(file_path, parse_dates=[date_column])\
    .compute()
return data
```
It used to work but all of a sudden it stopped working.
file_path = gs://mybuck/res.csv
```
File "main.py", line 51, in run
    data = load_parse_file(file_path=args.input_file)
File "/FbProphet/prophet_gcp/utils.py", line 15, in load_parse_file
    data = dd.read_csv(file_path, parse_dates=[date_column])\
File "/work/miniconda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 578, in read
    **kwargs
File "/work/miniconda/lib/python3.7/site-packages/dask/dataframe/io/csv.py", line 405, in read_pandas
    **(storage_options or {})
File "/work/miniconda/lib/python3.7/site-packages/dask/bytes/core.py", line 93, in read_bytes
    fs, fs_token, paths = get_fs_token_paths(urlpath, mode="rb", storage_options=kwargs)
File "/work/miniconda/lib/python3.7/site-packages/dask/bytes/core.py", line 425, in get_fs_token_paths
    fs, fs_token = get_fs(protocol, options)
File "/work/miniconda/lib/python3.7/site-packages/dask/bytes/core.py", line 571, in get_fs
    cls = _filesystems[protocol]
KeyError: 'gs'
```
Ah yes, sorry - my fault. For now, you can replace "gs" with "gcs".
I have tried both and still get the same issue.
Hm, actually on second thoughts, you are not using the new code at all.
I don't know why you are seeing this; there has been no change in dask (master) or gcsfs (release) yet. Can you show the contents of `dask.bytes.core._filesystems`, try `import gcsfs` explicitly, or run `dask.bytes.core.get_fs('gs')`?
Of course, the workaround for you may be simply to downgrade gcsfs until we have completed the transition to fsspec (which is the reason for a little turbulence right now).
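For anyone gathering the information requested above, here is a minimal sketch that reports the installed versions of the packages involved. It relies only on the standard library `importlib.metadata`, which assumes Python 3.8 or newer; the helper name `installed_versions` and the default package list are made up for illustration.

```python
from importlib import metadata


def installed_versions(packages=("dask", "gcsfs", "fsspec")):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Pasting this output alongside the traceback makes it easier to tell whether the dask and gcsfs releases in the environment match.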
seeing the same behavior w/ 0.3.0:
[ins] In [2]: dd.read_csv('gs://gcp-public-data-landsat/index.csv.gz', compression=
...: 'gzip')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-2-2c32b0849045> in <module>
----> 1 dd.read_csv('gs://gcp-public-data-landsat/index.csv.gz', compression='gzip')
~/venvs/model/lib/python3.7/site-packages/dask/dataframe/io/csv.py in read(urlpath, blocksize, collection, lineterminator, compression, sample, enforce, assume_missing, storage_options, include_path_column, **kwargs)
576 storage_options=storage_options,
577 include_path_column=include_path_column,
--> 578 **kwargs
579 )
580
~/venvs/model/lib/python3.7/site-packages/dask/dataframe/io/csv.py in read_pandas(reader, urlpath, blocksize, collection, lineterminator, compression, sample, enforce, assume_missing, storage_options, include_path_column, **kwargs)
403 compression=compression,
404 include_path=include_path_column,
--> 405 **(storage_options or {})
406 )
407
~/venvs/model/lib/python3.7/site-packages/dask/bytes/core.py in read_bytes(urlpath, delimiter, not_zero, blocksize, sample, compression, include_path, **kwargs)
91
92 """
---> 93 fs, fs_token, paths = get_fs_token_paths(urlpath, mode="rb", storage_options=kwargs)
94
95 if len(paths) == 0:
~/venvs/model/lib/python3.7/site-packages/dask/bytes/core.py in get_fs_token_paths(urlpath, mode, num, name_function, storage_options)
423 update_storage_options(options, storage_options)
424
--> 425 fs, fs_token = get_fs(protocol, options)
426
427 if "w" in mode:
~/venvs/model/lib/python3.7/site-packages/dask/bytes/core.py in get_fs(protocol, storage_options)
569 " pip install gcsfs",
570 )
--> 571 cls = _filesystems[protocol]
572
573 elif protocol in ["adl", "adlfs"]:
KeyError: 'gs'
re: the q's you asked above
[nav] In [7]: import dask.bytes.core
...: dask.bytes.core._filesystems
...:
Out[7]: {'file': dask.bytes.local.LocalFileSystem}
[nav] In [9]: import dask.bytes.core
...: dask.bytes.core.get_fs('gs')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
<ipython-input-9-855d81a61db6> in <module>
1 import dask.bytes.core
----> 2 dask.bytes.core.get_fs('gs')
~/venvs/model/lib/python3.7/site-packages/dask/bytes/core.py in get_fs(protocol, storage_options)
569 " pip install gcsfs",
570 )
--> 571 cls = _filesystems[protocol]
572
573 elif protocol in ["adl", "adlfs"]:
KeyError: 'gs'
I'm afraid you need to use the master version of dask to pick this up, following https://github.com/dask/dask/pull/5064
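To illustrate why the lookup fails, here is a simplified, self-contained sketch of a protocol registry. It is not dask's actual code: `_registry`, `register`, `get_backend`, and `LocalBackend` are hypothetical names. It only shows that a dict-based lookup raises `KeyError` for any protocol whose backend was never registered, which is consistent with the `_filesystems` output above containing only `'file'`.

```python
# Toy registry: protocol name -> backend class (hypothetical, not dask's API).
_registry = {}


def register(protocol, cls):
    """Make a backend available under the given protocol name."""
    _registry[protocol] = cls


def get_backend(protocol):
    """Look up a backend; raises KeyError if nothing registered the protocol."""
    return _registry[protocol]


class LocalBackend:
    """Stand-in for a local filesystem backend."""


# Only the local backend registers itself, mirroring the output shown above.
register("file", LocalBackend)

try:
    get_backend("gs")
except KeyError as exc:
    # No backend ever registered "gs", so the plain dict lookup fails.
    print(f"KeyError: {exc}")
```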
sg, seems like this is resolved then?
C:\Anaconda3\lib\site-packages\dask\dataframe\io\csv.py in read(urlpath, blocksize, collection, lineterminator, compression, sample, enforce, assume_missing, storage_options, include_path_column, **kwargs)
576 storage_options=storage_options,
577 include_path_column=include_path_column,
--> 578 **kwargs
579 )
580
C:\Anaconda3\lib\site-packages\dask\dataframe\io\csv.py in read_pandas(reader, urlpath, blocksize, collection, lineterminator, compression, sample, enforce, assume_missing, storage_options, include_path_column, **kwargs)
403 compression=compression,
404 include_path=include_path_column,
--> 405 **(storage_options or {})
406 )
407
C:\Anaconda3\lib\site-packages\dask\bytes\core.py in read_bytes(urlpath, delimiter, not_zero, blocksize, sample, compression, include_path, **kwargs)
91
92 """
---> 93 fs, fs_token, paths = get_fs_token_paths(urlpath, mode="rb", storage_options=kwargs)
94
95 if len(paths) == 0:
C:\Anaconda3\lib\site-packages\dask\bytes\core.py in get_fs_token_paths(urlpath, mode, num, name_function, storage_options)
423 update_storage_options(options, storage_options)
424
--> 425 fs, fs_token = get_fs(protocol, options)
426
427 if "w" in mode:
C:\Anaconda3\lib\site-packages\dask\bytes\core.py in get_fs(protocol, storage_options)
569 " pip install gcsfs",
570 )
--> 571 cls = _filesystems[protocol]
572
573 elif protocol in ["adl", "adlfs"]:
KeyError: 'gcs'
Have the same issue now:
dask 2.1.0 py_0
dask-core 2.1.0 py_0
gcsfs 0.3.0 py_0 conda-forge
@PoradaKev - you have a version of gcsfs that is too new for Dask. Either downgrade gcsfs, or install Dask from master.
Tried that. The combination `dask==2.1.0` and `gcsfs==0.2.3` works.
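The version pairings reported in this thread can be captured in a small check. This is only a sketch: the function names are hypothetical, and the boundaries are an assumption generalized from the two data points reported here (dask 2.1.0 fails with gcsfs 0.3.0 and works with gcsfs 0.2.3). It also assumes purely numeric dotted version strings.

```python
def parse_version(text):
    """Turn '0.2.3' into (0, 2, 3); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def is_combination_reported_broken(dask_version, gcsfs_version):
    """Return True for pairings like the failing one reported in this thread.

    Assumption: any dask release up to 2.1.0 combined with gcsfs 0.3.0 or
    newer behaves like the reported case (dask 2.1.0 with gcsfs 0.3.0).
    """
    return (
        parse_version(dask_version) <= (2, 1, 0)
        and parse_version(gcsfs_version) >= (0, 3, 0)
    )


if __name__ == "__main__":
    print(is_combination_reported_broken("2.1.0", "0.3.0"))
    print(is_combination_reported_broken("2.1.0", "0.2.3"))
```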