gcsfs

FileNotFoundError when using pandas df.to_excel

Open mosqueteiro opened this issue 5 years ago • 8 comments

Similar to #184

When trying to write an Excel file with pandas to a cloud bucket, I get these errors:

df.to_excel('gs://bucket_name/path/to/data/test_excel.xlsx')
...
FileNotFoundError: [Errno 2] No such file or directory: 'gs://bucket_name/path/to/data/test_excel.xlsx'
...
FileCreateError: [Errno 2] No such file or directory: 'gs://bucket_name/path/to/data/test_excel.xlsx'

I'm able to get around this right now by writing to a BytesIO first and then uploading

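import io
import gcsfs

fs = gcsfs.GCSFileSystem()  # uses the default credentials
# Serialize the workbook into memory first, then upload the raw bytes.
output = io.BytesIO()
df.to_excel(output)
# Open in binary write mode; fs.open defaults to read mode ('rb').
with fs.open('bucket_name/path/to/data/test_excel.xlsx', 'wb') as f:
    f.write(output.getvalue())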

Is pandas integration for writing Excel files not implemented yet?

pandas = 0.24.2 gcsfs = 0.3.1

mosqueteiro avatar Nov 13 '19 00:11 mosqueteiro

Since the write to the path works, I suspect that there is something amiss in pandas, perhaps trying to query the directory (which doesn't exist, and on GCS is not a prerequisite).

martindurant avatar Nov 13 '19 14:11 martindurant

Agreed that this is most likely on the pandas side.

TomAugspurger avatar Nov 13 '19 14:11 TomAugspurger

This error occurs when I create the gcsfs file system first and then modify or reload files.

After a file has been modified, reusing the same gcsfs file system instance raises the FileNotFoundError.

The error appears in both pandas and Dask.

One workaround I found is to set cache_timeout=0 when creating the gcsfs file system.
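
A minimal sketch of that workaround, assuming gcsfs is installed and default Google credentials are available. The helper name open_uncached_gcs is hypothetical; cache_timeout is the GCSFileSystem constructor argument named above.

```python
def open_uncached_gcs(**kwargs):
    """Return a GCSFileSystem that does not cache directory listings.

    Hypothetical helper: gcsfs is imported lazily so that defining this
    function does not require the package or credentials.
    """
    import gcsfs

    # cache_timeout=0 disables the listings cache, so every lookup
    # reflects the current state of the bucket.
    return gcsfs.GCSFileSystem(cache_timeout=0, **kwargs)


# Usage (requires gcsfs and credentials; the path is the placeholder
# from the original report, and `payload` stands for the bytes to upload):
# fs = open_uncached_gcs()
# with fs.open("bucket_name/path/to/data/test_excel.xlsx", "wb") as f:
#     f.write(payload)
```

The trade-off is more metadata requests against GCS, since no listing is reused between calls.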

baitsnyc avatar Nov 19 '19 16:11 baitsnyc

That is a reasonable workaround, and again comes back to how persistent we would like the directory listings cache to be...

martindurant avatar Nov 19 '19 16:11 martindurant

Personally, I'm continually bitten by the caching behavior and just turn it off completely. In any kind of distributed system there will quite often be updates to GCS that are performed by other processes.
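
The failure mode described here can be illustrated with a toy model. This is not the gcsfs implementation; ToyBucket and CachedListingFS are hypothetical classes that only mimic a client holding a timed snapshot of a bucket listing while another process writes to the bucket.

```python
import time


class ToyBucket:
    """Stands in for remote storage that other processes can modify."""

    def __init__(self):
        self.objects = set()


class CachedListingFS:
    """Toy client that reuses a bucket listing for `cache_timeout` seconds."""

    def __init__(self, bucket, cache_timeout):
        self.bucket = bucket
        self.cache_timeout = cache_timeout
        self._listing = None
        self._fetched_at = 0.0

    def ls(self):
        expired = (
            self._listing is None
            or time.monotonic() - self._fetched_at >= self.cache_timeout
        )
        if expired:
            # Take a fresh snapshot of the remote state.
            self._listing = set(self.bucket.objects)
            self._fetched_at = time.monotonic()
        return self._listing

    def open(self, key):
        # The existence check consults the (possibly stale) listing.
        if key not in self.ls():
            raise FileNotFoundError(key)
        return key


bucket = ToyBucket()
cached = CachedListingFS(bucket, cache_timeout=60)
uncached = CachedListingFS(bucket, cache_timeout=0)

cached.ls()  # the cached client lists the (empty) bucket once
bucket.objects.add("data/test_excel.xlsx")  # another process writes a file

try:
    cached.open("data/test_excel.xlsx")
    stale_error = False
except FileNotFoundError:
    stale_error = True  # the snapshot predates the write

fresh = uncached.open("data/test_excel.xlsx")  # always re-lists, so it succeeds
```

With a timeout of 0 the snapshot is treated as expired on every call, which is why disabling the cache avoids the stale FileNotFoundError at the cost of re-listing each time.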

JohnEmhoff avatar Nov 24 '19 12:11 JohnEmhoff

Same here. Was running into this one once again last week - forgot that I solved problems by deactivating the cache some time ago. 😉 I am using gcsfs in a micro-service context. Having a cache of a bucket's state can lead to interesting results.

aberres avatar Dec 15 '19 20:12 aberres

This might not be related to OP's issue of saving to a cloud bucket, but I encountered

"FileCreateError: [Errno 2] No such file or directory:"

when using df.to_excel to save the file to a local OneDrive folder. My issue was that the path including the file name was simply too long. Shortening the file name helped. Saving to another folder with a shorter path might help too.
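
A quick way to check for this before writing, sketched under the assumption that the limit in question is the legacy Windows MAX_PATH of 260 characters (the comment above does not say which limit was hit). The function name is hypothetical, and the path should be given in absolute form.

```python
# Legacy Windows limit; the count includes the terminating null character,
# so the longest usable path is 259 characters.
WINDOWS_MAX_PATH = 260


def exceeds_legacy_windows_limit(path: str) -> bool:
    """Return True if an absolute path is too long for legacy Windows APIs."""
    return len(path) >= WINDOWS_MAX_PATH


short_path = "C:\\data\\test_excel.xlsx"
long_path = "C:\\" + "a" * 300 + ".xlsx"
```

If the check returns True, shortening the file name or choosing a shallower folder, as suggested above, brings the path back under the limit.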

nhakim avatar Feb 08 '22 04:02 nhakim

@nhakim, I think I am confused - did you mean to say OneDrive? This repo does not concern itself with that, but it would be cool if there were an fsspec implementation somewhere.

martindurant avatar Feb 08 '22 14:02 martindurant