gcsfs
FileNotFoundError when using pandas df.to_excel
Similar to #184
When trying to write an Excel file with pandas to a cloud bucket, I get these errors:
df.to_excel('gs://bucket_name/path/to/data/test_excel.xlsx')
...
FileNotFoundError: [Errno 2] No such file or directory: 'gs://bucket_name/path/to/data/test_excel.xlsx'
...
FileCreateError: [Errno 2] No such file or directory: 'gs://bucket_name/path/to/data/test_excel.xlsx'
I'm able to work around this for now by writing to a BytesIO buffer first and then uploading:
import io
import gcsfs

fs = gcsfs.GCSFileSystem()
output = io.BytesIO()
df.to_excel(output)
output.seek(0)
# open the target object for writing; the path includes the bucket name
with fs.open('bucket_name/path/to/data/test_excel.xlsx', 'wb') as f:
    f.write(output.read())
Is pandas integration for writing Excel files not implemented yet?
pandas = 0.24.2, gcsfs = 0.3.1
Since the write to the path works, I suspect that there is something amiss in pandas, perhaps trying to query the directory (which doesn't exist, and on GCS is not a prerequisite).
Agreed that this is most likely on the pandas side.
This error happened when I create the gcsfs file system first and then modify or reload files.
After a file has been modified, reusing the same gcsfs file system raises the FileNotFoundError.
This error occurs with both pandas and Dask.
One solution I found is to set "cache_timeout=0" when creating the gcsfs file system.
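For reference, a minimal sketch of that workaround (the bucket path is a placeholder):

import gcsfs

# cache_timeout=0 disables the directory-listings cache, so every call
# re-queries GCS instead of relying on a possibly stale listing
fs = gcsfs.GCSFileSystem(cache_timeout=0)
fs.ls('bucket_name/path/to/data')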
That is a reasonable workaround, and again comes back to how persistent we would like the directory listings cache to be...
Personally, I'm continually bitten by the caching behavior and just turn it off completely. In any kind of distributed system there will quite often be updates to GCS that are performed by other processes.
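As a rough sketch, in more recent fsspec-based gcsfs releases the listings cache can also be disabled via the generic fsspec keyword, or invalidated explicitly after out-of-band changes (exact parameter availability depends on the installed versions; the path is a placeholder):

import gcsfs

# Disable the listings cache entirely (fsspec-based releases)
fs = gcsfs.GCSFileSystem(use_listings_cache=False)

# Or drop cached listings for a prefix after another process modified it
fs.invalidate_cache('bucket_name/path/to/data')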
Same here. I ran into this once again last week - I had forgotten that I solved the problem by deactivating the cache some time ago. 😉 I use gcsfs in a micro-service context, and keeping a cache of a bucket's state can lead to interesting results.
Might not be related to the OP's issue of saving to a cloud bucket, but I encountered
"FileCreateError: [Errno 2] No such file or directory:"
when using df.to_excel to save a file to OneDrive locally. My issue was that the path plus the file name was simply too long. Shortening the file name helped; saving to another folder with a shorter path might help too.
@nhakim, I think I am confused - did you mean OneDrive? This repo does not concern itself with that, but it would be cool if there were an fsspec implementation somewhere.