filesystem_spec bugfix: await on size, assuming it can be an async function

bugfix: await on size, assuming it can be an async function

Open mukhery opened this issue 1 year ago • 6 comments

Checking size may require a call to the async _info function. On line 277, _cp_file checks if f1.size is async (callback.set_size(await maybe_await(f1.size))), but then later it simply references f1.size which fails if this function is async. As such, this PR corrects the code to ensure f1.size references are wrapped in await maybe_await().

May 29 '23 16:05 mukhery

@martindurant what do you think?

Jun 01 '23 13:06 mukhery

I think you are right in your argument. However, I have a feeling that f.size is never async actually, so it won't matter. open_async/AsyncStreamedFiles could use work!

Jun 01 '23 13:06 martindurant

I think you are right in your argument. However, I have a feeling that f.size is never async actually, so it won't matter. open_async/AsyncStreamedFiles could use work!

https://github.com/fsspec/s3fs/pull/742 makes it async in order to call the async _info needed for _cp_file

Jun 01 '23 13:06 mukhery

Funny, because S3 is the only one where for sure we should know the size before starting the download: the content-size header is always populated and, possibly outside compressive transcoding, will be correct.

Jun 01 '23 13:06 martindurant

Funny, because S3 is the only one where for sure we should know the size before starting the download: the content-size header is always populated and, possibly outside compressive transcoding, will be correct.

Perhaps I should change https://github.com/fsspec/filesystem_spec/blob/master/fsspec/generic.py#L277 to just remove the assumption that f1.size could be async? Or do you think this still might be useful for other implementations even though it isn't an issue for s3?

Jun 10 '23 15:06 mukhery

maybe_await should never be harmful and not add any overhead

Jun 10 '23 17:06 martindurant

filesystem_spec filesystem_spec copied to clipboard

bugfix: await on size, assuming it can be an async function

filesystem_spec
filesystem_spec copied to clipboard