filesystem_spec
Seeking on async FS files does not work
Here's a minimal example:
```python
import fsspec
import asyncio


async def async_version():
    print("Async Version")
    fs = fsspec.filesystem("http", asynchronous=True)
    session = await fs.set_session()
    file = await fs.open_async("https://example.com/")
    print("Starting Tell", file.tell(), "seeking to 20")
    file.seek(20)
    # Both reads below should return the same 5 bytes starting at offset 20
    print(f"Read 5 bytes, from tell of {file.tell()}:", await file.read(5), "now tell:", file.tell())
    file.seek(20)
    print(f"Read 5 bytes, from tell of {file.tell()}:", await file.read(5), "now tell:", file.tell())
    await file.close()
    await session.close()


def sync_version():
    print("Sync Version")
    fs = fsspec.filesystem("http")
    file = fs.open("https://example.com/")
    print("Starting Tell", file.tell(), "seeking to 20")
    file.seek(20)
    print(f"Read 5 bytes, from tell of {file.tell()}:", file.read(5), "now tell:", file.tell())
    file.seek(20)
    print(f"Read 5 bytes, from tell of {file.tell()}:", file.read(5), "now tell:", file.tell())
    file.close()


if __name__ == '__main__':
    asyncio.run(async_version())
    sync_version()
```
This outputs:

```
Async Version
Starting Tell 0 seeking to 20
Read 5 bytes, from tell of 20: b'<!doc' now tell: 25
Read 5 bytes, from tell of 20: b'type ' now tell: 25
Sync Version
Starting Tell 0 seeking to 20
Read 5 bytes, from tell of 20: b'l>\n<h' now tell: 25
Read 5 bytes, from tell of 20: b'l>\n<h' now tell: 25
```
Note that the async version appears to respect seek and tell: .loc is updated by .seek() and advanced after each .read(), so .tell() correctly reports .loc. However, the bytes that .read() actually returns are wrong.
The document starts with <!doctype, so we can see that the two .read() calls are simply reading sequentially from the start of the response. The seek in the async implementation has no effect on the returned bytes, even though it updates .loc.
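The symptom is consistent with a file whose position bookkeeping (.loc) and whose real read cursor have drifted apart. The sketch below is a hypothetical toy model, not fsspec's implementation: `StreamedAsyncFile` and `RangeAsyncFile` are illustrative names, and it assumes the page begins with `b"<!doctype html>\n<html>\n<head>"`, as the outputs above imply. Under that assumption, the "streamed" model reproduces the async output and the "range" model reproduces the sync output.

```python
import asyncio

# Assumption: the first bytes of https://example.com/, inferred from the
# outputs above (b'<!doc' at offset 0, b'l>\n<h' at offset 20).
PAGE = b"<!doctype html>\n<html>\n<head>"


class StreamedAsyncFile:
    """Hypothetical model of the observed async behaviour (not fsspec code):
    seek() only moves .loc, while read() consumes a forward-only stream."""

    def __init__(self, data: bytes):
        self._data = data
        self._cursor = 0  # position in the underlying stream; never rewound
        self.loc = 0      # position reported by tell()

    def seek(self, loc: int) -> int:
        self.loc = loc  # bookkeeping only, the stream cursor is untouched
        return self.loc

    def tell(self) -> int:
        return self.loc

    async def read(self, n: int) -> bytes:
        out = self._data[self._cursor:self._cursor + n]
        self._cursor += len(out)
        self.loc += len(out)
        return out


class RangeAsyncFile(StreamedAsyncFile):
    """Hypothetical model of the expected behaviour: read() starts at .loc."""

    async def read(self, n: int) -> bytes:
        out = self._data[self.loc:self.loc + n]
        self.loc += len(out)
        return out


async def seek_then_read_twice(f):
    # Mirrors the reproducer: seek(20) then read(5), done twice.
    results = []
    for _ in range(2):
        f.seek(20)
        start = f.tell()
        chunk = await f.read(5)
        results.append((start, chunk, f.tell()))
    return results


print(asyncio.run(seek_then_read_twice(StreamedAsyncFile(PAGE))))
# [(20, b'<!doc', 25), (20, b'type ', 25)]
print(asyncio.run(seek_then_read_twice(RangeAsyncFile(PAGE))))
# [(20, b'l>\n<h', 25), (20, b'l>\n<h', 25)]
```

In the streamed model, tell() looks correct because .loc is maintained independently, which is exactly why the bug is easy to miss.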
I originally found this with an S3 filesystem using cache_type='background', but after stripping the example down step by step I ended up with plain HTTP, and the problem still occurs there.
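Since the same symptom showed up on S3 before being reduced to HTTP, a small check that can be pointed at any file object may help narrow down which backends are affected. This is a sketch that assumes only that the object has seek/tell/read, with read() either sync (as from fs.open) or async (as from fs.open_async). `seek_is_honored` and `AsyncBytesFile` are hypothetical helpers, not part of fsspec.

```python
import asyncio
import inspect
import io


async def _resolve(value):
    # Sync files return plain values; async files return awaitables.
    return (await value) if inspect.isawaitable(value) else value


async def seek_is_honored(f, offset: int, n: int) -> bool:
    """Seek to `offset` and read `n` bytes, twice. True if both reads agree
    and tell() ends at offset + len(chunk) after each read."""
    chunks = []
    for _ in range(2):
        await _resolve(f.seek(offset))
        chunk = await _resolve(f.read(n))
        if f.tell() != offset + len(chunk):
            return False
        chunks.append(chunk)
    return chunks[0] == chunks[1]


class AsyncBytesFile:
    """Hypothetical in-memory async file, only for exercising the check."""

    def __init__(self, data: bytes, honour_seek: bool = True):
        self._buf = io.BytesIO(data)
        self._honour_seek = honour_seek
        self.loc = 0

    def seek(self, loc: int) -> int:
        self.loc = loc
        if self._honour_seek:
            self._buf.seek(loc)  # also move the real read position
        return self.loc

    def tell(self) -> int:
        return self.loc

    async def read(self, n: int) -> bytes:
        out = self._buf.read(n)
        self.loc += len(out)
        return out


SAMPLE = b"<!doctype html>\n<html>\n<head>"  # assumed page prefix
print(asyncio.run(seek_is_honored(io.BytesIO(SAMPLE), 20, 5)))  # True
print(asyncio.run(seek_is_honored(AsyncBytesFile(SAMPLE), 20, 5)))  # True
print(asyncio.run(seek_is_honored(AsyncBytesFile(SAMPLE, honour_seek=False), 20, 5)))  # False
```

Against a real filesystem, one could call `await seek_is_honored(await fs.open_async(url), 20, 5)` inside an async function. Based on the output above, a False result would be expected for the async HTTP file. The check is only meaningful at an offset whose bytes differ from those at the start of the file.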