s3fs icon indicating copy to clipboard operation
s3fs copied to clipboard

readinto() fails

Open philtromans opened this issue 4 years ago • 3 comments

Hi,

It looks like the buffer argument that's passed in isn't being passed into the underlying readinto() call. This causes a TypeError when Python tries to invoke the function: https://github.com/PyFilesystem/s3fs/blob/7f95af69e904e58b068ac4f6b39c2da488d5d918/fs_s3fs/_s3fs.py#L154

(this gets exposed when you're using Pickle protocol 5, and right now it fails)

Thanks!

philtromans avatar Oct 14 '20 09:10 philtromans

Hi,

I can confirm the problem. A minimal example for reproducing this bug would be to run the following in Python 3.8:

import pickle
import datetime as dt

from fs_s3fs import S3FS


timestamp = dt.datetime(2020, 11, 20)

filesystem = S3FS(
    bucket_name='<your bucket>',
    dir_path='<your path>'
)

with filesystem.openbin('timestamp.pkl', 'w') as fp:
    pickle.dump(timestamp, fp, protocol=3)  # Saves with lower pickle protocol version
    
with filesystem.openbin('timestamp.pkl', 'r') as fp:
    pickle.load(fp)

A more natural way for obtaining this bug is that someone pickles a certain object to an s3 bucket using Python 3.7 and then someone else tries to load it using Python 3.8.

A stack trace for the error raised by the code above is:

TypeError                                 Traceback (most recent call last)
<ipython-input-1-983760a01c4f> in <module>
     16 
     17 with filesystem.openbin('timestamp.pkl', 'r') as fp:
---> 18     pickle.load(fp)

~/.local/share/virtualenvs/my-venv/lib/python3.8/site-packages/fs_s3fs/_s3fs.py in readinto(self, b)
    152 
    153     def readinto(self, b):
--> 154         return self._f.readinto()
    155 
    156     def write(self, b):

TypeError: readinto() takes exactly one argument (0 given)

The attribute self._f is of type io.BufferedRandom. According to documentation of io module, method readinto always requires a parameter b (i.e. bytes to read). I suspect that this bug in S3File.readinto hasn't been discovered before simply because the method gets called only in very rare cases.

According to this the solution of the bug seems to be passing the parameter b:

return self._f.readinto() -> return self._f.readinto(b)

This solves the problem for the above example.

@willmcgugan could this be fixed? Let us know if you need any help.

AleksMat avatar Nov 20 '20 16:11 AleksMat

Yes, definitely a mistake. Happy to accept a PR.

willmcgugan avatar Nov 20 '20 16:11 willmcgugan

@willmcgugan Any chance this could be merged and a new version released?

batic avatar Oct 21 '21 13:10 batic