smart_open
Uploads are unintentionally closed by the __del__ destructor
Problem description
- Scenario: Writing to S3 using multipart uploads
- Expected: Uploads are not committed to S3 when the process exits
- Actual behaviour: Uploads are committed when the process exits
Steps/code to reproduce the problem
It seems that the __del__ finalizer is being called on both:
- io.BufferedIOBase, which MultipartWriter extends from, calling close on MultipartWriter when the process exits
- GzipFile when the process exits, which in turn calls close on MultipartWriter due to the close method being monkeypatched

These finalizers run when the objects are garbage collected, e.g. when the system exits.
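The finalizer chain above can be sketched with a minimal stand-in class (DemoWriter is hypothetical, not smart_open's MultipartWriter), showing that io.IOBase's __del__ calls close during garbage collection:

```python
import io

committed = []

class DemoWriter(io.BufferedIOBase):
    def close(self):
        # In the real MultipartWriter, this is where the multipart
        # upload would be completed against S3.
        if not self.closed:
            committed.append(True)
        super().close()

w = DemoWriter()
del w  # io.IOBase.__del__ runs during garbage collection and calls close()
print(committed)  # [True] -- the "upload" was committed without an explicit close
```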
This does not seem to be intended behaviour for multipart uploads: it is not desirable to complete a multipart upload mid-upload just because the process exits elsewhere, e.g. on EC2 when we receive a spot instance termination. This has led to incomplete data and corrupted gzip uploads being written to S3 when other issues cause the system to exit.
For other uploads this behaviour is OK, as it means the files are closed as expected; those writes aren't intended to be atomic.
Ideally, close should never be called on MultipartWriter unless a context manager exits without an exception, or close is explicitly called by the user code.
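The desired semantics could be sketched roughly as follows (SafeWriter is a hypothetical wrapper, not a smart_open API): complete the upload only on a clean context-manager exit, abort it on an exception, and never commit from a finalizer:

```python
class SafeWriter:
    """Hypothetical sketch: commit only on clean context-manager exit."""
    def __init__(self):
        self.completed = False
        self.aborted = False

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        if exc_type is None:
            self.complete()   # real code: CompleteMultipartUpload
        else:
            self.abort()      # real code: AbortMultipartUpload
        return False          # propagate any exception

    def complete(self):
        self.completed = True

    def abort(self):
        self.aborted = True

# Clean exit commits; an exception aborts instead of committing partial data.
with SafeWriter() as ok:
    pass

try:
    with SafeWriter() as bad:
        raise RuntimeError("spot termination")
except RuntimeError:
    pass

print(ok.completed, bad.aborted)  # True True
```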
Versions
>>> import platform, sys, smart_open
>>> print(platform.platform())
Linux-5.10.60.1-microsoft-standard-WSL2-x86_64-with-debian-bullseye-sid
>>> print("Python", sys.version)
Python 3.7.11 (default, Jan 13 2022, 14:48:06)
[GCC 9.3.0]
>>> print("smart_open", smart_open.__version__)
smart_open 5.2.1
We tried to work around this by monkeypatching the file objects to remove the __del__ method from both the gzip and smart_open S3 file objects in our user code, but had no success short of monkeypatching the method on the whole gzip.GzipFile class, which is undesirable. This seems to be because of special behaviour with Python dunder methods; there may be a way around this, though.
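The special behaviour referred to here is that, for implicit invocations, CPython looks dunder methods up on the type rather than the instance, so attaching a replacement __del__ to an individual object has no effect on finalization. A small sketch illustrating this:

```python
calls = []

class Finalized:
    def __del__(self):
        calls.append("type __del__")

obj = Finalized()
# An instance-level __del__ attribute is ignored during finalization:
obj.__del__ = lambda: calls.append("instance __del__")
del obj

print(calls)  # ['type __del__'] -- only the class's __del__ ran
```

This is why only patching (or removing) __del__ on the class itself changes what runs at garbage collection.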