smart_open icon indicating copy to clipboard operation
smart_open copied to clipboard

Uploads are unintentionally closed by the __del__ destructor

Open Limess opened this issue 3 years ago • 1 comments

Problem description

  • Scenario: Writing to S3 using multipart uploads
  • Expected: Uploads are not committed to S3 when the process exits
  • Actual behaviour: Uploads are committed when the process exits

Steps/code to reproduce the problem

It seems that the __del__ finalizer is being called on both:

  • io.BufferedIOBase which MultipartWriter extends from, calling close
  • GzipFile when the process exits, which in turn calls close MultipartWriter due to the close method being monkeypatched when these objects are garbage collected, e.g. when the system exits

This does not seem to be intended behaviour when using multipart uploads, it's not desirable to complete multipart uploads in the middle of the upload when the process exits elsewhere, e.g. on EC2 when we get a spot instance termination, and has led to us seeing incomplete data, and corrupted gzip uploads, being written to S3 when other issues cause the system to exit.

For other uploads, this behaviour is OK as it means the files are closed as expected- writes aren't intended to be atomic.

Ideally close should never be called on MultipartWriter, unless a context manager exits without an exception, or if close is explicitly called by the user code.

Versions

>>> import platform, sys, smart_open
m.platform())
print("Python", sys.version)
print("smart_open", smart_open.__version__)>>> print(platform.platform())
Linux-5.10.60.1-microsoft-standard-WSL2-x86_64-with-debian-bullseye-sid
>>> print("Python", sys.version)
Python 3.7.11 (default, Jan 13 2022, 14:48:06)
[GCC 9.3.0]
>>> print("smart_open", smart_open.__version__)
smart_open 5.2.1

Limess avatar Jan 17 '22 12:01 Limess

We were trying to get around this by monkeypatching the file object to remove the __del__ method from both the gzip and smart open s3 file objects in our user code, but didn't have any success without monkeypatching the whole classmethod on gzip.GzipFile which is undesirable as it seems there's special behaviour with python dundermethods, there may be a way around this though.

Limess avatar Jan 17 '22 12:01 Limess