smart_open
ResourceWarning: unclosed file
Problem description
I'm seeing a ResourceWarning: unclosed file warning when using context managers to open files/streams with smart_open.
Note: if I don't run under unittest, or if I don't gzip the file, resources seem to be closed correctly. I'm guessing unittest and smart_open are somehow not coordinating correctly on closing the layers.
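One possible explanation for the unittest-only behavior, offered as an assumption rather than a confirmed diagnosis: Python ignores ResourceWarning by default, while the unittest runner enables warnings unless -W options are given, so the same leak can go unreported in a plain script. The sketch below uses only the standard library; the helper names leak_file_handle and count_resource_warnings are hypothetical and not part of smart_open.

```python
import gc
import warnings


def leak_file_handle(path):
    """Open a file and drop the last reference without closing it."""
    fh = open(path, "rb")
    del fh
    gc.collect()  # make finalization deterministic on non-refcounting interpreters


def count_resource_warnings(path, action):
    """Leak a handle under the given filter action and count recorded ResourceWarnings."""
    with warnings.catch_warnings(record=True) as caught:
        # "ignore" mimics a plain script's default; "always" surfaces every leak.
        warnings.simplefilter(action, ResourceWarning)
        leak_file_handle(path)
    return sum(issubclass(w.category, ResourceWarning) for w in caught)
```

If this is what is happening, running the non-unittest script with python -W always::ResourceWarning should surface the same warning there too.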
Steps/code to reproduce the problem
My test script:
import unittest
from smart_open import open as smart_open

class MyTestCase(unittest.TestCase):
    def test_load(self):
        with smart_open('input.csv.gz') as fh:
            print("opened file")

unittest.main()
Invocation:
% echo -e 'col1,col2\nval1,val2\nval3,val4' | gzip > input.csv.gz
% PYTHONTRACEMALLOC=1 python test_load.py
opened file
/Users/kwilliams/miniconda3/lib/python3.7/unittest/case.py:615: ResourceWarning: unclosed file <_io.BufferedReader name='input.csv.gz'>
  testMethod()
Object allocated at (most recent call last):
  File "/Users/kwilliams/git/dispatcher/rush-springs-simulations/venv/lib/python3.7/site-packages/smart_open/smart_open_lib.py", lineno 548
    fobj = io.open(parsed_uri.uri_path, mode)
.
----------------------------------------------------------------------
Ran 1 test in 0.007s
OK
Versions
Darwin-18.0.0-x86_64-i386-64bit
Python 3.7.3 (default, Mar 27 2019, 16:54:48)
[Clang 4.0.1 (tags/RELEASE_401/final)]
smart_open 1.9.0
I should add - I'm not sure whether the warning is correct and the filehandle isn't being closed properly, or it's a spurious warning.
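On whether the warning is genuine: one mechanism that would produce a real leak of this shape is a gzip layer wrapped around a raw file object. The standard library documents that closing a gzip.GzipFile does not close the fileobj it was given, so a context manager that closes only the outer layer leaves the inner handle open. The sketch below reproduces that with the standard library alone. It is an assumption that smart_open 1.9.0 behaves this way internally, and the helper names are hypothetical.

```python
import gc
import gzip
import io
import os
import warnings


def make_gzip_file(directory):
    """Write a small gzipped CSV and return its path."""
    path = os.path.join(directory, "input.csv.gz")
    with gzip.open(path, "wt") as fh:
        fh.write("col1,col2\nval1,val2\nval3,val4\n")
    return path


def close_outer_layer_only(path):
    """Close just the gzip wrapper, then report the raw handle's state and any leaks.

    Returns (raw_closed, leaks): whether the raw file was closed when the
    with-block exited, and the ResourceWarnings recorded once it was collected.
    """
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always", ResourceWarning)
        raw = io.open(path, "rb")                        # inner layer
        wrapped = gzip.GzipFile(fileobj=raw, mode="rb")  # outer layer
        with wrapped:
            wrapped.read()
        raw_closed = raw.closed  # GzipFile.close() leaves fileobj open
        del raw, wrapped
        gc.collect()
    leaks = [w for w in caught if issubclass(w.category, ResourceWarning)]
    return raw_closed, leaks
```

If the raw handle reports closed as False after the with-block, the warning is pointing at a real unclosed file rather than a spurious one; closing the raw handle explicitly (or opening it in its own with-statement) would be the corresponding fix in this sketch.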
Hi, any thoughts on this?
@kenahoo thanks for the clear and detailed report. I agree context managers should be closing handles, so that looks like a bug.
@mpenkov is busy ATM – any chance you could take a stab at this yourself?
Hi @piskvorky - I'm afraid I probably won't be able to tackle this, mostly because I had a look at the guts of smart_open and I think I'm not up to the task at this point, but also because this is coming up in my "day job" and the deadlines are pretty tight, so I'm not able to commit the necessary time, at least in the short term.
I had the same issue when trying to open a gz file: when I run unit tests, the same kind of warning is shown. I'm using smart-open==4.2.0 and Python 3.8.7.
The warning message:
/mnt/c/Users/<user>/workspace/codes/tests/testXX.py:242: ResourceWarning: unclosed <ssl.SSLSocket fd=5,
family=AddressFamily.AF_INET, type=SocketKind.SOCK_STREAM, proto=6, laddr=('yyy.yyy.yyy.yyy', XXXX),
raddr=('zzz.zzz.zzz.zzz', 8080)>
ResourceWarning: Enable tracemalloc to get the object allocation traceback
Anyone been able to fix this?
Cannot reproduce on linux Python 3.10.6 and smart_open 6.1.0.
I had the same issue when trying to open a file on S3: when I run unit tests, the same warning is shown. My test script:
import smart_open
import unittest

class RunTest(unittest.TestCase):
    def test_load_pickle_s3(self):
        path = "s3://my_test_direcotry/test.pkl"
        with smart_open.open(path, "wb") as fh:
            print("open file")

    def test_load_pickle_local(self):
        path = "test.pkl"
        with smart_open.open(path, "wb") as fh:
            print("open file")
Package versions: smart_open[s3] 6.3.0, Python 3.9.16
Are you able to work out the cause?
Not at the moment