python-diskcache
Cache access fails after forking if multiple `Cache` instances are backed by the same database
Running:
import os
import diskcache
a = diskcache.Cache(directory='/tmp/cache')
b = diskcache.Cache(directory='/tmp/cache')
os.fork()
a.get('key')
on a macOS machine, fails with:
Traceback (most recent call last):
  File "/Users/distiller/project/fork.py", line 9, in <module>
    a.get('key')
  File "/Users/distiller/project/env/lib/python3.12/site-packages/diskcache/core.py", line 1165, in get
    rows = self._sql(select, (db_key, raw, time.time())).fetchall()
           ^^^^^^^^^
  File "/Users/distiller/project/env/lib/python3.12/site-packages/diskcache/core.py", line 648, in _sql
    return self._con.execute
           ^^^^^^^^^
  File "/Users/distiller/project/env/lib/python3.12/site-packages/diskcache/core.py", line 623, in _con
    con = self._local.con = sqlite3.connect(
                            ^^^^^^^^^^^^^^^^
sqlite3.OperationalError: disk I/O error
(tested on CircleCI M1 medium instance)
AFAICT, all of the following conditions have to be met:
- two (or more) `Cache` instances that use the same directory
- fork before `Cache.get()`
- macOS
If any of the above is removed, the snippet works as expected.
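To toggle the conditions above one at a time, it can help to run the post-fork call in a child process and read back whether it raised. The helper below is a minimal sketch using only the standard library; `run_in_fork` is a hypothetical name, not part of diskcache. It exercises only the child side of the fork, whereas the original snippet calls `a.get('key')` in both parent and child.

```python
import os
import traceback


def run_in_fork(fn):
    """Run fn() in a forked child; return True if it finished without raising.

    Hypothetical helper for this report, not a diskcache API.
    """
    pid = os.fork()
    if pid == 0:
        # Child process: report success or failure through the exit code.
        status = 0
        try:
            fn()
        except BaseException:
            traceback.print_exc()
            status = 1
        finally:
            # os._exit avoids running cleanup handlers inherited from the parent.
            os._exit(status)
    # Parent process: wait for the child and decode its exit status.
    _, wait_status = os.waitpid(pid, 0)
    return os.WIFEXITED(wait_status) and os.WEXITSTATUS(wait_status) == 0
```

With `a` and `b` created as in the snippet at the top, `run_in_fork(lambda: a.get('key'))` would be expected to return `False` on macOS and `True` on Linux, going by the reports in this thread. Dropping `b` or moving the `Cache` construction into the lambda changes one condition at a time.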
SQLite threading mode (sqlite3.threadsafety) is set to multi-thread ("Threads may share the module, but not connections"), so I don't think that's causing this because diskcache reconnects on forking already.
$ python
Python 3.12.4 (main, Jul 18 2024, 14:14:06) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> sqlite3.threadsafety
1
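For reference, the value above can be decoded programmatically. The sketch below maps the DB-API `threadsafety` levels documented for the `sqlite3` module to their meaning and also reads the `THREADSAFE` compile option from SQLite itself. The function names are made up for illustration. Note that these levels describe sharing between threads inside one process; they say nothing directly about connections inherited across a fork.

```python
import sqlite3

# DB-API 2.0 threadsafety levels as documented for the sqlite3 module.
THREADSAFETY_MEANING = {
    0: "single-thread: threads may not share the module",
    1: "multi-thread: threads may share the module, but not connections",
    3: "serialized: threads may share the module, connections and cursors",
}


def describe_threadsafety(level=None):
    """Return a readable description of a sqlite3.threadsafety level."""
    if level is None:
        level = sqlite3.threadsafety
    return THREADSAFETY_MEANING.get(level, f"unknown level {level!r}")


def sqlite_threadsafe_option():
    """Return the THREADSAFE=... compile option reported by SQLite, or None."""
    con = sqlite3.connect(":memory:")
    try:
        rows = con.execute("PRAGMA compile_options").fetchall()
    finally:
        con.close()
    for (option,) in rows:
        if option.startswith("THREADSAFE="):
            return option
    return None


print(describe_threadsafety())
print(sqlite_threadsafe_option())
```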
Possibly related to https://github.com/grantjenks/python-diskcache/issues/266.
I tested your code on Ubuntu 22.04 (x86, Python 3.12) and it worked fine. This is (maybe) related to how fork works underneath in Python, though I forced the same `fork` start method:
import multiprocessing
multiprocessing.set_start_method("fork", force=True)
print(multiprocessing.get_start_method())
import os
import diskcache
a = diskcache.Cache(directory="/tmp/cache")
b = diskcache.Cache(directory="/tmp/cache")
os.fork()
a.get("key")
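One caveat about the snippet above: `multiprocessing.set_start_method` only configures how `multiprocessing` starts its own worker processes, and a direct `os.fork()` call does not consult it, so that setting should not affect this reproduction. When comparing a macOS run against a Linux run, it may be more useful to record the environment on both sides. The sketch below is one way to do that; `fork_environment` is a hypothetical helper and the choice of fields is an assumption about what might matter here.

```python
import multiprocessing
import os
import platform
import sqlite3
import sys


def fork_environment():
    """Collect settings that may differ between the macOS and Linux runs."""
    return {
        "platform": platform.system(),  # "Darwin" on macOS, "Linux" on Linux
        "machine": platform.machine(),
        "python": sys.version.split()[0],
        "sqlite_version": sqlite3.sqlite_version,  # SQLite library, not the module
        "sqlite_threadsafety": sqlite3.threadsafety,
        "has_os_fork": hasattr(os, "fork"),
        # None means no start method has been selected yet in this process.
        "mp_start_method": multiprocessing.get_start_method(allow_none=True),
    }


for name, value in fork_environment().items():
    print(f"{name}: {value}")
```

Posting this output from both machines makes it easier to see whether the SQLite library version or build differs in addition to the operating system.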
@ddorian, exactly, this works perfectly on Linux (as everything does, right?). Maybe I wasn't clear enough above, but macOS is a necessary condition for reproduction.