python-diskcache

Cache access fails after forking if multiple `Cache` instances are backed by the same database

Open randomir opened this issue 1 year ago • 2 comments

Running:

import os
import diskcache

a = diskcache.Cache(directory='/tmp/cache')
b = diskcache.Cache(directory='/tmp/cache')

os.fork()

a.get('key')

on a macOS machine fails with:

Traceback (most recent call last):
  File "/Users/distiller/project/fork.py", line 9, in <module>
    a.get('key')
  File "/Users/distiller/project/env/lib/python3.12/site-packages/diskcache/core.py", line 1165, in get
    rows = self._sql(select, (db_key, raw, time.time())).fetchall()
           ^^^^^^^^^
  File "/Users/distiller/project/env/lib/python3.12/site-packages/diskcache/core.py", line 648, in _sql
    return self._con.execute
           ^^^^^^^^^
  File "/Users/distiller/project/env/lib/python3.12/site-packages/diskcache/core.py", line 623, in _con
    con = self._local.con = sqlite3.connect(
                            ^^^^^^^^^^^^^^^^
sqlite3.OperationalError: disk I/O error

(tested on CircleCI M1 medium instance)

AFAICT, all of the following conditions have to be met:

  • two (or more) Cache instances that use the same directory
  • fork before Cache.get()
  • macOS

If any one of these conditions is removed, the snippet works as expected.
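To check whether diskcache is needed at all, the same shape (two connections to one database file, a fork, then a query over a fresh connection) can be tried with plain sqlite3. This is only a sketch of a reduced experiment: `probe_fork_with_two_connections` and the `kv` table are made-up names, it omits any PRAGMA settings diskcache applies, and I make no claim about what it reports on macOS.

```python
import os
import sqlite3
import tempfile


def probe_fork_with_two_connections(db_path):
    """Open two connections to db_path, fork, then query over a fresh connection.

    Returns (parent_ok, child_ok). Hypothetical helper, not diskcache code.
    """
    # Two connections to the same file, mirroring the two Cache instances.
    a = sqlite3.connect(db_path)
    b = sqlite3.connect(db_path)
    a.execute("CREATE TABLE IF NOT EXISTS kv (key TEXT PRIMARY KEY, value TEXT)")
    a.commit()

    def query():
        # Open a new connection after the fork rather than reusing an inherited one.
        try:
            con = sqlite3.connect(db_path)
            try:
                con.execute("SELECT value FROM kv WHERE key = ?", ("key",)).fetchall()
            finally:
                con.close()
            return True
        except sqlite3.Error:
            return False

    pid = os.fork()
    if pid == 0:
        # Child: report the outcome via the exit status and skip cleanup handlers.
        os._exit(0 if query() else 1)

    parent_ok = query()
    _, status = os.waitpid(pid, 0)
    child_ok = os.waitstatus_to_exitcode(status) == 0
    a.close()
    b.close()
    return parent_ok, child_ok


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        print(probe_fork_with_two_connections(os.path.join(tmp, "cache.db")))
```

If this returns `True` for both processes on a machine where the diskcache snippet fails, that would point at something diskcache does on top of sqlite3 rather than at sqlite3 itself.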

The SQLite threading mode (sqlite3.threadsafety) is multi-thread ("Threads may share the module, but not connections"), so I don't think that's the cause, because diskcache already reconnects after forking.

$ python
Python 3.12.4 (main, Jul 18 2024, 14:14:06) [Clang 14.0.0 (clang-1400.0.29.202)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> import sqlite3
>>> sqlite3.threadsafety
1
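For anyone unfamiliar with "reconnects on forking": the traceback shows the connection living in a thread-local (`self._local.con`), and the general pattern is to remember the pid that opened it and reconnect when the pid changes. Below is a minimal sketch of that pattern with plain sqlite3. `ForkAwareConnection` is a hypothetical class written for illustration, not diskcache's actual code.

```python
import os
import sqlite3
import threading


class ForkAwareConnection:
    """Hypothetical helper: one sqlite3 connection per thread, reopened after a fork."""

    def __init__(self, db_path):
        self._db_path = db_path
        self._local = threading.local()

    @property
    def con(self):
        pid = os.getpid()
        if getattr(self._local, "pid", None) != pid:
            # Either the first use in this thread, or this process is a fork of
            # the one that opened the connection: drop the stale handle.
            stale = getattr(self._local, "con", None)
            if stale is not None:
                stale.close()
            self._local.con = None
            self._local.pid = pid
        if self._local.con is None:
            self._local.con = sqlite3.connect(self._db_path)
        return self._local.con
```

With a guard like this, a connection object is never used in a process other than the one that created it, which is why the threading mode alone seems an unlikely explanation.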

Possibly related to https://github.com/grantjenks/python-diskcache/issues/266.

randomir, Jul 25 '24 22:07

I tested your code on Ubuntu 22.04 (x86, Python 3.12) and it worked fine. This may be related to how fork works under the hood in Python, although I forced the same start method (fork):

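import multiprocessing
import os

import diskcache

# Force the "fork" start method and confirm it. Note that this only affects
# processes created through multiprocessing; os.fork() below always forks.
multiprocessing.set_start_method("fork", force=True)
print(multiprocessing.get_start_method())

a = diskcache.Cache(directory="/tmp/cache")
b = diskcache.Cache(directory="/tmp/cache")

os.fork()

a.get("key")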

ddorian, Aug 09 '24 14:08

@ddorian, exactly, this works perfectly on Linux (as everything does, right?). Maybe I wasn't clear enough above, but macOS is a necessary condition for reproducing this.

randomir, Aug 09 '24 14:08