zarr-python icon indicating copy to clipboard operation
zarr-python copied to clipboard

Fix a TOCTOU race in `open_array(mode="a")`

Open alanhdu opened this issue 3 years ago • 4 comments
trafficstars

If you have multiple writers trying todo zarr.open_array(..., "a"), the old code code sometimes raise a ContainsArrayError, even thogh opening an already-existing array should be fine.

The underlying problem is a time-of-check/time-of-use race condition, where the array can be created (by another writer) between the contains_array check and the init_array call.

To fix this, we simply swallow the error from the init_array call.

I'm not sure how best to test this (since the bug is inherently non-deterministic based on scheduling), but I'm happy to take suggestoins.

alanhdu avatar Feb 10 '22 23:02 alanhdu

Codecov Report

Merging #963 (4cfc5bb) into main (5cdb202) will decrease coverage by 0.02%. The diff coverage is 91.42%.

:exclamation: Current head 4cfc5bb differs from pull request most recent head 05396aa. Consider uploading reports for the commit 05396aa to get more accurate results

@@            Coverage Diff             @@
##             main     #963      +/-   ##
==========================================
- Coverage   99.94%   99.92%   -0.03%     
==========================================
  Files          34       34              
  Lines       13847    13744     -103     
==========================================
- Hits        13840    13734     -106     
- Misses          7       10       +3     
Impacted Files Coverage Δ
zarr/creation.py 99.40% <75.00%> (-0.60%) :arrow_down:
zarr/tests/test_creation.py 99.62% <93.54%> (-0.38%) :arrow_down:
zarr/hierarchy.py 99.78% <0.00%> (-0.01%) :arrow_down:
zarr/core.py 100.00% <0.00%> (ø)
zarr/storage.py 100.00% <0.00%> (ø)
zarr/indexing.py 100.00% <0.00%> (ø)
zarr/convenience.py 100.00% <0.00%> (ø)
zarr/_storage/store.py 100.00% <0.00%> (ø)
zarr/tests/test_core.py 100.00% <0.00%> (ø)
zarr/tests/test_storage.py 100.00% <0.00%> (ø)
... and 4 more

codecov[bot] avatar Feb 11 '22 08:02 codecov[bot]

Thanks, @alanhdu. ~Do you have any ideas for a test that would trigger the race condition so we can keep the code coverage up?~ (Missed your comment) To get code coverage passing, we could potentially add a # pragma: no cover to the line. An idea would be to at least try the same operation from a number of threading.Thread instances. Using a Event e.g. might help get all threads to the same state together.

(Edit) For example:

import threading
import logging

logging.basicConfig(level=logging.DEBUG,
                    format='(%(threadName)-9s) %(message)s',)

event = threading.Event()

class Thread(threading.Thread):

    def __init__(self, event):
        threading.Thread.__init__(self)
        self.event = event

    def run(self):
        logging.debug("do some setup")
        self.event.wait()
        logging.debug("now attempt the race condition")


threads = [Thread(event) for i in range(10)]
for t in threads:
    t.start()
event.set()

However there are also a number of tests under zarr/tests/test_sync.py might help you.

joshmoore avatar Feb 11 '22 10:02 joshmoore

@alanhdu : do you want to give this a try?

joshmoore avatar Mar 07 '22 18:03 joshmoore

@alanhdu: I've added my test proposal but I've not yet been able to reproduce a failure.

p.s. @Alt-Shivam, thanks!

flake8...................................................................Failed
- hook id: flake8
- exit code: 1

zarr/tests/test_creation.py:733:5: F841 local variable 'expected_zarr_version' is assigned to but never used

Check Yaml...............................................................Passed

joshmoore avatar May 03 '22 09:05 joshmoore