
Dataset() with mode="w" truncates file despite PermissionError

GiusCappelli opened this issue 4 months ago • 15 comments

Summary

Opening a NetCDF file with mode="w" that is locked by another process:

  • Raises PermissionError (expected)
  • Simultaneously truncates the target file to 0 bytes (data corruption!)

In other words, data is permanently lost when trying to open a locked file, even though the operation "failed."

Expected Behavior

nc.Dataset(filename, mode="w") should either:

  • Succeed completely and overwrite the file, OR
  • Fail cleanly without modifying the existing file

Reproduction Steps

The bug can be reproduced when working on the same dataset from multiple sessions, e.g. with two Jupyter notebooks (A and B in the test below).

# Setup: Jupyter notebook A
import netCDF4 as nc
import numpy as np
import os

# Create initial test file
ds = nc.Dataset("test.nc", "w")
ds.createDimension("x", 10)
var = ds.createVariable("data", "f8", ("x",))
var[:] = np.random.rand(10)
ds.close()

print(f"Original file size: {os.path.getsize('test.nc')} bytes")

Output: Original file size: 6224 bytes

Reproduce the bug:


# Step 1: Jupyter Notebook B: Open file in read mode (simulates file lock from another process)
ds_read = nc.Dataset("test.nc", "r")  # Keep this open

# Step 2: BACK TO A: Try to overwrite from different process/session
try:
    ds_write = nc.Dataset("test.nc", "w")  # This should fail cleanly
except PermissionError as e:
    print(f"Got expected error: {e}")
    print(f"File size after error: {os.path.getsize('test.nc')} bytes")  # Shows 0!

The output indeed shows 0 bytes: the file has been truncated!

Got expected error: [Errno 13] Permission denied: 'test.nc'
File size after error: 0 bytes
# Step 3: (Optional, the bug already happened) in B, close the read handle
ds_read.close()

# Step 4: Try to read the original file
try:
    damaged = nc.Dataset("test.nc", "r")
except Exception as e:
    print(f"{e}")

Output:

[Errno -51] NetCDF: Unknown file format

Impact

This bug makes netcdf4-python unsafe for concurrent access scenarios and multi-user environments. In my case, this was a Jupyter notebook workflow where files may be accessed from multiple kernels.

I think this is a nasty bug, as users lose their work even though the operation reported failure. I understand that a locked file cannot be opened, obviously, but this should not erase the locked file (by the way, why does this happen?).

Notice also that the message from Errno -51 can be (and is) confusing: it suggests a problem with the file format, so people think "what can be wrong with the format? I'm working with netCDF as usual..." (conflating the format with the file extension).

The actual problem is that there is nothing left in the file, but since this is caused by an unforeseen truncation, it only adds to the pre-existing confusion.

Environment Details

  • OS: Windows 11
  • Python: 3.13.2
  • netcdf4-python version: 1.7.2
  • NetCDF C library version: 4.9.2
  • Context: Occurs in Jupyter notebook environments with concurrent access. Untested assumption: this might happen for concurrent access in general.

History

This was originally reported in xarray (https://github.com/pydata/xarray/issues/10679) but traced back to netcdf4-python as the root cause.

Suggested Fix

The file should not be modified if one does not have write access. Things that come to mind:

  • Check file permissions before truncation
  • Use atomic write operations (write to temp file, then rename; see the sketch after this list)
  • Fail immediately on permission errors without modifying the target file
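
To make the second bullet concrete, here is a minimal application-level sketch of the temp-file-then-rename idea. This is not part of the netcdf4-python API; write_dataset_atomically and build are hypothetical names.

import os
import tempfile
import numpy as np
import netCDF4 as nc

def write_dataset_atomically(path, build):
    # Write into a temporary file in the same directory, then atomically
    # replace the target only after the write succeeded. If anything fails,
    # the original file is left untouched. On Windows, os.replace raises
    # PermissionError if the target is locked by another process, which
    # again leaves the original intact.
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(suffix=".nc", dir=dirname)
    os.close(fd)  # netCDF4 wants a path, not an open file descriptor
    try:
        with nc.Dataset(tmp_path, "w") as ds:
            build(ds)  # caller populates dimensions/variables
        os.replace(tmp_path, path)
    except BaseException:
        os.remove(tmp_path)
        raise

def build(ds):
    ds.createDimension("x", 10)
    var = ds.createVariable("data", "f8", ("x",))
    var[:] = np.random.rand(10)

write_dataset_atomically("test.nc", build)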

(Bonus) Final comment/Question

It is unclear to me why the file is truncated even though it cannot be accessed (we do get a PermissionError!). Why does this happen?

GiusCappelli avatar Sep 05 '25 07:09 GiusCappelli

Using mode='w' will cause the netcdf-c lib to instantly clobber an existing file - you could use mode='x' (or mode='w' with clobber=False) and netcdf will refuse to open a file for writing that already exists. I do see your point that the PermissionError should be thrown before this happens - but that apparently is not the case in netcdf-c.
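
For example (a sketch; catching OSError broadly, since the exact exception type raised for an existing file may vary):

import netCDF4 as nc

# Refuse to create the file if it already exists, so an accidental
# overwrite cannot truncate existing data.
try:
    ds = nc.Dataset("test.nc", "w", clobber=False)  # or: nc.Dataset("test.nc", "x")
except OSError as e:
    print(f"refused to overwrite: {e}")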

jswhit avatar Sep 05 '25 15:09 jswhit

I suppose we could check the file access permissions at the Python level before trying to open the file with the C lib to avoid this. Is Jupyter creating a file resource lock in your example? netcdf4-python doesn't.
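
Such a pre-flight check might look like the sketch below (can_overwrite is a hypothetical helper; note that os.access on Windows only inspects the read-only attribute, so it would not necessarily detect a lock held by another process):

import os
import netCDF4 as nc

def can_overwrite(path):
    # True if the path does not exist, or exists and reports as writable.
    return (not os.path.exists(path)) or os.access(path, os.W_OK)

if can_overwrite("test.nc"):
    ds_write = nc.Dataset("test.nc", "w")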

jswhit avatar Sep 05 '25 15:09 jswhit

Answering my own question from above - HDF5 > 1.10 creates a file lock by default. This test script behaves as expected on macOS (problem not reproduced, file is not clobbered):

import netCDF4 as nc
import numpy as np
# create file
ds = nc.Dataset("test.nc", "w")
ds.createDimension("x", 10)
var = ds.createVariable("data", "f8", ("x",))
var[:] = np.random.rand(10)
ds.close()
# re-open file, keep open
ds_read = nc.Dataset("test.nc", "r")
# now try to open it for write access (should
# fail with PermissionError without clobbering existing file)
ds_write = nc.Dataset("test.nc", "w")

@GiusCappelli what platform are you on? EDIT: NVM, I see you are on Windows. Does this test script clobber the file on your system (or does it only happen with concurrent access in Jupyter)?

jswhit avatar Sep 05 '25 16:09 jswhit

If this is a netcdf-c problem, then we should fix it there. Do you know where the truncation is occurring?

DennisHeimbigner avatar Sep 05 '25 22:09 DennisHeimbigner

@DennisHeimbigner it would have to be at the netcdf-c level, since the python interface is just calling nc_create/nc_open. I can't reproduce it though, so it may be something specific to Windows and/or the Jupyter notebook environment.

jswhit avatar Sep 06 '25 01:09 jswhit

@jswhit sorry for the late response. I use Jupyter in VSCode, so the file is clobbered directly in my file directory when I read/write from it. Let me know if there are specific tests I should run for you to better understand what is happening.

Possibly unrelated, I should probably link this discussion to the xarray folks to suggest the use of mode='w' with clobber=False in their backend implementation of netCDF.

Edit: typos

GiusCappelli avatar Sep 08 '25 12:09 GiusCappelli

> @jswhit sorry for the late response. I use Jupyter in VSCode, so the file is clobbered directly in my file directory when I read/write from it. Let me know if there are specific tests I should run for you to better understand what is happening.

But does it only happen when you have concurrent sessions (the file open for reading in one, and for writing in the other)? Does just running the script in my earlier post in a single Jupyter session clobber the file?

jswhit avatar Sep 08 '25 21:09 jswhit

The table on this page seems to indicate that HDF5 file locking may not work on Windows unless you have a recent version of the HDF5 lib. But 1.14.2 (which the wheels ship with) should support file locking.
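
To confirm which library versions an install is actually linked against, you can query the module attributes from Python:

import netCDF4
print(netCDF4.__hdf5libversion__)     # HDF5 library version in use
print(netCDF4.__netcdf4libversion__)  # netcdf-c library version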

jswhit avatar Sep 09 '25 16:09 jswhit

I have observed this bug when running concurrent sessions; running the reads/writes in the same script throws a PermissionError but does not clobber the file.

Regarding HDF5, I have v1.14.6 in my current conda environment.

GiusCappelli avatar Sep 10 '25 07:09 GiusCappelli

OK thanks, so it is specific to the concurrent Jupyter environment. @DennisHeimbigner it appears that in the concurrent environment (using separate processes) the HDF5 file lock is triggering an error when the file is opened for write access, but the file is clobbered anyway. Can you think of any reason why this would happen?

jswhit avatar Sep 10 '25 14:09 jswhit

I tried this on macOS and can confirm the file gets clobbered when you try to open it for writing in a concurrent Jupyter session (while it is open for read access in a different session). However, I don't get the PermissionError - the file is silently clobbered. So it appears the file locking does not work in a multiprocess environment.

jswhit avatar Sep 10 '25 15:09 jswhit

However, the file locking does work if I have a file open for writing in one Jupyter session and then try to open the same one for writing or reading in another concurrent session. From a closer read of the HDF5 file locking docs, it appears this is how it's supposed to work. It doesn't appear to cover the case where the file is first opened for reading in one process and then opened for writing in another. @DennisHeimbigner is this your understanding too? Here's the relevant text from the docs:

According to the file format document and H5Fpkg.h:

  • Bit 0 is set if the file is open for writing (H5F_SUPER_WRITE_ACCESS)
  • Bit 2 is set if the file is open for SWMR writing (H5F_SUPER_SWMR_WRITE_ACCESS)

We check these superblock flags on file open and error out if they are unsuitable:

  • If the file is already opened for non-SWMR writing, no other process can open it.
  • If the file is open for SWMR writing, only SWMR readers can open the file.
  • If you try to open a file for reading with H5F_ACC_SWMR_READ set and the file does not have the SWMR writer bits set in the superblock, the open call will fail.

Note - netcdf-c does not currently have support for SWMR (single-writer/multiple-reader) mode for serial processes.

jswhit avatar Sep 15 '25 19:09 jswhit

Yes, that sounds correct.

DennisHeimbigner avatar Sep 15 '25 19:09 DennisHeimbigner

@DennisHeimbigner I spoke too soon - it looks like even though the PermissionError is raised, the file is actually clobbered when you open it for writing in the second process.

jswhit avatar Sep 15 '25 19:09 jswhit

I don't know what the solution to this is, but I'm 99.9% sure it's not in the python interface.

jswhit avatar Sep 20 '25 15:09 jswhit