PersistentDataset not usable anymore (v1.5.1) ?
Since the changes made to PersistentDataset in version 1.5.1, the persistent dataset no longer saves or loads metadata. This means we cannot invert the initial transforms or even recover the original filename (and probably a lot more).
To Reproduce
Steps to reproduce the behavior:
- Install MONAI version 1.5.1
- Load and transform an image through the PersistentDataset
- Re-load the image from cache
- Try to invert and save the image
Expected behavior
The original image and the output image should be the same.
Code used
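from monai.data import PersistentDataset
from monai.transforms import Compose, Invert, LoadImage, Rotate90, SaveImage

data = ["./image.nrrd"]
transforms = Compose([LoadImage(ensure_channel_first=True), Rotate90()])
ds = PersistentDataset(data, transform=transforms, cache_dir="./cache")
reloaded_data = ds[0]  # first access applies the transforms and writes the cache
reloaded_data = ds[0]  # second access reads the item back from the cache
inverter = Invert(transforms)
inverted = inverter(reloaded_data)
saver = SaveImage("./out", output_ext=".nrrd")
saver(inverted)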
Let me know if there is a way to bypass this behavior, thank you.
#8566 fixed the vulnerabilities associated with the pickle module and torch.load(..., weights_only=False) by defensively serialising PyTorch tensors only. It looks like MetaTensors didn't make the cut.
https://github.com/Project-MONAI/MONAI/blob/9c6d819f97e37f36c72f3bdfad676b455bd2fa0d/monai/data/dataset.py#L211-L213
Objects are saved to cache after being converted to PyTorch tensors with the default argument of track_meta=False
https://github.com/Project-MONAI/MONAI/blob/9c6d819f97e37f36c72f3bdfad676b455bd2fa0d/monai/data/dataset.py#L401
and loaded in weights-only mode
https://github.com/Project-MONAI/MONAI/blob/9c6d819f97e37f36c72f3bdfad676b455bd2fa0d/monai/data/dataset.py#L380
I've tried converting with track_meta=True but that sets this off
https://github.com/Project-MONAI/MONAI/blob/9c6d819f97e37f36c72f3bdfad676b455bd2fa0d/monai/data/dataset.py#L384-L387
and recomputes the tensor every time, effectively bypassing the main benefit of caching.
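To illustrate the failure mode described above, here is a minimal sketch using only the standard library. It is an analogy, not MONAI's or PyTorch's actual code: `ALLOWED`, `AllowListUnpickler` and `load_or_recompute` are hypothetical names, and `OrderedDict` merely stands in for a richer type such as a MetaTensor. The idea is that a loader restricted to an allow-list of types rejects anything else, and a "delete and recompute" fallback then runs on every access.

```python
import io
import pickle
from collections import OrderedDict

# Hypothetical allow-list (an assumption for this sketch; a real
# weights-only loader maintains its own, much larger list).
ALLOWED = {("builtins", "list"), ("builtins", "dict")}


class AllowListUnpickler(pickle.Unpickler):
    """Unpickler that refuses any global not on the allow-list."""

    def find_class(self, module, name):
        if (module, name) in ALLOWED:
            return super().find_class(module, name)
        raise pickle.UnpicklingError(f"global {module}.{name} is not allow-listed")


def load_or_recompute(blob, recompute):
    """Try the restricted load; fall back to recomputing on rejection."""
    try:
        return AllowListUnpickler(io.BytesIO(blob)).load(), "cache"
    except pickle.UnpicklingError:
        # Mirrors the "corrupt cache file, recompute" branch in spirit only.
        return recompute(), "recomputed"


# A plain list of floats needs no globals, so it loads from the "cache".
plain_blob = pickle.dumps([1.0, 2.0])
print(load_or_recompute(plain_blob, lambda: [1.0, 2.0]))

# A richer type references a global that is not allow-listed, so every
# load is rejected and the value is recomputed instead.
rich_blob = pickle.dumps(OrderedDict(filename="image.nrrd"))
print(load_or_recompute(rich_blob, lambda: "fresh value"))
```

Under these assumptions the second call never benefits from the cached bytes, which is the same symptom as the repeated recomputation I'm seeing.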
While I appreciate that the security issues addressed by #8566 are severe, I feel like the breaking nature of this change wasn't properly advertised. The pull request marked itself as a non-breaking change, its pre-merge check even identified this problem, and we get this notice at the end of a class docstring:
https://github.com/Project-MONAI/MONAI/blob/9c6d819f97e37f36c72f3bdfad676b455bd2fa0d/monai/data/dataset.py#L214
This warranted a deprecation path in 1.5.x, or at the very least bumping the version to 1.6.0 per semantic versioning conventions. This is a breaking change that fundamentally alters PersistentDataset behaviour for anyone using transforms that produce MetaTensors, i.e. most non-trivial preprocessing pipelines.
Environment:
- MacOS 15.6.1 (24G90)
- sys.version == '3.12.11 (main, Jul 11 2025, 22:26:01) [Clang 20.1.4 ]'
- uv -v: uv 0.7.21 (Homebrew 2025-07-14)
- uv add monai[all]==1.5.1
After some more digging it seems like MetaTensors are supported, but NumPy arrays aren't.
@SebGoll I've recreated your specific example using this sample .nrrd file:
from monai.data.dataset import PersistentDataset
from monai.transforms import LoadImage, Rotate90, Compose, Invert, SaveImage
from pathlib import Path
img = Path("./BallBinary30x30x30.nrrd")
transforms = Compose(
    [
        LoadImage(ensure_channel_first=True),
        Rotate90(),
    ]
)
ds = PersistentDataset([img], cache_dir=".", transform=transforms)
_ = ds[0]
rotated_from_cache = ds[0]
inverter = Invert(transforms)
inverted = inverter(rotated_from_cache)
saver = SaveImage("./out", output_ext=img.suffix)
saver(inverted)
Running this example as is yields:
Corrupt cache file detected: 384d57fe2ef3e1e8844bd384282a9808.pt. Deleting and recomputing.
2025-10-22 19:43:47,591 INFO image_writer.py:197 - writing: out/BallBinary30x30x30/BallBinary30x30x30_trans.nrrd
I was able to get the cache file read by:
- modifying the MONAI source to add track_meta=True at this line https://github.com/Project-MONAI/MONAI/blob/9c6d819f97e37f36c72f3bdfad676b455bd2fa0d/monai/data/dataset.py#L401
- registering these NumPy types and TraceKeys as safe globals before using PersistentDataset:
import monai.utils
import numpy as np
import torch
torch.serialization.add_safe_globals([
    np._core.multiarray._reconstruct,
    np.ndarray,
    np.dtype,
    np.dtypes.Int64DType,
    np.dtypes.Float64DType,
    monai.utils.enums.TraceKeys,
])
I think the broader issue remains: this is a breaking change to core functionality that was introduced in a patch release without deprecation warnings or migration guidance. Users upgrading from 1.5.0 to 1.5.1 will find their caches invalidated with no clear path forward beyond manually clearing and rebuilding them.
Note: I also observed a difference between the input and output .nrrd files when comparing them directly (54KB vs 108KB). I think that's because the input file stores shorts while LoadImage converts to float32, with Rotate90 introducing minor floating-point precision effects in the spatial metadata.
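As a quick sanity check on the size difference, here is a small calculation. It assumes the volume is 30x30x30 voxels (taken from the file name only), that the payload is stored uncompressed, and that the header size is negligible; none of these were verified against the actual files.

```python
import math
import struct

shape = (30, 30, 30)  # assumed from the file name BallBinary30x30x30.nrrd
voxels = math.prod(shape)

# Standard sizes: a short ("<h") is 2 bytes, a float32 ("<f") is 4 bytes.
bytes_as_short = voxels * struct.calcsize("<h")
bytes_as_float32 = voxels * struct.calcsize("<f")

print(bytes_as_short)    # 54000
print(bytes_as_float32)  # 108000
```

That is 54 kB of raw data for shorts and 108 kB for float32, which lines up with the observed file sizes, so the doubling looks like a dtype change rather than a cache problem.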
Hello @SebGoll and @iyassou,
I had noticed similar issues with PersistentDataset no longer supporting MetaTensor objects, so I have submitted this PR with my solution. With the PR, PersistentDataset accepts track_meta and weights_only directly, allowing MetaTensors to be cached and read back with track_meta=True and weights_only=False. The default arguments preserve the library's current behavior, so the PR does not resolve the backwards compatibility issue you mentioned.