
Memory leak/high memory usage in ITKReader with mha files

Open Jorissss opened this issue 6 months ago • 6 comments

Describe the bug

ITKReader has significantly higher memory usage with mha files than NibabelReader has with NIfTI files, so much so that we cannot train with mha files using MONAI. When training with ITKReader, memory peaks over 60 GiB, which is the limit we set for this training; with NibabelReader, memory stays well under 30 GiB.

To Reproduce

We have a decently sized dataset, ~3000 images of about 500x500x600. To check memory usage in isolation I wrote the following script:

import psutil
import os
from pathlib import Path

import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm
from monai.data.image_reader import ITKReader, NibabelReader
from monai.transforms import LoadImage
from monai.data import (
    Dataset,
    DataLoader
)


mha_images_path = Path("/mnt/ssd2/segmentation/image/")
mha_image_paths = list(mha_images_path.glob("*.mha"))
nifti_images_path = Path("/mnt/ssd2/segmentation/image_nifti/")
nifti_image_paths = list(nifti_images_path.glob("*.nii.gz"))

def profile_dataloader_memory_psutil(image_paths, reader, epochs=1, n=None, num_workers=0):
    if n is None:
        n = len(image_paths)
    # Get current process and its children
    process = psutil.Process(os.getpid())
    
    # Record baseline memory
    baseline_memory = process.memory_info().rss
    
    dataset = Dataset(image_paths, transform=LoadImage(reader=reader))
    dataloader = DataLoader(dataset, batch_size=1, num_workers=num_workers)
    
    max_memory = baseline_memory
    memory_profile = []
    
    for _ in range(epochs):
        for i, batch in enumerate(dataloader):
            # Get memory usage of main process and all children
            current_memory = process.memory_info().rss
            for child in process.children(recursive=True):
                try:
                    current_memory += child.memory_info().rss
                except psutil.NoSuchProcess:
                    pass  # Child process may have terminated
            
            memory_profile.append(current_memory)
            max_memory = max(max_memory, current_memory)
            print(f"Processing batch {i + 1}, current total memory: {current_memory / 1024 / 1024:.2f} MB")
            
            if i >= n - 1:
                break
    
    peak_usage = max_memory - baseline_memory
    return peak_usage, memory_profile

def profile_mha_dataloader_memory_psutil(epochs=1, n=None, num_workers=0):
    print(f"Profiling MHA DataLoader memory usage with {num_workers} workers...")
    peak, memory_profile = profile_dataloader_memory_psutil(mha_image_paths, ITKReader(reverse_indexing=True), epochs=epochs, n=n, num_workers=num_workers)
    print(f"Peak memory usage for MHA: {peak / 1024 / 1024:.2f} MB")
    plt.plot([m / 1024**2 for m in memory_profile])
    plt.title("MHA DataLoader Memory Usage")
    plt.xlabel("Batch Index")
    plt.ylabel("Memory Usage (MiB)")
    plt.savefig(f"mha_dataloader_memory_usage_epochs={epochs}_num_workers={num_workers}.png")
    plt.close()


def profile_nifti_dataloader_memory_psutil(epochs=1, n=None, num_workers=0):
    print(f"Profiling NIfTI DataLoader memory usage with {num_workers} workers...")
    peak, memory_profile = profile_dataloader_memory_psutil(nifti_image_paths, NibabelReader(), epochs=epochs, n=n, num_workers=num_workers)
    print(f"Peak memory usage for NIfTI: {peak / 1024 / 1024:.2f} MB")
    plt.plot([m / 1024**2 for m in memory_profile])
    plt.title("NIfTI DataLoader Memory Usage")
    plt.xlabel("Batch Index")
    plt.ylabel("Memory Usage (MiB)")
    plt.savefig(f"nifti_dataloader_memory_usage_epochs={epochs}_num_workers={num_workers}.png")
    plt.close()

profile_mha_dataloader_memory_psutil(epochs=300, num_workers=8)
profile_nifti_dataloader_memory_psutil(epochs=300, num_workers=8)

See the Screenshots section for results. Running this script gives some idea of the difference in memory usage, but the gap is much more pronounced during training. You can also see the memory usage for ITKReader steadily increase over time, compared to the NibabelReader memory usage graph.

I also wrote a custom callback to track memory usage during training. I will include screenshots of those tensorboard graphs as well.
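For reference, a minimal sketch of such a callback (illustrative only: `MemoryLogger` and `on_step_end` are hypothetical names, and the hookup to our training framework and TensorBoard is omitted):

```python
import os
import psutil


class MemoryLogger:
    """Record the RSS of the main process plus all worker children at each step."""

    def __init__(self):
        self._proc = psutil.Process(os.getpid())
        self.samples = []  # bytes, one entry per step

    def on_step_end(self):
        total = self._proc.memory_info().rss
        for child in self._proc.children(recursive=True):
            try:
                total += child.memory_info().rss
            except psutil.NoSuchProcess:
                pass  # worker may have exited between enumeration and query
        self.samples.append(total)
        return total
```

The samples can then be written to TensorBoard as a scalar once per step.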

Expected behavior

We expect ITKReader's memory usage to be at least roughly comparable to NibabelReader's; more importantly, we expect it to be stable and not exceed the memory limit set for our container.

Screenshots

Below are graphs plotting process memory usage against the number of 'epochs' for the two image readers/formats. Note that the Y axes are scaled differently: ITKReader goes up to 9 GiB here, while NibabelReader goes up to 6 GiB.

ITKReader with mha files:

Image

NiBabelReader with nifti files:

Image

Tensorboard graphs comparing memory usage of ITKReader with mha vs NiBabelReader with nifti:

Image

Environment

I am using Docker image projectmonai/monai:latest

Debug output:

================================
Printing MONAI config...
================================
MONAI version: 1.4.0rc12+5.g4a4c2512
Numpy version: 1.24.4
Pytorch version: 2.5.0a0+872d972e41.nv24.08
MONAI flags: HAS_EXT = True, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 4a4c25129f533e275764ee41a0303d5c5dec5b63
MONAI __file__: /opt/monai/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.11
ITK version: 5.4.0
Nibabel version: 5.3.1
scikit-image version: 0.24.0
scipy version: 1.14.0
Pillow version: 10.4.0
Tensorboard version: 2.16.2
gdown version: 5.2.0
TorchVision version: 0.20.0a0
tqdm version: 4.66.5
lmdb version: 1.5.1
psutil version: 6.0.0
pandas version: 2.2.2
einops version: 0.8.0
transformers version: 4.40.2
mlflow version: 2.17.0
pynrrd version: 1.0.0
clearml version: 1.16.5rc2

Jorissss avatar Jul 16 '25 08:07 Jorissss

ITK's IO mechanism is quite general: it allocates the entire image buffer. Each format-specific reader typically allocates the entire image too, and copies from one memory area to the other. That would explain the higher peak memory usage during reading, but there should be no memory leaks.

dzenanz avatar Jul 21 '25 18:07 dzenanz

@dzenanz thank you for the reply! I think you're right that the memory leak is not in the ITK I/O, as we also see memory usage increase over time with the NIfTI reader:

Image

Could you elaborate a little on why ITK's peak memory usage is significantly higher than the NIfTI reader's? Just curious.

Jorissss avatar Aug 04 '25 07:08 Jorissss

Each format specific reader typically allocates the entire image too, and copies from one memory area to the other.

During the read, there are two copies of the image in memory: one with the same layout as on disk, and a second with a possibly different layout, e.g. big endian vs little endian, double vs float, short vs float, RGB vs scalar, etc. This is probably the reason for ITK's higher memory use.
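A small NumPy illustration of that transient double allocation (the dtypes here are hypothetical examples, not necessarily what ITK uses for any given file):

```python
import numpy as np

# Simulate a reader converting the on-disk representation (big-endian
# int16) to the in-memory one (native float32). Both buffers exist
# simultaneously during the conversion, so the transient peak is
# disk_buffer.nbytes + in_memory.nbytes, i.e. 3x the on-disk size here.
disk_buffer = np.ones(500 * 500 * 6, dtype=">i2")  # scaled-down volume
in_memory = disk_buffer.astype(np.float32)         # second allocation
```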

dzenanz avatar Aug 04 '25 13:08 dzenanz

@dzenanz I see, thanks!

Jorissss avatar Aug 06 '25 07:08 Jorissss

Have you discovered the exact source of the memory leak? Should this issue be closed?

dzenanz avatar Nov 07 '25 21:11 dzenanz

@dzenanz Hi, no, unfortunately I was not able to find the source. Currently we are working around it by using nibabel/NIfTI instead of ITK/mha, or by doing explicit garbage collection (which does affect training speed).
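The explicit-GC workaround amounts to something like the following sketch (`iterate_with_gc` and `collect_every` are illustrative names; larger `collect_every` values reduce the speed penalty at the cost of a higher peak):

```python
import gc


def iterate_with_gc(loader, collect_every=10):
    # Yield batches from any iterable (e.g. a DataLoader), forcing a
    # full garbage-collection pass every `collect_every` batches.
    # This trades some throughput for a flatter memory profile.
    for i, batch in enumerate(loader):
        yield batch
        if (i + 1) % collect_every == 0:
            gc.collect()
```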

Jorissss avatar Nov 17 '25 10:11 Jorissss