Error Running MONAI Spleen CT Segmentation Bundle for version>1.2.0 When using MONAI FL Module

Open Zilinghan opened this issue 8 months ago • 0 comments

Describe the bug

I am trying to use MONAI's Spleen CT Segmentation Bundle with the MONAI FL module, and errors appear for monai > 1.2.0: it works with 1.2.0, but fails with 1.3.0 and 1.4.0.

To Reproduce

Steps to reproduce the behavior:

  1. I downloaded the bundle using the following commands:
JOB_NAME=job
python3 -m monai.bundle download --name "spleen_ct_segmentation" --version "0.4.6" --bundle_dir ./${JOB_NAME}/app/config
  2. I downloaded the data using the following script:
# download_spleen_dataset.py
import argparse

from monai.apps.utils import download_and_extract


def download_spleen_dataset(filepath, output_dir):
    url = "https://msd-for-monai.s3-us-west-2.amazonaws.com/Task09_Spleen.tar"
    download_and_extract(url=url, filepath=filepath, output_dir=output_dir)


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "--filepath",
        "-f",
        type=str,
        help="the file path of the downloaded compressed file.",
        default="./data/Task09_Spleen.tar",
    )
    parser.add_argument(
        "--output_dir", "-o", type=str, help="target directory to save extracted files.", default="./data"
    )
    args = parser.parse_args()
    download_spleen_dataset(args.filepath, args.output_dir)

and then ran the following commands:

JOB_NAME=job
python download_spleen_dataset.py
sed -i "s|/workspace/data/Task09_Spleen|${PWD}/data/Task09_Spleen|g" ${JOB_NAME}/app/config/spleen_ct_segmentation/configs/train.json
  3. I installed monai via:
pip install "monai[all]==1.4.0"  # or "monai[all]==1.3.0", or "monai[all]==1.2.0"
  4. The test script I ran (python test.py) is:
# test.py
from monai.fl.client.monai_algo import MonaiAlgo
from monai.fl.utils.constants import ExtraItems

monai_algo = MonaiAlgo(
    bundle_root='./job/app/config/spleen_ct_segmentation',
    send_weight_diff=False,
)

monai_algo.initialize(
    extra={
        ExtraItems.CLIENT_NAME: "Client",
    }
)

model = monai_algo.get_weights()
metric = monai_algo.evaluate(model)
print(metric.metrics)
monai_algo.train(model)
new_model = monai_algo.get_weights()
metric = monai_algo.evaluate(new_model)
print(metric.metrics)
  5. I got the following error with monai 1.3.0 and 1.4.0 (1.2.0 works fine):
2025-02-06 14:55:23,959 - INFO - Setting logging properties based on config: job/app/config/spleen_ct_segmentation/configs/logging.conf.
2025-02-06 14:55:24,020 - INFO - Initialized Client.
2025-02-06 14:55:24,025 - INFO - Returning current weights.
2025-02-06 14:55:24,025 - INFO - Load Client weights...
2025-02-06 14:55:24,025 - INFO - Converted 148 global variables to match 148 local variables.
2025-02-06 14:55:24,026 - INFO - 'dst' model updated: 148 of 148 variables.
2025-02-06 14:55:24,031 - INFO - Start Client evaluating...
2025-02-06 14:55:24,031 - ignite.engine.engine.SupervisedEvaluator - INFO - Engine run resuming from iteration 0, epoch 0 until 1 epochs
2025-02-06 14:55:26,275 - ignite.engine.engine.SupervisedEvaluator - ERROR - Current run is terminating due to exception: 'image_meta_dict'
2025-02-06 14:55:26,275 - ERROR - Exception: 'image_meta_dict'
Traceback (most recent call last):
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/ignite/engine/engine.py", line 1069, in _run_once_on_dataset_as_gen
    self._fire_event(Events.ITERATION_COMPLETED)
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/ignite/engine/engine.py", line 425, in _fire_event
    func(*first, *(event_args + others), **kwargs)
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/monai/handlers/metrics_saver.py", line 124, in _get_filenames
    meta_data = self.batch_transform(engine.state.batch)
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/monai/handlers/utils.py", line 199, in _wrapper
    ret = [data[0][k] if first else [i[k] for i in data] for k in _keys]
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/monai/handlers/utils.py", line 199, in <listcomp>
    ret = [data[0][k] if first else [i[k] for i in data] for k in _keys]
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/monai/handlers/utils.py", line 199, in <listcomp>
    ret = [data[0][k] if first else [i[k] for i in data] for k in _keys]
KeyError: 'image_meta_dict'
2025-02-06 14:55:27,522 - ignite.engine.engine.SupervisedEvaluator - ERROR - Engine run is terminating due to exception: 'image_meta_dict'
2025-02-06 14:55:27,522 - ERROR - Exception: 'image_meta_dict'
Traceback (most recent call last):
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/ignite/engine/engine.py", line 959, in _internal_run_as_gen
    epoch_time_taken += yield from self._run_once_on_dataset_as_gen()
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/ignite/engine/engine.py", line 1087, in _run_once_on_dataset_as_gen
    self._handle_exception(e)
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/ignite/engine/engine.py", line 636, in _handle_exception
    self._fire_event(Events.EXCEPTION_RAISED, e)
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/ignite/engine/engine.py", line 425, in _fire_event
    func(*first, *(event_args + others), **kwargs)
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/monai/handlers/stats_handler.py", line 202, in exception_raised
    raise e
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/ignite/engine/engine.py", line 1069, in _run_once_on_dataset_as_gen
    self._fire_event(Events.ITERATION_COMPLETED)
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/ignite/engine/engine.py", line 425, in _fire_event
    func(*first, *(event_args + others), **kwargs)
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/monai/handlers/metrics_saver.py", line 124, in _get_filenames
    meta_data = self.batch_transform(engine.state.batch)
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/monai/handlers/utils.py", line 199, in _wrapper
    ret = [data[0][k] if first else [i[k] for i in data] for k in _keys]
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/monai/handlers/utils.py", line 199, in <listcomp>
    ret = [data[0][k] if first else [i[k] for i in data] for k in _keys]
  File "/eagle/tpc/zilinghan/conda_envs/appfl/lib/python3.10/site-packages/monai/handlers/utils.py", line 199, in <listcomp>
    ret = [data[0][k] if first else [i[k] for i in data] for k in _keys]
KeyError: 'image_meta_dict'
...
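
The innermost frame of the traceback is a plain dictionary lookup over the batch items. The following is a minimal sketch of that pattern using only the standard library. `select_keys` is a hypothetical stand-in written for illustration, not MONAI's actual implementation, and the batch contents are made up. It only shows that this kind of lookup raises `KeyError` as soon as the batch items carry no `image_meta_dict` entry; it does not establish why the key is missing in 1.3.0 and 1.4.0.

```python
from typing import Any, Callable, Sequence


def select_keys(keys: Sequence[str], first: bool = False) -> Callable[[Any], Any]:
    """Hypothetical stand-in for the key-selecting wrapper seen in the traceback."""

    def _wrapper(data: Any) -> Any:
        # A single dict is treated as a batch of one item.
        if isinstance(data, dict):
            data = [data]
        # Same shape as the failing line: index every item by every requested key.
        ret = [data[0][k] if first else [i[k] for i in data] for k in keys]
        return tuple(ret) if len(ret) > 1 else ret[0]

    return _wrapper


# Made-up batch items that carry no separate "image_meta_dict" entry.
batch = [{"image": "img0", "label": "lbl0"}, {"image": "img1", "label": "lbl1"}]

# The key exists in every item, so the lookup succeeds.
print(select_keys(["image"])(batch))

try:
    select_keys(["image_meta_dict"])(batch)
except KeyError as exc:
    # Same exception type and key as in the log above.
    print(f"KeyError: {exc}")
```

If this is the mechanism at play, the handler that requests `image_meta_dict` (here `metrics_saver.py`, per the traceback) is the place to look in the bundle configuration.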

Expected behavior

python test.py runs through without any errors.
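
One way to narrow this down is to find which entries of the bundle configs still ask for `image_meta_dict`. Below is a minimal sketch, assuming the bundle configs are JSON files in the `configs` directory used in the steps above. `find_key_references` is a hypothetical helper; it only reports string values that contain the key name and does not modify any file.

```python
import json
from pathlib import Path
from typing import Any, Iterator


def _walk(node: Any, path: str) -> Iterator[tuple[str, str]]:
    """Yield (config_path, value) for every string leaf in a parsed JSON tree."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from _walk(value, f"{path}#{key}" if path else str(key))
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from _walk(value, f"{path}#{index}" if path else str(index))
    elif isinstance(node, str):
        yield path, node


def find_key_references(config_dir: str, needle: str = "image_meta_dict") -> list[tuple[str, str, str]]:
    """Return (file name, config path, value) for each string value containing needle."""
    hits = []
    for config_file in sorted(Path(config_dir).glob("*.json")):
        tree = json.loads(config_file.read_text())
        for path, value in _walk(tree, ""):
            if needle in value:
                hits.append((config_file.name, path, value))
    return hits


if __name__ == "__main__":
    # Directory taken from the reproduction steps; adjust if the bundle lives elsewhere.
    config_dir = "./job/app/config/spleen_ct_segmentation/configs"
    for name, path, value in find_key_references(config_dir):
        print(f"{name}: {path} -> {value}")
```

The `#`-joined path in the output is just a readable locator for the matching entry, so the same search can be repeated after any config change to confirm that no reference remains.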

Screenshots N/A

Environment

Ensuring you use the relevant python executable, please paste the output of:

================================
Printing MONAI config...
================================
MONAI version: 1.4.0
Numpy version: 1.26.4
Pytorch version: 2.3.1+cu121
MONAI flags: HAS_EXT = False, USE_COMPILED = False, USE_META_DICT = False
MONAI rev id: 46a5272196a6c2590ca2589029eed8e4d56ff008
MONAI __file__: /eagle/tpc/<username>/conda_envs/appfl/lib/python3.10/site-packages/monai/__init__.py

Optional dependencies:
Pytorch Ignite version: 0.4.11
ITK version: 5.4.0
Nibabel version: 5.3.2
scikit-image version: 0.24.0
scipy version: 1.14.1
Pillow version: 10.3.0
Tensorboard version: 2.18.0
gdown version: 5.2.0
TorchVision version: 0.18.1+cu121
tqdm version: 4.67.1
lmdb version: 1.6.2
psutil version: 5.9.8
pandas version: 2.2.3
einops version: 0.8.0
transformers version: 4.40.2
mlflow version: 2.19.0
pynrrd version: 1.1.1
clearml version: 1.17.0

For details about installing the optional dependencies, please visit:
    https://docs.monai.io/en/latest/installation.html#installing-the-recommended-dependencies


================================
Printing system config...
================================
System: Linux
Linux version: SUSE Linux Enterprise Server 15 SP5
Platform: Linux-5.14.21-150500.55.49-default-x86_64-with-glibc2.31
Processor: x86_64
Machine: x86_64
Python version: 3.10.14
Process name: pt_main_thread
Command: ['python', '-c', 'import monai; monai.config.print_debug_info()']
Open files: []
Num physical CPUs: 32
Num logical CPUs: 64
Num usable CPUs: 64
CPU usage (%): [1.5, 1.2, 1.1, 1.1, 0.9, 0.8, 0.8, 0.8, 1.3, 1.1, 0.8, 0.9, 5.0, 3.2, 1.0, 1.1, 1.0, 0.9, 0.9, 1.1, 0.9, 1.0, 0.9, 0.9, 0.9, 1.0, 0.9, 1.1, 6.2, 0.9, 0.9, 1.0, 1.1, 1.2, 1.3, 35.3, 1.3, 1.0, 1.0, 0.9, 0.9, 0.9, 1.0, 1.0, 1.0, 1.0, 1.0, 1.4, 1.0, 0.9, 0.9, 1.0, 0.8, 1.1, 1.1, 1.1, 0.9, 0.8, 0.9, 0.9, 0.9, 0.9, 0.9, 0.8]
CPU freq. (MHz): 2788
Load avg. in last 1, 5, 15 mins (%): [0.7, 1.2, 1.0]
Disk usage (%): 0.8
Avg. sensor temp. (Celsius): UNKNOWN for given OS
Total physical memory (GB): 503.2
Available memory (GB): 490.8
Used memory (GB): 4.7

================================
Printing GPU config...
================================
Num GPUs: 4
Has CUDA: True
CUDA version: 12.1
cuDNN enabled: True
NVIDIA_TF32_OVERRIDE: None
TORCH_ALLOW_TF32_CUBLAS_OVERRIDE: None
cuDNN version: 8902
Current device: 0
Library compiled for CUDA architectures: ['sm_50', 'sm_60', 'sm_70', 'sm_75', 'sm_80', 'sm_86', 'sm_90']
GPU 0 Name: NVIDIA A100-SXM4-40GB
GPU 0 Is integrated: False
GPU 0 Is multi GPU board: False
GPU 0 Multi processor count: 108
GPU 0 Total memory (GB): 39.4
GPU 0 CUDA capability (maj.min): 8.0
GPU 1 Name: NVIDIA A100-SXM4-40GB
GPU 1 Is integrated: False
GPU 1 Is multi GPU board: False
GPU 1 Multi processor count: 108
GPU 1 Total memory (GB): 39.4
GPU 1 CUDA capability (maj.min): 8.0
GPU 2 Name: NVIDIA A100-SXM4-40GB
GPU 2 Is integrated: False
GPU 2 Is multi GPU board: False
GPU 2 Multi processor count: 108
GPU 2 Total memory (GB): 39.4
GPU 2 CUDA capability (maj.min): 8.0
GPU 3 Name: NVIDIA A100-SXM4-40GB
GPU 3 Is integrated: False
GPU 3 Is multi GPU board: False
GPU 3 Multi processor count: 108
GPU 3 Total memory (GB): 39.4
GPU 3 CUDA capability (maj.min): 8.0
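
Since the failure depends on the installed MONAI version, a test script could record which side of the 1.2.0 boundary it is running on. Below is a minimal sketch using only the standard library. `parse_release` and `is_affected` are hypothetical helpers, and the "newer than 1.2.0" boundary simply encodes the observation in this report (1.2.0 works, 1.3.0 and 1.4.0 fail), not a confirmed root cause.

```python
import re
from importlib import metadata


def parse_release(version: str) -> tuple[int, ...]:
    """Extract the leading numeric release, e.g. '1.4.0' or '1.4.0rc1' -> (1, 4, 0)."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def is_affected(version: str, last_good: tuple[int, ...] = (1, 2, 0)) -> bool:
    """True when the version is newer than the last one reported to work."""
    return parse_release(version) > last_good


if __name__ == "__main__":
    try:
        installed = metadata.version("monai")
    except metadata.PackageNotFoundError:
        print("monai is not installed in this environment")
    else:
        print(f"monai {installed}: affected={is_affected(installed)}")
```

Note that this sketch ignores pre-release suffixes, so a release candidate such as `1.3.0rc1` is treated like `1.3.0`.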

Zilinghan, Feb 06 '25 15:02