Docker Slim Image
Description
This is an attempt to create a slim Docker image that is smaller than the current one, to avoid running out of space during testing. Various fixes are included to account for test failures within the image. These all appear to be either real issues that need to be addressed (e.g. ONNX export) or fixes that should be integrated either way.
This excludes PyTorch 2.9 from the requirements for now to avoid legacy issues with ONNX, TorchScript, and other things. MONAI needs to be updated for PyTorch 2.9 support, specifically by dropping the use of TorchScript in places, as it is becoming obsolete in favor of `torch.export`.
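As a rough illustration of what the `<2.9` upper bound means in practice, the sketch below checks a version string against it. The helper name `torch_version_allowed` is hypothetical and not part of MONAI or this PR; the actual constraint is enforced by pip through the requirements files.

```python
def torch_version_allowed(version: str, upper_bound: tuple = (2, 9)) -> bool:
    """Return True if `version` is strictly below the (major, minor) upper bound.

    Hypothetical helper for illustration only; pip applies the real constraint.
    """
    # Drop any local build tag such as "+cu128" before parsing.
    public = version.split("+", 1)[0]
    # Compare only the major and minor components.
    major, minor = (int(part) for part in public.split(".")[:2])
    return (major, minor) < upper_bound
```

A check like this could be called with `torch.__version__` at the start of a test session to confirm the environment matches the pinned range.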
Some tests fail without enough shared memory. The command I'm using to run the tests with GPUs 0 and 1 is `docker run -ti --rm --gpus '"device=0,1"' --shm-size=10gb -v $(pwd)/tests:/opt/monai/tests monai_slim /bin/bash`.
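The nested quoting in the `--gpus` value is easy to get wrong when the command is scripted. The following is a minimal sketch, assuming the image tag `monai_slim` and the mount point `/opt/monai/tests` from the command above, that builds the same invocation as an argument list. The function name `slim_test_command` and its parameters are hypothetical.

```python
import shlex
from pathlib import Path


def slim_test_command(gpu_ids=(0, 1), shm_size="10gb", image="monai_slim", tests_dir=None):
    """Build the `docker run` argument list used to test the slim image.

    Hypothetical helper; defaults mirror the command in the PR description.
    """
    # Default to ./tests relative to the current directory, like $(pwd)/tests.
    tests_dir = Path(tests_dir) if tests_dir is not None else Path.cwd() / "tests"
    devices = ",".join(str(i) for i in gpu_ids)
    return [
        "docker", "run", "-ti", "--rm",
        # The double quotes are part of the value itself, matching '"device=0,1"'.
        "--gpus", f'"device={devices}"',
        f"--shm-size={shm_size}",
        "-v", f"{tests_dir}:/opt/monai/tests",
        image, "/bin/bash",
    ]
```

Passing the list to `subprocess.run` avoids a second layer of shell quoting, and `shlex.join(slim_test_command())` prints an equivalent shell command for copying into a terminal.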
Types of changes
- [x] Non-breaking change (fix or new feature that would not break existing functionality).
- [ ] Breaking change (fix or new feature that would cause existing functionality to change).
- [ ] New tests added to cover the changes.
- [x] Integration tests passed locally by running `./runtests.sh -f -u --net --coverage`.
- [ ] Quick tests passed locally by running `./runtests.sh --quick --unittests --disttests`.
- [ ] In-line docstrings updated.
- [ ] Documentation updated, tested `make html` command in the `docs/` folder.
Nine tests in the image currently fail. The first four are related to auto3dseg and report that the value "image_stats" is missing from a config file; however, these tests pass when run in isolation. The other five relate to the GMM module, which cannot be compiled because nvcc is missing from the image. This is expected, since the CUDA toolkit is omitted for size reasons.
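One possible way to handle the GMM failures in an image without the CUDA toolkit is to skip tests that need JIT compilation when no compiler is found. The sketch below is an assumption about how such a guard could look, not what this PR implements; `cuda_compiler_available` and `CompiledExtensionExample` are hypothetical names.

```python
import shutil
import unittest


def cuda_compiler_available() -> bool:
    """Return True if an `nvcc` executable can be found on PATH."""
    return shutil.which("nvcc") is not None


@unittest.skipUnless(cuda_compiler_available(), "nvcc not found, cannot JIT-compile CUDA extensions")
class CompiledExtensionExample(unittest.TestCase):
    def test_compiler_present(self):
        # Placeholder body; a real test would build and exercise the extension here.
        self.assertTrue(cuda_compiler_available())
```

With a guard like this, the tests are reported as skipped in the slim image rather than as errors, while still running in images that ship the toolkit.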
Output of the errors
======================================================================
ERROR: test_ensemble (tests.integration.test_auto3dseg_ensemble.TestEnsembleBuilder.test_ensemble)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats',
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/monai/tests/integration/test_auto3dseg_ensemble.py", line 155, in test_ensemble
bundle_generator.generate(self.work_dir, num_fold=1)
File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
gen_algo.export_to_disk(output_folder, name, fold=f_id)
File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'
======================================================================
ERROR: test_get_history (tests.integration.test_auto3dseg_hpo.TestHPO.test_get_history)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats',
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/monai/tests/integration/test_auto3dseg_hpo.py", line 129, in setUp
bundle_generator.generate(work_dir, num_fold=1)
File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
gen_algo.export_to_disk(output_folder, name, fold=f_id)
File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'
======================================================================
ERROR: test_run_algo (tests.integration.test_auto3dseg_hpo.TestHPO.test_run_algo)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats',
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/monai/tests/integration/test_auto3dseg_hpo.py", line 129, in setUp
bundle_generator.generate(work_dir, num_fold=1)
File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
gen_algo.export_to_disk(output_folder, name, fold=f_id)
File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'
======================================================================
ERROR: test_run_optuna (tests.integration.test_auto3dseg_hpo.TestHPO.test_run_optuna)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/opt/monai/monai/bundle/config_parser.py", line 158, in __getitem__
look_up_option(k, config, print_all_options=False) if isinstance(config, dict) else config[int(k)]
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/utils/module.py", line 141, in look_up_option
raise ValueError(f"Unsupported option '{opt_str}', " + supported_msg)
ValueError: Unsupported option 'image_stats',
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/monai/tests/integration/test_auto3dseg_hpo.py", line 129, in setUp
bundle_generator.generate(work_dir, num_fold=1)
File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 660, in generate
gen_algo.export_to_disk(output_folder, name, fold=f_id)
File "/opt/monai/monai/apps/auto3dseg/bundle_gen.py", line 193, in export_to_disk
self.fill_records = self.fill_template_config(self.data_stats_files, self.output_path, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/tmp/tmp324gd5iq/workdir/algorithm_templates/dints/scripts/algo.py", line 79, in fill_template_config
File "/opt/monai/monai/bundle/config_parser.py", line 161, in __getitem__
raise KeyError(f"query key: {k}") from e
KeyError: 'query key: image_stats'
======================================================================
ERROR: test_cuda_0_2_batches_1_dimensions_1_channels_2_classes_2_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_0_2_batches_1_dimensions_1_channels_2_classes_2_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
return func(*(a + p.args), **p.kwargs, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
self.compiled_extension = load_module(
^^^^^^^^^^^^
File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
module = load(
^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
return _jit_compile(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
_write_ninja_file_and_build_library(
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_1_2_1_Linux_3_11_2_28_12_8'
======================================================================
ERROR: test_cuda_1_1_batches_1_dimensions_5_channels_2_classes_1_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_1_1_batches_1_dimensions_5_channels_2_classes_1_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
return func(*(a + p.args), **p.kwargs, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
self.compiled_extension = load_module(
^^^^^^^^^^^^
File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
module = load(
^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
return _jit_compile(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
_write_ninja_file_and_build_library(
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_5_2_1_Linux_3_11_2_28_12_8'
======================================================================
ERROR: test_cuda_2_1_batches_2_dimensions_2_channels_4_classes_4_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_2_1_batches_2_dimensions_2_channels_4_classes_4_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
return func(*(a + p.args), **p.kwargs, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
self.compiled_extension = load_module(
^^^^^^^^^^^^
File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
module = load(
^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
return _jit_compile(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
_write_ninja_file_and_build_library(
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_2_4_1_Linux_3_11_2_28_12_8'
======================================================================
ERROR: test_cuda_3_1_batches_3_dimensions_1_channels_2_classes_1_mixtures (tests.networks.layers.test_gmm.GMMTestCase.test_cuda_3_1_batches_3_dimensions_1_channels_2_classes_1_mixtures)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/parameterized/parameterized.py", line 620, in standalone_func
return func(*(a + p.args), **p.kwargs, **kw)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/tests/networks/layers/test_gmm.py", line 287, in test_cuda
gmm = GaussianMixtureModel(features_tensor.size(1), mixture_count, class_count, verbose_build=True)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/opt/monai/monai/networks/layers/gmm.py", line 44, in __init__
self.compiled_extension = load_module(
^^^^^^^^^^^^
File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
module = load(
^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
return _jit_compile(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
_write_ninja_file_and_build_library(
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_1_2_1_Linux_3_11_2_28_12_8_v1'
======================================================================
ERROR: test_load (tests.networks.layers.test_gmm.GMMTestCase.test_load)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2595, in _run_ninja_build
subprocess.run(
File "/usr/lib/python3.11/subprocess.py", line 571, in run
raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['ninja', '-v']' returned non-zero exit status 127.
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
File "/opt/monai/tests/networks/layers/test_gmm.py", line 310, in test_load
load_module("gmm", {"CHANNEL_COUNT": 2, "MIXTURE_COUNT": 2, "MIXTURE_SIZE": 3}, verbose_build=True)
File "/opt/monai/monai/_extensions/loader.py", line 89, in load_module
module = load(
^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 1681, in load
return _jit_compile(
^^^^^^^^^^^^^
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2138, in _jit_compile
_write_ninja_file_and_build_library(
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2290, in _write_ninja_file_and_build_library
_run_ninja_build(
File "/usr/local/lib/python3.11/dist-packages/torch/utils/cpp_extension.py", line 2612, in _run_ninja_build
raise RuntimeError(message) from e
RuntimeError: Error building extension 'gmm_2_2_3_Linux_3_11_2_28_12_8'
It would seem a simple matter of forcing the GMM module to build when building the image, but this also fails if used as a `RUN` command: `python -c 'from monai._extensions import load_module;load_module("gmm", {"CHANNEL_COUNT": 2, "MIXTURE_COUNT": 2, "MIXTURE_SIZE": 3}, verbose_build=True)'`
Walkthrough
This PR introduces infrastructure and code refinements across build configuration, CI workflows, and test utilities. Changes include expanding .dockerignore patterns, adding CI cleanup steps with version spec escaping, introducing a new multi-stage Dockerfile.slim for CUDA-enabled builds, performing minor code adjustments (type conversions, variable naming), updating dependency pinning (pytorch-ignite, onnxruntime, torch constraints), and refining test coverage with conditional execution and parameter tweaks. Approximately 12 files affected spanning configuration, Docker, core modules, and tests.
Estimated code review effort
🎯 3 (Moderate) | ⏱️ ~25 minutes
- Dockerfile.slim: Most complex change—three-stage build pipeline with CUDA integration, system dependencies, NGC CLI, and environment configuration requires careful validation of layer logic and artifact transfers.
- requirements.txt: Unified torch upper bound (<2.9) across platforms with Windows-specific exclusion; verify constraint consistency and compatibility implications.
- CI workflow (pythonapp.yml): Repeated cleanup steps and version spec escaping across multiple jobs; confirm correctness of repeated pattern.
- Test modifications: Heterogeneous changes (skipIf decorator, weights_only flag, tolerance adjustment, CUDA removal)—each requires separate reasoning for test intent and side effects.
Pre-merge checks and finishing touches
❌ Failed checks (1 warning, 1 inconclusive)
| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 62.50% which is insufficient. The required threshold is 80.00%. | You can run @coderabbitai generate docstrings to improve docstring coverage. |
| Title check | ❓ Inconclusive | The title 'Docker Slim Image' is vague and doesn't clearly convey the purpose—whether it's creating a new slim image, optimizing an existing one, or fixing issues within it. | Clarify the title to be more specific, e.g., 'Add Dockerfile.slim for space-optimized image' or 'Create slim Docker image with PyTorch <2.9 constraint'. |
✅ Passed checks (1 passed)
| Check name | Status | Explanation |
|---|---|---|
| Description check | ✅ Passed | The description covers intent, changes, and testing approach; however, it lacks a proper issue reference and doesn't enumerate all major file changes (e.g., Dockerfile.slim, requirements updates, test fixes). |
Hi @KumoLiu, I think we should push this one through to solve some of the issues we're seeing with actions running out of space. Some other PRs are stuck with failing checks, but I think the changes I have here will fix them.