spinalcordtoolbox
Investigate ways of installing SCT with GPU support for internal dev usage
SCT contains scripts built on deep learning frameworks (mainly PyTorch) that can be sped up using GPUs. Historically, however, SCT has chosen to be CPU-only to avoid the maintenance burden of managing GPU dependencies (e.g. CUDA) and to minimize installation size.
In most cases, CPU-only works fine for our users. But, for internal R&D, long inference times can sometimes impede research (see https://github.com/spinalcordtoolbox/spinalcordtoolbox/pull/4345#issuecomment-1922466223). And, internally, our students have more expertise for installing and troubleshooting GPU dependencies than the average SCT user. So, ideally we would have some sort of process for enabling GPU inference that students of the lab can follow when using SCT in their research.
This issue is for testing GPU inference, and for documenting a set of steps needed to install the necessary dependencies.
@valosekj notes:

> Maybe we could optimize `requirements_gpu.txt` to make SCT work on `romane` and `rosenberg`.
So, our internal workstations are probably a good testing ground for any changes we make to SCT.
So, it looks like installing and configuring SCT to run on GPUs is very easy.
I expected it to be more difficult (due to managing CUDA dependencies, installing GPU drivers, etc.). However, it seems like `torch` has vastly improved its dependency management in the years since I last used it, as `torch` now bundles its own CUDA libraries, rather than relying on the system's CUDA toolkit. (This does have the tradeoff of making an SCT GPU installation quite large, though, as previously discussed in #2669. But installation is fairly seamless.)
> [!NOTE]
> There are platform-specific quirks with installing the GPU versions of the `torch` libraries:
>
> - `torch` on Linux is GPU by default. CPU requires `--index-url https://download.pytorch.org/whl/cpu`.
> - `torch` on Windows is CPU by default. GPU requires a CUDA-specific `--index-url` (e.g. `https://download.pytorch.org/whl/cu121`).
> - `torch` on macOS is CPU-only. There are no GPU versions for macOS.
>
> You can check this yourself here: https://pytorch.org/get-started/locally/
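To make these quirks concrete, here's a small illustrative sketch of which index URL each platform/device combination needs. (The `cu121` index is just one example CUDA variant; the right one depends on your driver setup, per the PyTorch "get started" page.)

```python
from typing import Optional

def torch_index_url(platform: str, want_gpu: bool) -> Optional[str]:
    """Return the --index-url needed for a platform/device combo,
    or None if the default index already serves the desired build.
    Purely illustrative; not part of SCT."""
    if platform == "linux":
        # Linux wheels on the default index bundle CUDA; CPU-only is opt-in.
        return None if want_gpu else "https://download.pytorch.org/whl/cpu"
    if platform == "windows":
        # Windows wheels on the default index are CPU-only; CUDA is opt-in.
        return "https://download.pytorch.org/whl/cu121" if want_gpu else None
    if platform == "darwin":
        if want_gpu:
            raise ValueError("no CUDA builds of torch exist for macOS")
        return None
    raise ValueError(f"unknown platform: {platform}")
```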
Besides installing the GPU packages, there are still quite a lot of discussion points. Here's what I've come up with while researching this issue:
- How should we implement the installation of GPU dependencies?
  - Manual steps, written down in internal documentation. (e.g. activate the SCT conda environment, then install the GPU dependencies using `pip install`.)
  - Include the steps in the installer, behind some sort of CLI option. (Pros: Would make the steps more accessible/reproducible. Cons: Might expose curious users to installation issues.)
- How should we implement the choice of CPU/GPU inference?
  - Automatically detect the presence of GPUs, falling back to CPU.
  - Default to CPU inference, and only run GPU inference if specified.
- If GPU inference is opt-in, how do we want to configure this in SCT?
  - CLI option on `sct_deepseg`.
  - Global environment variable. (e.g. `SCT_USE_GPU_DEVICE={0,1,2,3}`, with unset == CPU)
  - Global setting in SCT's config file. (e.g. `setup.cfg`)
- Given that NeuroPoly's GPU workstations have multiple GPUs, and students will often book specific GPUs, how should we specify which GPU to use for inference?
  - Same as the above "CPU/GPU" setting? Or a separate one?
- Should we also retroactively add GPU support to non-`sct_deepseg` commands?
  - (e.g. `onnxruntime-gpu` for our old TensorFlow-developed models?)
  - Note: When I tried this, `pip` complained, because `ivadomed` specifies `onnxruntime`, which uses a different pip package name than `onnxruntime-gpu`.
  - Note 2: To add GPU inference support to ivadomed models, I believe all we have to do is pass `gpu_id` to `ivadomed.inference.segment_volume`.
All of these options have various trade-offs (discoverability vs. "hidden" support for internal usage only). But, for example, if we decided to go for the simplest implementation (manual installation steps, environment variable), I could create a prototype PR rather quickly for this. :)
Great investigation and summary, thanks!
I'm taking a look at this now, with the hope of testing a prototype on `romane` and `rosenberg` on Tuesday.
To try to resolve the questions above, here's a sketch of how I was thinking of implementing this:
- As far as whether we enable just GPU `torch` vs. turning GPU support on for everything (e.g. `onnxruntime`)... I think inference times for our "older" models are quite zippy already? I imagine the architectures back then were a little less resource-hungry, whereas the more advanced nnUNet-based models are the ones with several-minutes-long inference times. So, I think it would be OK to focus just on `sct_deepseg`'s `torch`-based models for now?
-based models for now? - While I agree that having a separate
requirements-gpu.txt
file would be helpful for transparency, I'm a little worried that this wouldn't be hidden enough, and thus suggest to users that this is a supported way to install SCT (which it isn't, yet!). - So, I was thinking of putting the steps directly into the
install_sct
script, controlled by a bash script option. I figure this is OK for a prototype because A) For internal testing usage, supporting only the Linux/macOS installer at first should be sufficient, and B) I think the CLI script options are hidden enough that users won't stumble upon this themselves. - The only thing I'm still working out is dependency resolution when combining the new GPU requirements with the existing
requirements.txt
file:- Presumably, we want to install them all together (to let
pip
do its dependency resolution magic). - But, if we're combining the requirements, How do we "override" the CPU-specific lines (e.g.
--extra-index-url
)? (I'd want to avoid downloading both the CPUtorch
and then the GPUtorch
on consecutivepip install
lines.) - Maybe what we could do, then, is filter
requirements.txt
on the fly withininstall_sct
(removing--extra-index-url
so that GPU torch is installed). That way, we're basing our GPU requirements off of the existingrequirements.txt
file, without having to maintain parallel sets of requirements.
- Presumably, we want to install them all together (to let
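The on-the-fly filtering could be as simple as dropping the index-override lines, so `pip` resolves `torch` from the default index (where the Linux wheels bundle CUDA). A sketch, with made-up requirement lines; the real `install_sct` logic would be in bash:

```python
def filter_gpu_requirements(lines):
    """Drop the CPU-pinning index-override lines from requirements.txt
    so that pip falls back to the default (GPU-on-Linux) index."""
    return [line for line in lines
            if not line.lstrip().startswith("--extra-index-url")]

# Hypothetical requirements.txt content, for illustration only:
reqs = [
    "--extra-index-url https://download.pytorch.org/whl/cpu",
    "torch==2.0.0",
    "torchvision",
]
print(filter_gpu_requirements(reqs))  # only the package lines remain
```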
- As far as quickly testing the GPU support goes, I figure we might want to add a section to `sct_check_dependencies` that mimics the internal `nvverify` script we developed?
  - However, I don't know if we want to run the checks unconditionally, lest we alert regular users to WIP GPU support.
  - So, maybe what we could do is check if GPU-specific dependencies are present first (which would suggest that `install_sct` was run with the GPU option). For example, checking `torch.__version__` will currently return the `+cpu` specifier. So, if this isn't present, then the GPU version is installed, and only then would we perform the GPU-specific checks (for a present GPU, CUDA, a simple tensor computation, etc.). That way, the checks are hidden from users by default.
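The `+cpu` check could look something like this (a sketch; it only inspects the version string, so it stays torch-free here — in `sct_check_dependencies` the input would be `torch.__version__`):

```python
def is_cpu_build(torch_version: str) -> bool:
    """CPU wheels carry a "+cpu" local version suffix (e.g. "2.0.0+cpu");
    CUDA wheels carry a "+cuXXX" suffix or no suffix at all (Linux default
    index). So: no "+cpu" suffix => safe to attempt the GPU checks."""
    return torch_version.endswith("+cpu")

# Inside sct_check_dependencies, hypothetically:
# import torch
# if not is_cpu_build(torch.__version__):
#     ...run GPU-specific checks (GPU present, CUDA, simple tensor op, etc.)
```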
- As far as running GPU inference goes, I think I'm leaning towards opt-in via an SCT-specific environment variable? Just because it's the most straightforward to implement as far as a quick prototype goes. My only concern is handling concurrency when used with `sct_run_batch` :thinking:. Can multiple jobs be run on a single GPU? Will there be a performance hit if multiple `sct_deepseg` commands are being run concurrently? How would we spread out the workload?
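The env-var opt-in could be as small as this sketch (using `SCT_USE_GPU` as an illustrative variable name; GPU availability is passed in as a flag so the snippet stays torch-free — in practice it would be `torch.cuda.is_available()`):

```python
def select_device(environ, gpu_available):
    """Default to CPU; use CUDA only when the user opted in via the
    environment variable AND a GPU is actually visible to torch."""
    if environ.get("SCT_USE_GPU") == "1" and gpu_available:
        return "cuda"
    return "cpu"

print(select_device({"SCT_USE_GPU": "1"}, gpu_available=True))  # cuda
print(select_device({}, gpu_available=True))                    # cpu (opt-in: default stays CPU)
```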
tl;dr:

- [ ] `install_sct`: CLI option; filter `requirements.txt` to remove the `torch` `--extra-index-url`.
- [ ] `sct_check_dependencies`: "Hidden" conditionally-run checks that mimic the `nvverify` script.
- [ ] Use an environment variable to conditionally select `device=cuda` in `sct_deepseg` only.
I have a prototype almost ready in `jn/4360-torch-gpu-sct_deepseg`.

Sample `sct_check_dependencies` output (on my local Windows desktop):
I just need to enable GPU inference for `sct_deepseg`, then run some tests on `romane`/`rosenberg`.
`romane`'s 4 GPUs are all booked up for now, but `rosenberg` has some space, so I've booked some time. Hopefully I'll be able to report back prior to tomorrow's meeting.
Huzzah! It works!
Steps:
- Clone SCT to your home folder.
- Checkout `jn/4360-torch-gpu-sct_deepseg`.
- Install SCT using the `-g` flag (e.g. `./install_sct -iyg`).
- Make sure that `sct_check_dependencies` displays the available GPUs.
- Set `CUDA_VISIBLE_DEVICES` to one of the GPU IDs (e.g. I booked GPU 2 on `rosenberg`, so I set `CUDA_VISIBLE_DEVICES=2`). Then, re-run `sct_check_dependencies` and make sure that the "number of GPUs available to PyTorch" drops to 1.
  - Note: Since PyTorch can only see the GPUs that are visible, they will start with ID 0 internally to PyTorch, even though the "real" ID for this GPU was 2. There's not much we can do about that?
- Set `SCT_USE_GPU=1` to turn on GPU inference.
  - For now, this acts as a "hidden" switch to turn on GPU usage, only for internal lab folks to know about. ;)
- Run a sample command (e.g. `SCT_USE_GPU=1 sct_deepseg -task seg_sc_lesion_t2w_sci -i t2.nii.gz`).
Result:
```console
p114154@rosenberg:~/repos/spinalcordtoolbox/data/sct_example_data/t2$ SCT_USE_GPU=1 sct_deepseg -task seg_sc_lesion_t2w_sci -i t2.nii.gz
--
Spinal Cord Toolbox (git-jn/4360-torch-gpu-sct_deepseg-17ee9edbff5b6a85cafb19fdfff28b2b08873f45)

sct_deepseg -task seg_sc_lesion_t2w_sci -i t2.nii.gz
--

Running inference on device: cuda
/home/GRAMES.POLYMTL.CA/p114154/repos/spinalcordtoolbox/python/envs/venv_sct/lib/python3.9/site-packages/nnunetv2/utilities/plans_handling/plans_handler.py:37: UserWarning: Detected old nnU-Net plans format. Attempting to reconstruct network architecture parameters. If this fails, rerun nnUNetv2_plan_experiment for your dataset. If you use a custom architecture, please downgrade nnU-Net to the version you implemented this or update your implementation + plans.
  warnings.warn("Detected old nnU-Net plans format. Attempting to reconstruct network architecture "
Model loaded successfully. Fetching test data...
Creating temporary folder (/tmp/sct_2024-04-03_14-11-08_sct_deepseg_y7sb0via)
Copied t2.nii.gz to /tmp/sct_2024-04-03_14-11-08_sct_deepseg_y7sb0via/t2.nii.gz
Changing orientation of the input to the model orientation (RPI)...
Starting inference...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:15<00:00,  1.30it/s]
Inference done.
Total inference time: 0 minute(s) 25 seconds
Reorienting the prediction back to original orientation...
Reorientation to original orientation LPI done.
Image header specifies datatype 'int16', but array is of type 'uint8'. Header metadata will be overwritten to use 'uint8'.
Saving results to: /tmp/sct_2024-04-03_14-11-08_sct_deepseg_y7sb0via/nnUNet_prediction/t2_pred_sc_seg.nii.gz
Image header specifies datatype 'int16', but array is of type 'uint8'. Header metadata will be overwritten to use 'uint8'.
Saving results to: /tmp/sct_2024-04-03_14-11-08_sct_deepseg_y7sb0via/nnUNet_prediction/t2_pred_lesion_seg.nii.gz

Done! To view results, type:
fsleyes t2.nii.gz -cm greyscale t2_sc_seg.nii.gz -cm red -a 70.0 &

Done! To view results, type:
fsleyes t2.nii.gz -cm greyscale t2_lesion_seg.nii.gz -cm subcortical -a 70.0 &

p114154@rosenberg:~/repos/spinalcordtoolbox/data/sct_example_data/t2$ sct_deepseg -task seg_sc_lesion_t2w_sci -i t2.nii.gz
--
Spinal Cord Toolbox (git-jn/4360-torch-gpu-sct_deepseg-17ee9edbff5b6a85cafb19fdfff28b2b08873f45)

sct_deepseg -task seg_sc_lesion_t2w_sci -i t2.nii.gz
--

perform_everything_on_device=True is only supported for cuda devices! Setting this to False
Running inference on device: cpu
/home/GRAMES.POLYMTL.CA/p114154/repos/spinalcordtoolbox/python/envs/venv_sct/lib/python3.9/site-packages/nnunetv2/utilities/plans_handling/plans_handler.py:37: UserWarning: Detected old nnU-Net plans format. Attempting to reconstruct network architecture parameters. If this fails, rerun nnUNetv2_plan_experiment for your dataset. If you use a custom architecture, please downgrade nnU-Net to the version you implemented this or update your implementation + plans.
  warnings.warn("Detected old nnU-Net plans format. Attempting to reconstruct network architecture "
Model loaded successfully. Fetching test data...
Creating temporary folder (/tmp/sct_2024-04-03_14-11-49_sct_deepseg_vtmb5mrl)
Copied t2.nii.gz to /tmp/sct_2024-04-03_14-11-49_sct_deepseg_vtmb5mrl/t2.nii.gz
Changing orientation of the input to the model orientation (RPI)...
Starting inference...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [01:29<00:00,  4.45s/it]
Inference done.
Total inference time: 1 minute(s) 37 seconds
Reorienting the prediction back to original orientation...
Reorientation to original orientation LPI done.
Image header specifies datatype 'int16', but array is of type 'uint8'. Header metadata will be overwritten to use 'uint8'.
Saving results to: /tmp/sct_2024-04-03_14-11-49_sct_deepseg_vtmb5mrl/nnUNet_prediction/t2_pred_sc_seg.nii.gz
Image header specifies datatype 'int16', but array is of type 'uint8'. Header metadata will be overwritten to use 'uint8'.
Saving results to: /tmp/sct_2024-04-03_14-11-49_sct_deepseg_vtmb5mrl/nnUNet_prediction/t2_pred_lesion_seg.nii.gz
File t2_sc_seg.nii.gz already exists. Will overwrite it.
File t2_lesion_seg.nii.gz already exists. Will overwrite it.

Done! To view results, type:
fsleyes t2.nii.gz -cm greyscale t2_sc_seg.nii.gz -cm red -a 70.0 &

Done! To view results, type:
fsleyes t2.nii.gz -cm greyscale t2_lesion_seg.nii.gz -cm subcortical -a 70.0 &
```