spinalcordtoolbox
Investigate ways of installing SCT with GPU support for internal dev usage
SCT contains scripts built on deep learning frameworks (mainly PyTorch) that can be sped up using GPUs. Historically, however, SCT has chosen to be CPU-only to avoid the maintenance burden of managing GPU dependencies (e.g. CUDA) and to minimize installation size.
In most cases, CPU-only works fine for our users. But, for internal R&D, long inference times can sometimes impede research (see https://github.com/spinalcordtoolbox/spinalcordtoolbox/pull/4345#issuecomment-1922466223). And, internally, our students have more expertise for installing and troubleshooting GPU dependencies than the average SCT user. So, ideally we would have some sort of process for enabling GPU inference that students of the lab can follow when using SCT in their research.
This issue is for testing GPU inference, and for documenting a set of steps needed to install the necessary dependencies.
@valosekj notes:

> Maybe we could optimize `requirements_gpu.txt` to make SCT work on `romane` and `rosenberg`.
So, our internal workstations are probably a good testing ground for any changes we make to SCT.
So, it looks like installing and configuring SCT to run on GPUs is very easy.
I expected it to be more difficult (due to managing CUDA dependencies, installing GPU drivers, etc.). However, it seems like `torch` has vastly improved its dependency management in the years since I last used it, as `torch` now bundles its own CUDA libraries, rather than relying on the system's CUDA toolkit. (This does have the tradeoff of making an SCT GPU installation quite large, though, as previously discussed in #2669. But installation is fairly seamless.)
> [!NOTE]
> There are platform-specific quirks with installing the GPU versions of the `torch` libraries:
>
> - `torch` on Linux is GPU by default. CPU requires `--index-url https://download.pytorch.org/whl/cpu`.
> - `torch` on Windows is CPU by default. GPU requires a CUDA-specific `--index-url` (e.g. `https://download.pytorch.org/whl/cu121`).
> - `torch` on macOS is CPU-only. There are no GPU versions for macOS.
>
> You can check this yourself here: https://pytorch.org/get-started/locally/
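To make these quirks concrete, here's a small illustrative sketch of which index URL each platform/device combination needs. (The `cu121` index is just one example CUDA variant; the right one depends on your driver setup, per the PyTorch "get started" page.)

```python
from typing import Optional

def torch_index_url(platform: str, want_gpu: bool) -> Optional[str]:
    """Return the --index-url needed for a platform/device combo,
    or None if the default index already serves the desired build.
    Purely illustrative; not part of SCT."""
    if platform == "linux":
        # Linux wheels on the default index bundle CUDA; CPU-only is opt-in.
        return None if want_gpu else "https://download.pytorch.org/whl/cpu"
    if platform == "windows":
        # Windows wheels on the default index are CPU-only; CUDA is opt-in.
        return "https://download.pytorch.org/whl/cu121" if want_gpu else None
    if platform == "darwin":
        if want_gpu:
            raise ValueError("no CUDA builds of torch exist for macOS")
        return None
    raise ValueError(f"unknown platform: {platform}")
```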
Besides installing the GPU packages, there are still quite a lot of discussion points. Here's what I've come up with while researching this issue:
- How should we implement the installation of GPU dependencies?
  - Manual steps, written down in internal documentation. (e.g. activate the SCT conda environment, then install the GPU dependencies using `pip install`.)
  - Include the steps in the installer, behind some sort of CLI option. (Pros: Would make the steps more accessible/reproducible. Cons: Might expose curious users to installation issues.)
- How should we implement the choice of CPU/GPU inference?
  - Automatically detect the presence of GPUs, falling back to CPU.
  - Default to CPU inference, and only run GPU inference if specified.
- If GPU inference is opt-in, how do we want to configure this in SCT?
  - CLI option on `sct_deepseg`.
  - Global environment variable. (e.g. `SCT_USE_GPU_DEVICE={0,1,2,3}`, with unset == CPU)
  - Global setting in SCT's config file. (e.g. `setup.cfg`)
- Given that NeuroPoly's GPU workstations have multiple GPUs, and students will often book specific GPUs, how should we specify which GPU to use for inference?
  - Same as the above "CPU/GPU" setting? Or a separate one?
- Should we also retroactively add GPU support to non-`sct_deepseg` commands?
  - (e.g. `onnxruntime-gpu` for our old TensorFlow-developed models?)
  - Note: When I tried this, `pip` complained, because `ivadomed` specifies `onnxruntime`, which uses a different pip package name than `onnxruntime-gpu`.
  - Note 2: To add GPU inference support to ivadomed models, I believe all we have to do is pass `gpu_id` to `ivadomed.inference.segment_volume`.
All of these options have various trade-offs (discoverability vs. "hidden" support for internal usage only). But, for example, if we decided to go for the simplest implementation (manual installation steps, environment variable), I could create a prototype PR rather quickly for this. :)
Great investigation and summary, thanks!
I'm taking a look at this now, with the hope of testing a prototype on `romane` and `rosenberg` on Tuesday.
To try to resolve the questions above, here's a sketch of how I was thinking of implementing this:
- As far as whether we enable just GPU `torch` vs. turning GPU support on for everything (e.g. `onnxruntime`)... I think inference times for our "older" models are quite zippy already? I imagine the architectures back then were a little less resource-hungry, whereas the more advanced nnUNet-based models are the ones with several-minutes-long inference times. So, I think it would be OK to focus just on `sct_deepseg`'s `torch`-based models for now?
-based models for now? - While I agree that having a separate
requirements-gpu.txt
file would be helpful for transparency, I'm a little worried that this wouldn't be hidden enough, and thus suggest to users that this is a supported way to install SCT (which it isn't, yet!). - So, I was thinking of putting the steps directly into the
install_sct
script, controlled by a bash script option. I figure this is OK for a prototype because A) For internal testing usage, supporting only the Linux/macOS installer at first should be sufficient, and B) I think the CLI script options are hidden enough that users won't stumble upon this themselves. - The only thing I'm still working out is dependency resolution when combining the new GPU requirements with the existing
requirements.txt
file:- Presumably, we want to install them all together (to let
pip
do its dependency resolution magic). - But, if we're combining the requirements, How do we "override" the CPU-specific lines (e.g.
--extra-index-url
)? (I'd want to avoid downloading both the CPUtorch
and then the GPUtorch
on consecutivepip install
lines.) - Maybe what we could do, then, is filter
requirements.txt
on the fly withininstall_sct
(removing--extra-index-url
so that GPU torch is installed). That way, we're basing our GPU requirements off of the existingrequirements.txt
file, without having to maintain parallel sets of requirements.
- Presumably, we want to install them all together (to let
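The on-the-fly filtering could be as simple as dropping the index-override lines, so `pip` resolves `torch` from the default index (where the Linux wheels bundle CUDA). A sketch, with made-up requirement lines; the real `install_sct` logic would be in bash:

```python
def filter_gpu_requirements(lines):
    """Drop the CPU-pinning index-override lines from requirements.txt
    so that pip falls back to the default (GPU-on-Linux) index."""
    return [line for line in lines
            if not line.lstrip().startswith("--extra-index-url")]

# Hypothetical requirements.txt content, for illustration only:
reqs = [
    "--extra-index-url https://download.pytorch.org/whl/cpu",
    "torch==2.0.0",
    "torchvision",
]
print(filter_gpu_requirements(reqs))  # only the package lines remain
```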
- As far as quickly testing the GPU support goes, I figure we might want to add a section to `sct_check_dependencies` that mimics the internal `nvverify` script we developed?
  - However, I don't know if we want to run the checks unconditionally, lest we alert regular users to WIP GPU support.
  - So, maybe what we could do is check if GPU-specific dependencies are present first (which would suggest that `install_sct` was run with the GPU option). For example, checking `torch.__version__` will currently return the `+cpu` specifier. So, if this isn't present, then the GPU version is installed, and only then would we perform the GPU-specific checks (for a present GPU, CUDA, a simple tensor computation, etc.). That way, the checks are hidden from users by default.
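The `+cpu` check could look something like this (a sketch; it only inspects the version string, so it stays torch-free here — in `sct_check_dependencies` the input would be `torch.__version__`):

```python
def is_cpu_build(torch_version: str) -> bool:
    """CPU wheels carry a "+cpu" local version suffix (e.g. "2.0.0+cpu");
    CUDA wheels carry a "+cuXXX" suffix or no suffix at all (Linux default
    index). So: no "+cpu" suffix => safe to attempt the GPU checks."""
    return torch_version.endswith("+cpu")

# Inside sct_check_dependencies, hypothetically:
# import torch
# if not is_cpu_build(torch.__version__):
#     ...run GPU-specific checks (GPU present, CUDA, simple tensor op, etc.)
```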
- As far as running GPU inference goes, I think I'm leaning towards opt-in via an SCT-specific environment variable? Just because it's the most straightforward to implement as far as a quick prototype goes. My only concern is handling concurrency when used with `sct_run_batch` :thinking:. Can multiple jobs be run on a single GPU? Will there be a performance hit if multiple `sct_deepseg` commands are being run concurrently? How would we spread out the workload?
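The env-var opt-in could be as small as this sketch (using `SCT_USE_GPU` as an illustrative variable name; GPU availability is passed in as a flag so the snippet stays torch-free — in practice it would be `torch.cuda.is_available()`):

```python
def select_device(environ, gpu_available):
    """Default to CPU; use CUDA only when the user opted in via the
    environment variable AND a GPU is actually visible to torch."""
    if environ.get("SCT_USE_GPU") == "1" and gpu_available:
        return "cuda"
    return "cpu"

print(select_device({"SCT_USE_GPU": "1"}, gpu_available=True))  # cuda
print(select_device({}, gpu_available=True))                    # cpu (opt-in: default stays CPU)
```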
tl;dr:

- [ ] `install_sct`: CLI option; filter `requirements.txt` to remove the `torch` `--extra-index-url`.
- [ ] `sct_check_dependencies`: "Hidden" conditionally-run checks that mimic the `nvverify` script.
- [ ] Use an environment variable to conditionally select `device=cuda` in `sct_deepseg` only.
I have a prototype almost ready in `jn/4360-torch-gpu-sct_deepseg`.

Sample `sct_check_dependencies` output (on my local Windows desktop):
I just need to enable GPU inference for `sct_deepseg`, then run some tests on `romane`/`rosenberg`.
`romane`'s 4 GPUs are all booked up for now, but `rosenberg` has some space, so I've booked some time. Hopefully I'll be able to report back prior to tomorrow's meeting.
Huzzah! It works!
Steps:
- Clone SCT to your home folder.
- Checkout `jn/4360-torch-gpu-sct_deepseg`.
- Install SCT using the `-g` flag (e.g. `./install_sct -iyg`).
- Make sure that `sct_check_dependencies` displays the available GPUs.
- Set `CUDA_VISIBLE_DEVICES` to one of the GPU IDs (e.g. I booked GPU 2 on `rosenberg`, so I set `CUDA_VISIBLE_DEVICES=2`). Then, re-run `sct_check_dependencies` and make sure that the "number of GPUs available to PyTorch" drops to 1.
  - Note: Since PyTorch can only see the GPUs that are visible, they will start with ID 0 internally to PyTorch, even though the "real" ID for this GPU was 2. There's not much we can do about that?
- Set `SCT_USE_GPU=1` to turn on GPU inference.
  - For now, this acts as a "hidden" switch to turn on GPU usage, only for internal lab folks to know about. ;)
- Run a sample command (e.g. `SCT_USE_GPU=1 sct_deepseg -task seg_sc_lesion_t2w_sci -i t2.nii.gz`).
Result:
```console
p114154@rosenberg:~/repos/spinalcordtoolbox/data/sct_example_data/t2$ SCT_USE_GPU=1 sct_deepseg -task seg_sc_lesion_t2w_sci -i t2.nii.gz
--
Spinal Cord Toolbox (git-jn/4360-torch-gpu-sct_deepseg-17ee9edbff5b6a85cafb19fdfff28b2b08873f45)

sct_deepseg -task seg_sc_lesion_t2w_sci -i t2.nii.gz
--

Running inference on device: cuda
/home/GRAMES.POLYMTL.CA/p114154/repos/spinalcordtoolbox/python/envs/venv_sct/lib/python3.9/site-packages/nnunetv2/utilities/plans_handling/plans_handler.py:37: UserWarning: Detected old nnU-Net plans format. Attempting to reconstruct network architecture parameters. If this fails, rerun nnUNetv2_plan_experiment for your dataset. If you use a custom architecture, please downgrade nnU-Net to the version you implemented this or update your implementation + plans.
  warnings.warn("Detected old nnU-Net plans format. Attempting to reconstruct network architecture "
Model loaded successfully. Fetching test data...
Creating temporary folder (/tmp/sct_2024-04-03_14-11-08_sct_deepseg_y7sb0via)
Copied t2.nii.gz to /tmp/sct_2024-04-03_14-11-08_sct_deepseg_y7sb0via/t2.nii.gz
Changing orientation of the input to the model orientation (RPI)...
Starting inference...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [00:15<00:00,  1.30it/s]
Inference done.
Total inference time: 0 minute(s) 25 seconds
Reorienting the prediction back to original orientation...
Reorientation to original orientation LPI done.
Image header specifies datatype 'int16', but array is of type 'uint8'. Header metadata will be overwritten to use 'uint8'.
Saving results to: /tmp/sct_2024-04-03_14-11-08_sct_deepseg_y7sb0via/nnUNet_prediction/t2_pred_sc_seg.nii.gz
Image header specifies datatype 'int16', but array is of type 'uint8'. Header metadata will be overwritten to use 'uint8'.
Saving results to: /tmp/sct_2024-04-03_14-11-08_sct_deepseg_y7sb0via/nnUNet_prediction/t2_pred_lesion_seg.nii.gz

Done! To view results, type:
fsleyes t2.nii.gz -cm greyscale t2_sc_seg.nii.gz -cm red -a 70.0 &

Done! To view results, type:
fsleyes t2.nii.gz -cm greyscale t2_lesion_seg.nii.gz -cm subcortical -a 70.0 &

p114154@rosenberg:~/repos/spinalcordtoolbox/data/sct_example_data/t2$ sct_deepseg -task seg_sc_lesion_t2w_sci -i t2.nii.gz
--
Spinal Cord Toolbox (git-jn/4360-torch-gpu-sct_deepseg-17ee9edbff5b6a85cafb19fdfff28b2b08873f45)

sct_deepseg -task seg_sc_lesion_t2w_sci -i t2.nii.gz
--

perform_everything_on_device=True is only supported for cuda devices! Setting this to False
Running inference on device: cpu
/home/GRAMES.POLYMTL.CA/p114154/repos/spinalcordtoolbox/python/envs/venv_sct/lib/python3.9/site-packages/nnunetv2/utilities/plans_handling/plans_handler.py:37: UserWarning: Detected old nnU-Net plans format. Attempting to reconstruct network architecture parameters. If this fails, rerun nnUNetv2_plan_experiment for your dataset. If you use a custom architecture, please downgrade nnU-Net to the version you implemented this or update your implementation + plans.
  warnings.warn("Detected old nnU-Net plans format. Attempting to reconstruct network architecture "
Model loaded successfully. Fetching test data...
Creating temporary folder (/tmp/sct_2024-04-03_14-11-49_sct_deepseg_vtmb5mrl)
Copied t2.nii.gz to /tmp/sct_2024-04-03_14-11-49_sct_deepseg_vtmb5mrl/t2.nii.gz
Changing orientation of the input to the model orientation (RPI)...
Starting inference...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 20/20 [01:29<00:00,  4.45s/it]
Inference done.
Total inference time: 1 minute(s) 37 seconds
Reorienting the prediction back to original orientation...
Reorientation to original orientation LPI done.
Image header specifies datatype 'int16', but array is of type 'uint8'. Header metadata will be overwritten to use 'uint8'.
Saving results to: /tmp/sct_2024-04-03_14-11-49_sct_deepseg_vtmb5mrl/nnUNet_prediction/t2_pred_sc_seg.nii.gz
Image header specifies datatype 'int16', but array is of type 'uint8'. Header metadata will be overwritten to use 'uint8'.
Saving results to: /tmp/sct_2024-04-03_14-11-49_sct_deepseg_vtmb5mrl/nnUNet_prediction/t2_pred_lesion_seg.nii.gz
File t2_sc_seg.nii.gz already exists. Will overwrite it.
File t2_lesion_seg.nii.gz already exists. Will overwrite it.

Done! To view results, type:
fsleyes t2.nii.gz -cm greyscale t2_sc_seg.nii.gz -cm red -a 70.0 &

Done! To view results, type:
fsleyes t2.nii.gz -cm greyscale t2_lesion_seg.nii.gz -cm subcortical -a 70.0 &
```