[QST] How to enable cucim compatibility mode - Debian 9
Installed cuCIM version: 22.4.0. Our system has 4 V100 GPUs and is running Debian 9.
Tue Apr 19 22:17:23 2022
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 510.60.02 Driver Version: 510.60.02 CUDA Version: 11.6 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:61:00.0 Off | 0 |
| N/A 38C P0 58W / 300W | 3986MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... Off | 00000000:62:00.0 Off | 0 |
| N/A 34C P0 40W / 300W | 3MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Tesla V100-SXM2... Off | 00000000:89:00.0 Off | 0 |
| N/A 33C P0 38W / 300W | 3MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Tesla V100-SXM2... Off | 00000000:8A:00.0 Off | 0 |
| N/A 34C P0 43W / 300W | 3MiB / 16384MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
On that Debian 9 OS we run the NVIDIA Ubuntu CUDA Docker image:
FROM nvcr.io/nvidia/cuda:11.6.2-devel-ubuntu20.04
...
RUN pip3 install --upgrade cucim
When loading an SVS image using cuCIM, I get:
[Error] cuFileHandleRegister fd: 36 (/data/<my_file>.svs), status: internal error. Would work with cuCIM's compatibility mode.
Installing GDS on the native Debian 9 OS is not possible: gds-tools are only available for debian10 and debian11, not for debian9 (see https://developer.download.nvidia.com/compute/cuda/repos/). And this system is used by multiple users, so updating the OS version is non-trivial...
Reverting to a cuCIM version from before GDS support was added is also not an option, since those versions don't support SVS files yet.
So I tried disabling GDS according to these instructions: https://docs.nvidia.com/gpudirect-storage/troubleshooting-guide/index.html#enable-comp-mode
The first instruction (removing the nvidia-fs kernel driver), however, yields:
sudo rmmod nvidia-fs
rmmod: ERROR: Module nvidia_fs is not currently loaded
I confirmed the module is indeed not loaded using lsmod | grep nvidia.
My /etc/cufile.json file also has ['properties']['allow_compat_mode'] set to true, so I would expect to be running in compatibility mode, which is not what the above error message implies.
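For reference, this is roughly how that flag can be checked from Python (a minimal sketch, assuming the GDS config lives at /etc/cufile.json as in the troubleshooting guide; the shipped template allows // comments, which plain JSON parsing rejects, so they are stripped naively here, assuming no "//" occurs inside string values):

import json
import re

# Read the GDS configuration and strip // comments before parsing.
with open("/etc/cufile.json") as f:
    text = re.sub(r"//.*", "", f.read())

cfg = json.loads(text)
print(cfg.get("properties", {}).get("allow_compat_mode"))  # expect true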
How would I get cuCIM 22.4.0 working in GDS compatibility mode? Or would you have any other advice (excl. updating the OS :-))?
Thanks in advance...
Hi @diricxbart !
When loading an SVS image using cuCIM
Could you please share the line of code you used to load the SVS image, so we can understand the context? AFAIU, the message below wouldn't happen unless you called the read_region() method with the device='cuda' parameter, or you used cuCIM's filesystem package (cucim.clara.filesystem).
[Error] cuFileHandleRegister fd: 36 (/data/<my_file>.svs), status: internal error. Would work with cuCIM's compatibility mode.
There is not much benefit to specifying device='cuda' in the read_region() method to use GDS+nvJPEG at this moment (we need to improve the performance).
If you have used the fs.open() method, note that it is not for loading an image file (it is for reading a block of a file), and you can use the 'rp' option to avoid GDS.
import cucim.clara.filesystem as fs

# Open file using GDS
fd = fs.open("input/image.tif", "r")
fs.close(fd)  # same as fd.close()

# Open file without using GDS
fd2 = fs.open("input/image.tif", "rp")
fs.close(fd2)  # same as fd2.close()
Hi Gigon,
Thank you for your response...
This code (so indeed with device set to 'cuda'):
slide = CuImage('/data/pathology/TCGA-G7-A8LE-01A-01-TS1.39D4D79F-6CE4-441C-8EBD-42323F7B9C11.svs')
img = slide.read_region(level=3, device='cuda')
Results in this error:
[Error] cuFileHandleRegister fd: 101 (/data/pathology/TCGA-G7-A8LE-01A-01-TS1.39D4D79F-6CE4-441C-8EBD-42323F7B9C11.svs), status: internal error. Would work with cuCIM's compatibility mode.
If I set device='cpu', then it does work properly...
Could you please elaborate on the purpose / impact / consequences of the device parameter? My understanding was that setting device to 'cuda' results in a CuPy array on the GPU, while 'cpu' results in a NumPy array on the CPU?
Hi @diricxbart
By default, the read_region() method doesn't use GPU-accelerated libraries to decode compressed image data (JPEG/JPEG2000) in .svs or .tif format, so you don't need to specify device="cpu" in the method.
cuCIM optimizes its TIFF(-like) image loader so that its reads are faster than other libraries even on the CPU.
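As an illustrative sketch of that default CPU path (the file path is a placeholder), with conversion to a NumPy array on the host:

import numpy as np
from cucim import CuImage

slide = CuImage("image.svs")          # placeholder path
region = slide.read_region(level=3)   # device defaults to "cpu"
arr = np.asarray(region)              # expose the region as a host NumPy array
print(arr.shape, arr.dtype)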
Using GPU-accelerated decoding libraries is particularly useful when:
- the image to decode is large, and
- multiple images are decoded in parallel (batch loading).
And not using the GPU(CUDA)-based image loader is particularly useful when:
- the GPU needs to be reserved for training, or
- you use a multi-process data loader while training (see the sketch after this list):
  - https://pytorch.org/docs/stable/data.html#torch.utils.data.DataLoader
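A hedged sketch of that multi-process loader pattern, assuming per-item opening of the slide (CuImage handles are not shared across worker processes); the path and patch coordinates are illustrative:

import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset
from cucim import CuImage

class PatchDataset(Dataset):
    """Reads fixed-size patches from a WSI on the CPU inside worker processes."""

    def __init__(self, path, coords, size=(256, 256), level=0):
        self.path, self.coords, self.size, self.level = path, coords, size, level

    def __len__(self):
        return len(self.coords)

    def __getitem__(self, i):
        slide = CuImage(self.path)  # open per item; handles are not picklable
        patch = slide.read_region(self.coords[i], self.size, level=self.level)
        return torch.from_numpy(np.asarray(patch).copy())

coords = [(0, 0), (256, 0), (0, 256)]
loader = DataLoader(PatchDataset("image.svs", coords), batch_size=2, num_workers=2)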
From v22.02.00, we introduced multithreading and batch processing features in read_region(), and we leverage GDS and nvJPEG to load and decode JPEG-compressed image data if device="cuda" is given.
https://github.com/rapidsai/cucim/wiki/release_notes_v22.02.00#2-supporting-multithreading-and-batch-processing
cuCIM now supports loading the entire image with multiple threads. It also supports batch loading of images. If the device parameter of the read_region() method is "cuda", it loads the relevant portion of the image file (compressed tile data) into GPU memory using cuFile (GDS, GPUDirect Storage), then decompresses that data using nvJPEG's Batched Image Decoding API.
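A sketch of that batch path, assuming the locations/batch_size/num_workers parameters shown in the v22.02.00 release notes (the path and coordinates are illustrative):

import cupy as cp
from cucim import CuImage

slide = CuImage("image.svs")  # placeholder path
locations = [(0, 0), (256, 0), (0, 256), (256, 256)]
# Batched, multithreaded decode on the GPU (GDS + nvJPEG).
regions = slide.read_region(locations, (224, 224),
                            batch_size=2, num_workers=4, device="cuda")
for batch in regions:
    arr = cp.asarray(batch)  # batches are already in GPU memory
    print(arr.shape)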
The current implementation is not efficient (we are not utilizing CUDA streams and GPU memory well), so its performance is poor compared to the CPU implementation; we plan to improve it over the next versions.
For this reason, we do not recommend using device="cuda" for now.
If you want to move loaded data to the GPU, please convert the CuImage object (the output of the read_region() method) to a CuPy array with the cupy.asarray() method.
import cupy as cp
from cucim import CuImage

slide = CuImage('/data/pathology/TCGA-G7-A8LE-01A-01-TS1.39D4D79F-6CE4-441C-8EBD-42323F7B9C11.svs')
img = slide.read_region(level=3)  # add `num_workers=8` to load the image with 8 threads
img_gpu = cp.asarray(img)
Using the cache feature makes it faster to load multiple patches of arbitrary locations and sizes from a WSI image.
from cucim import CuImage

CuImage.cache('per_process', memory_capacity=1024)  # use 1 GiB of system memory for the cache
slide = CuImage('/data/pathology/TCGA-G7-A8LE-01A-01-TS1.39D4D79F-6CE4-441C-8EBD-42323F7B9C11.svs')
img = slide.read_region((100, 100), (256, 256), level=0)
img = slide.read_region((110, 110), (256, 256), level=0)  # much faster: reuses cached tile data
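If helpful, the cache object can be inspected to confirm that hits are occurring; the hit_count/miss_count attributes below are assumed from cuCIM's cache documentation:

cache = CuImage.cache()  # retrieve the current cache object
print(cache.memory_capacity, cache.hit_count, cache.miss_count)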
Debian 9 is EOL:
- https://wiki.debian.org/LTS
- https://www.debian.org/releases/stretch/
If this is still occurring on a more recent OS, let's open a new issue with a reproducer, taking into account the feedback already given here.
Thanks all! 🙏