[CUDA][Suggestion] ROI Pool half-precision build error due to ambiguous comparison
🐛 Describe the bug
When building the ROI Pool CUDA kernel on Windows with MSVC and CUDA, compilation fails when AT_DISPATCH_FLOATING_TYPES_AND_HALF instantiates the kernel with T = at::Half. The `>` comparison between two half values becomes ambiguous (presumably because at::Half is implicitly convertible to more than one type with a viable operator>), and MSVC reports a build error.
Minimal Repro Example
```cpp
// roi_pool_forward_kernel_impl excerpt
if (offset_input[input_index] > maxval) {
  maxval = offset_input[input_index];
  maxidx = input_index;
}
```
- With T = at::Half, this comparison fails to compile on MSVC.
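As a sanity check, the error can be sidestepped locally by forcing the comparison into float. This is only a hedged workaround sketch, not the proposed fix (it would demote double inputs if applied blindly), but it suggests the ambiguity comes from the half-vs-half comparison itself:

```cpp
// Local workaround sketch: compare through an explicit float promotion so the
// half-vs-half operator> is never selected. Not suitable as-is for T = double.
if (static_cast<float>(offset_input[input_index]) > static_cast<float>(maxval)) {
  maxval = offset_input[input_index];
  maxidx = input_index;
}
```

The accumulation-type approach proposed below generalizes this idea without affecting the float/double paths.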
Environment
- OS: Windows 10/11
- Compiler: MSVC 19.x (Visual Studio 2022)
- CUDA: 12.x
- PyTorch / torchvision: built from source (latest main)
Proposed Fix
Following PyTorch conventions, it may be preferable to compare in an accumulation type:
```cpp
using acc_t = at::acc_type<T, /*is_cuda=*/true>;
acc_t v = static_cast<acc_t>(offset_input[input_index]);
acc_t mv = static_cast<acc_t>(maxval);
if (v > mv) {
  maxval = offset_input[input_index];
  maxidx = input_index;
}
```
Also, initialize maxval in a type-consistent way:
```cpp
T maxval = is_empty ? T(0) : std::numeric_limits<T>::lowest();
```
This avoids the MSVC ambiguity and preserves precision across float/double/half.
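For concreteness, here is a rough sketch of how the two changes could fit together. It is written as a hypothetical helper (pool_one_cell is not an existing torchvision function), and variable names only mirror the existing roi_pool_forward_kernel_impl loop; this illustrates the suggested direction rather than a tested patch. Note that at::acc_type<at::Half, /*is_cuda=*/true> is float, so half inputs are compared in float while maxval stays in T:

```cpp
#include <ATen/AccumulateType.h>  // at::acc_type
#include <c10/util/Half.h>        // std::numeric_limits specialization for at::Half
#include <limits>

// Sketch of the suggested argmax loop, pulled out of the kernel for illustration.
template <typename T>
__device__ void pool_one_cell(
    const T* offset_input,
    int hstart, int hend, int wstart, int wend,
    int width, bool is_empty,
    T& maxval, int& maxidx) {
  // Accumulation type: float when T is at::Half, otherwise T itself.
  using acc_t = at::acc_type<T, /*is_cuda=*/true>;

  maxval = is_empty ? T(0) : std::numeric_limits<T>::lowest();
  maxidx = -1;

  for (int h = hstart; h < hend; ++h) {
    for (int w = wstart; w < wend; ++w) {
      int input_index = h * width + w;
      // Compare in acc_t so T = at::Half never hits the ambiguous operator>.
      if (static_cast<acc_t>(offset_input[input_index]) > static_cast<acc_t>(maxval)) {
        maxval = offset_input[input_index];
        maxidx = input_index;
      }
    }
  }
}
```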
### Suggestion
Would it make sense to update the ROI Pool kernel accordingly? I am happy to prepare a PR if maintainers agree with this direction.
Note
I'm a junior-level software engineer and this is my first time submitting a report here. If I missed any guidelines or phrased things imperfectly, please kindly let me know. Thank you for your understanding.
Versions
https://github.com/pytorch/vision/blob/main/torchvision/csrc/ops/cuda/roi_pool_kernel.cu
Hey @kimchioverfit, thanks for posting. Sorry for the late reply. Do you still have the issue? Can you use one of the already compiled wheels from PyPI (https://pypi.org/project/torchvision/#files) to unblock you?
Hi @AntoineSimoulin, thanks for the follow-up!
I tested the model with the official PyPI wheels:
- torch 2.8.0+cu129
- torchvision 0.23.0
With the wheel setup, everything works fine: the TorchScript model loads and runs without any issues.
In my C++ setup I also used the matching libtorch 2.8.0+cu129 build (the same version as the Python wheels).
So the issue does not reproduce with the official wheels; it only appears in the libtorch + manually-built torchvision csrc configuration on Windows.
Happy to share the exact snippet or patch if that helps.