CV-CUDA [BUG] Some methods do not support float16

Describe the bug I get many errors when I use float16, and I don't know how to resolve them. float16 is a common precision, so why is it not supported?

Steps/Code to reproduce bug

I get the same error as in https://github.com/CVCUDA/CV-CUDA/issues/142 when I use float16 as the dtype of the parameters passed to cvcuda.normalize:

        mean_cp = np_to_cuda_buffer(mean_params, dtype=np.float16).reshape(1, 1, 3)
        std_cp = np_to_cuda_buffer(std_params, dtype=np.float16).reshape(1, 1, 3)
        mean_tensor = cvcuda.as_tensor(mean_cp, nvcv.TensorLayout.HWC)
        std_tensor = cvcuda.as_tensor(std_cp, nvcv.TensorLayout.HWC)
        # Convert the image tensor to the target dtype and scale to [0, 1]
        cv_tensor = cvcuda.convertto(cv_tensor, dtype, scale=1.0 / 255.0, stream=cls.stream)
        # Normalize using mean and std (broadcast across height and width)
        cv_tensor = cvcuda.normalize(cv_tensor,
                                     base=mean_tensor,
                                     scale=std_tensor,
                                     flags=cvcuda.NormalizeFlags.SCALE_IS_STDDEV, stream=cls.stream)

This raises:

RuntimeError: DLPack buffer's data type must have at most 4 lanes

With np.float32 this works. Passing np.float16 to cvcuda.convertto also fails:

        cv_tensor = cvcuda.convertto(cv_tensor, np.float16, scale=1.0 / 255.0, stream=cls.stream)

This raises:

ValueError: Casting nvcv::DataType from PyObject failed: Unable to cast Python instance of type <class 'type'> to C++ type '?' (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)

Environment overview

Method of install: pip install cvcuda-cu12==0.15.0

Jul 31 '25 08:07 631068264

Hi @631068264, I was able to reproduce both of these issues with the latest main branch. The first one, RuntimeError: DLPack buffer's data type must have at most 4 lanes, occurs here:

    switch (lanes)
    {
    case 1:
        pp.swizzle = nvcv::Swizzle::S_X000;
        break;
    case 2:
        pp.swizzle = nvcv::Swizzle::S_XY00;
        break;
    case 3:
        pp.swizzle = nvcv::Swizzle::S_XYZ0;
        break;
    case 4:
        pp.swizzle = nvcv::Swizzle::S_XYZW;
        break;
    default:
        throw std::runtime_error("DLPack buffer's data type must have at most 4 lanes");
    }

https://github.com/CVCUDA/CV-CUDA/blob/main/python/mod_nvcv/DLPackUtils.cpp#L268 We need to check what value of lanes is reported for np.float16.
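As a starting point for that check, here is a minimal sketch that reads the DLPack data type (code, bits, lanes) NumPy exports for a host array. The helper name dlpack_dtype is hypothetical, and the ctypes structures are hand-written to mirror the public dlpack.h layout. It only inspects NumPy's own export; the buffer in the report comes from np_to_cuda_buffer, which is not shown and may export something different.

```python
import ctypes

import numpy as np


class DLDevice(ctypes.Structure):
    _fields_ = [("device_type", ctypes.c_int32), ("device_id", ctypes.c_int32)]


class DLDataType(ctypes.Structure):
    _fields_ = [
        ("code", ctypes.c_uint8),    # type class: 0 = int, 1 = uint, 2 = float in dlpack.h
        ("bits", ctypes.c_uint8),    # bits per lane
        ("lanes", ctypes.c_uint16),  # number of vector lanes
    ]


class DLTensor(ctypes.Structure):
    _fields_ = [
        ("data", ctypes.c_void_p),
        ("device", DLDevice),
        ("ndim", ctypes.c_int32),
        ("dtype", DLDataType),
        ("shape", ctypes.POINTER(ctypes.c_int64)),
        ("strides", ctypes.POINTER(ctypes.c_int64)),
        ("byte_offset", ctypes.c_uint64),
    ]


def dlpack_dtype(array):
    """Return (code, bits, lanes) from the legacy DLPack capsule of a NumPy array."""
    capsule = array.__dlpack__()  # unversioned capsule named "dltensor"
    get_pointer = ctypes.pythonapi.PyCapsule_GetPointer
    get_pointer.restype = ctypes.c_void_p
    get_pointer.argtypes = [ctypes.py_object, ctypes.c_char_p]
    # DLManagedTensor starts with its DLTensor member, so the addresses coincide.
    tensor = DLTensor.from_address(get_pointer(capsule, b"dltensor"))
    return tensor.dtype.code, tensor.dtype.bits, tensor.dtype.lanes


print(dlpack_dtype(np.zeros((1, 1, 3), dtype=np.float16)))
print(dlpack_dtype(np.zeros((1, 1, 3), dtype=np.float32)))
```

If the host-side export reports a single lane for float16, the next thing to compare would be the data type exported by the CUDA buffer in the report.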

The second one, ValueError: Casting nvcv::DataType from PyObject failed: Unable to cast Python instance of type <class 'type'> to C++ type '?' (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details), is related to pybind11's automatic casting from float16:

return py::reinterpret_borrow<py::object>(obj).cast<T>();

https://github.com/CVCUDA/CV-CUDA/blob/main/python/mod_nvcv/CAPI.cpp#L61
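The error message reports the argument as <class 'type'>, which matches np.float16 being a NumPy scalar class rather than a dtype instance. The sketch below only illustrates that distinction on the NumPy side; describe_dtype_argument is a hypothetical helper, and whether passing np.dtype(np.float16) changes CV-CUDA's behavior has not been verified here.

```python
import numpy as np


def describe_dtype_argument(arg):
    """Report whether arg is a NumPy scalar class or a dtype instance."""
    if isinstance(arg, np.dtype):
        return f"dtype instance: {arg.name}, {arg.itemsize} bytes"
    if isinstance(arg, type) and issubclass(arg, np.generic):
        # A scalar class such as np.float16 is what type(...) reports as <class 'type'>.
        return f"scalar class: {arg.__name__}, normalizes to {np.dtype(arg).name}"
    return f"unsupported: {type(arg).__name__}"


print(describe_dtype_argument(np.float16))
print(describe_dtype_argument(np.dtype(np.float16)))
```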

Can this issue be assigned to me for further debugging?

Oct 07 '25 00:10 sbcd90

Hi @631068264 and @sbcd90, thank you for your interest in CV-CUDA!

We are tracking and investigating this issue. We do not currently have a timeline for resolving the float16 problems. In the meantime, are you able to use float32?
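For reference, a NumPy-only sketch of what the float32 fallback computes: do the scaling and normalization in float32 and cast to float16 only at the end. It assumes SCALE_IS_STDDEV amounts to (x - base) / scale and ignores any epsilon term; the function name and the mean/std values are illustrative, not taken from the report.

```python
import numpy as np


def normalize_reference(image_u8, mean, std, out_dtype=np.float16):
    """CPU reference: scale to [0, 1], normalize per channel in float32, cast last."""
    # Mirrors convertto(..., scale=1/255) from the report, but in float32.
    x = image_u8.astype(np.float32) * np.float32(1.0 / 255.0)
    # Reshape to (1, 1, 3) so the parameters broadcast across height and width.
    mean = np.asarray(mean, dtype=np.float32).reshape(1, 1, 3)
    std = np.asarray(std, dtype=np.float32).reshape(1, 1, 3)
    # Assumed meaning of SCALE_IS_STDDEV: divide by the scale tensor.
    y = (x - mean) / std
    return y.astype(out_dtype)


# Illustrative HWC image and per-channel parameters (assumed values).
image = np.full((2, 2, 3), 255, dtype=np.uint8)
result = normalize_reference(image, [0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
print(result.dtype, result.shape)
```

Keeping the arithmetic in float32 avoids both code paths that fail in the report, at the cost of one extra cast if float16 output is needed downstream.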

Do you have any information you can share on your application for using float16 inside CV-CUDA? Having additional context can help us prioritize our work.

Nov 14 '25 18:11 justincdavis