[Fix] "too many resources requested for launch" error in the ASSET joint probability matrix computation when using CUDA

Open kohlerca opened this issue 6 months ago • 2 comments

The computation of the joint probability matrix involves two GPU-accelerated steps, invoked through the private classes _PMatNeighbors (first step) and _JSFUniformOrderStat3D (second step). With recent versions of PyCUDA (> 2021.1), the unit test for _PMatNeighbors fails with the error:

pycuda._driver.LaunchError: cuLaunchKernel failed: too many resources requested for launch

The number of registers used by the _PMatNeighbors kernel is determined at run time and depends on the n_largest and filter_size parameters. Recent CUDA versions may also produce compiled code that uses more registers. Currently, when using CUDA GPU acceleration, _PMatNeighbors launches the kernel with the maximum number of threads per block supported by the GPU (typically 1024). The total register demand (registers per thread times threads per block) can then exceed the number of registers available to a block, which raises this error.

Therefore, this PR fixes the error by automatically determining the number of threads based on the number of registers in the compiled kernel.
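As a rough illustration of the idea (not the code in this PR), the thread count can be derived from the register budget. The helper name `max_threads_for_kernel` and its defaults are hypothetical. With PyCUDA, the inputs would typically come from the compiled function's `num_regs` attribute and the device attributes `MAX_REGISTERS_PER_BLOCK`, `MAX_THREADS_PER_BLOCK` and `WARP_SIZE`; here they are plain integers so the sketch runs without a GPU.

```python
def max_threads_for_kernel(regs_per_thread, max_regs_per_block,
                           max_threads_per_block=1024, warp_size=32):
    """Hypothetical helper: largest block size that fits the register budget.

    Assumes registers are the only limiting resource; real hardware also
    allocates registers with some granularity, so this is an upper bound.
    """
    if regs_per_thread <= 0:
        # No register information available: fall back to the device limit.
        return max_threads_per_block

    # Threads that fit into the per-block register file.
    limit = max_regs_per_block // regs_per_thread
    limit = min(limit, max_threads_per_block)

    # Round down to a multiple of the warp size where possible.
    rounded = (limit // warp_size) * warp_size
    return rounded if rounded > 0 else max(limit, 1)


# Example with a 65536-register budget per block (an assumed value).
print(max_threads_for_kernel(32, 65536))
print(max_threads_for_kernel(80, 65536))
```

With 80 registers per thread, 1024 threads would need 81920 registers, which is more than the assumed 65536 budget, so the block size has to be reduced.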

To mirror the functionality of _JSFUniformOrderStat3D, where the number of CUDA threads can be passed as a parameter to the ASSET function, the automatically determined number can be overridden. The cuda_threads parameter can now also be passed as a tuple, in which case the second element sets the number of threads used by _PMatNeighbors. If the usual single integer value is passed, _PMatNeighbors uses the automatically determined maximum.
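The parameter handling described above could look roughly like the following sketch. The helper `split_cuda_threads` is hypothetical and not part of the Elephant API, and it assumes that the first tuple element is the value used by _JSFUniformOrderStat3D, which the description implies but does not state.

```python
def split_cuda_threads(cuda_threads, auto_pmat_threads):
    """Hypothetical helper: resolve the thread counts for both GPU steps.

    cuda_threads      -- int, or tuple (jsf_threads, pmat_threads)
    auto_pmat_threads -- thread count determined automatically for
                         _PMatNeighbors from the compiled kernel
    Returns (jsf_threads, pmat_threads).
    """
    if isinstance(cuda_threads, tuple):
        # Tuple form: the second element overrides the automatic value.
        jsf_threads, pmat_threads = cuda_threads
        return jsf_threads, pmat_threads
    # Single integer: _PMatNeighbors keeps the automatic maximum.
    return cuda_threads, auto_pmat_threads


print(split_cuda_threads(64, 800))         # single integer form
print(split_cuda_threads((64, 256), 800))  # tuple form with override
```

How an override larger than the automatically determined maximum is handled is not stated in the description, so the sketch passes it through unchanged.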

kohlerca, Jun 17 '25 09:06

Coverage Status

coverage: 88.069% (-0.2%) from 88.303% when pulling c778140bcc4ce74761bfd25cd1038135f119416a on INM-6:fix/cuda_too_many_resources into febc7ce6b30ce35f3a72c40103f3f79a76547f62 on NeuralEnsemble:master.

coveralls, Jun 17 '25 09:06

Implemented additional test cases to validate the computation of _PMatNeighbors with different numbers of threads, and added a new test class for the high-level API in ASSET.joint_probability_matrix.

kohlerca, Jun 17 '25 14:06