RustaCUDA
Add tuning option for shared memory size
Volta (compute capability 7.0) and newer devices support more than 48 KiB of shared memory per thread block. This is an opt-in feature, introduced in CUDA 9.0, that is enabled by setting the desired maximum size as a function attribute before the kernel is launched.
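A minimal std-only sketch of when the opt-in matters. It assumes the 48 KiB (49152-byte) default per-block limit and that static and dynamic shared memory count against the same per-block budget. `plan_shared_mem` and `SharedMemPlan` are hypothetical names, not RustaCUDA API, and the 96 KiB opt-in limit in `main` is only illustrative:

```rust
/// Default per-block shared memory limit (48 KiB) that applies without the opt-in.
const DEFAULT_SHARED_LIMIT: u32 = 48 * 1024;

/// How a kernel's shared memory request relates to the device limits.
#[derive(Debug, PartialEq, Eq)]
enum SharedMemPlan {
    /// Fits within 48 KiB: no function attribute needs to be set.
    NoOptInNeeded,
    /// Exceeds 48 KiB but fits within the device's opt-in limit.
    OptInRequired,
    /// Exceeds even the opt-in limit: the launch cannot succeed.
    TooLarge,
}

/// Classifies a launch that uses `static_bytes` of statically declared shared
/// memory plus `dynamic_bytes` requested at launch time. `optin_limit` is the
/// value of the device's MaxSharedMemoryPerBlockOptin attribute.
fn plan_shared_mem(static_bytes: u32, dynamic_bytes: u32, optin_limit: u32) -> SharedMemPlan {
    // Sum in u64 so that large inputs cannot overflow.
    let total = static_bytes as u64 + dynamic_bytes as u64;
    if total <= DEFAULT_SHARED_LIMIT as u64 {
        SharedMemPlan::NoOptInNeeded
    } else if total <= optin_limit as u64 {
        SharedMemPlan::OptInRequired
    } else {
        SharedMemPlan::TooLarge
    }
}

fn main() {
    // Illustrative opt-in limit, not a specific device's value.
    let optin = 96 * 1024;
    println!("{:?}", plan_shared_mem(0, 32 * 1024, optin));
    println!("{:?}", plan_shared_mem(4 * 1024, 64 * 1024, optin));
    println!("{:?}", plan_shared_mem(0, 128 * 1024, optin));
}
```

Only the `OptInRequired` case needs the function attribute to be raised before launch.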
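The relevant function attributes are:
- CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES: the maximum dynamic shared memory size a launch of the function may request (required to opt in)
- CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT: a hint for how much of the combined L1 cache/shared memory to reserve as shared memory, in percent of the maximum shared memory per multiprocessor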
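The carveout attribute takes a percentage, plus the special values -1 (default), 0 (maximum L1) and 100 (maximum shared memory) from the driver's CUshared_carveout enum. The sketch below models those encodings in Rust. `SharedMemoryCarveout` is a hypothetical type, not part of RustaCUDA, and rounding the percentage up in `for_bytes` is a choice made here rather than documented driver behavior; the driver treats the value only as a hint.

```rust
/// Hypothetical model of the CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT
/// value; the numeric encodings mirror the driver's CUshared_carveout enum.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum SharedMemoryCarveout {
    /// Let the driver choose (CU_SHAREDMEM_CARVEOUT_DEFAULT = -1).
    Default,
    /// Prefer the largest L1 cache (CU_SHAREDMEM_CARVEOUT_MAX_L1 = 0).
    MaxL1,
    /// Prefer the largest shared memory (CU_SHAREDMEM_CARVEOUT_MAX_SHARED = 100).
    MaxShared,
    /// Explicit percentage of the maximum shared memory per SM, 0..=100.
    Percent(u8),
}

impl SharedMemoryCarveout {
    /// Returns the i32 value that would be passed as the attribute value,
    /// or None if an explicit percentage is out of range.
    fn to_attribute_value(self) -> Option<i32> {
        match self {
            SharedMemoryCarveout::Default => Some(-1),
            SharedMemoryCarveout::MaxL1 => Some(0),
            SharedMemoryCarveout::MaxShared => Some(100),
            SharedMemoryCarveout::Percent(p) if p <= 100 => Some(p as i32),
            SharedMemoryCarveout::Percent(_) => None,
        }
    }

    /// Smallest whole percentage (rounded up) that covers `shared_bytes` on an SM
    /// with `max_shared_per_sm` bytes of shared memory; None if it cannot fit.
    fn for_bytes(shared_bytes: u32, max_shared_per_sm: u32) -> Option<Self> {
        if max_shared_per_sm == 0 || shared_bytes > max_shared_per_sm {
            return None;
        }
        let max = max_shared_per_sm as u64;
        // Ceiling division; the result is at most 100 because shared_bytes <= max.
        let pct = (shared_bytes as u64 * 100 + max - 1) / max;
        Some(SharedMemoryCarveout::Percent(pct as u8))
    }
}

fn main() {
    // 96 KiB per SM is an illustrative figure, not a specific device's value.
    let carveout = SharedMemoryCarveout::for_bytes(64 * 1024, 96 * 1024).unwrap();
    println!("{:?} -> {:?}", carveout, carveout.to_attribute_value());
}
```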
This PR adds tuning support for these attributes to RustaCUDA, as in the following example:
// Query the largest per-block shared memory size the device allows with the opt-in
let max_shared_mem_bytes = device.get_attribute(DeviceAttribute::MaxSharedMemoryPerBlockOptin)? as u32;
// Raise the kernel's dynamic shared memory limit to that size.
// This assumes the kernel declares no static shared memory, which counts against the same limit.
let function_name = std::ffi::CString::new(...).unwrap();
let mut function = module.get_function(&function_name)?;
function.set_max_dynamic_shared_size_bytes(max_shared_mem_bytes)?;
// Launch the kernel, requesting max_shared_mem_bytes of dynamic shared memory
unsafe { launch!(function<<<grid, block, max_shared_mem_bytes, stream>>>())? };
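The example requests the device's full opt-in limit as dynamic shared memory. Because the CUDA driver counts a kernel's static shared memory against the same per-block limit, a kernel that also declares static shared memory has to request less. A minimal std-only sketch of that arithmetic, where `max_dynamic_shared_bytes` is a hypothetical helper and the attribute value in `main` is only illustrative:

```rust
use std::convert::TryFrom;

/// Converts the raw i32 returned for MaxSharedMemoryPerBlockOptin into the
/// largest dynamic shared memory size a launch can request, given the kernel's
/// static shared memory usage. Returns None for a negative attribute value or
/// if static shared memory alone exceeds the limit.
fn max_dynamic_shared_bytes(optin_attr: i32, static_bytes: u32) -> Option<u32> {
    // A checked conversion rejects negative values instead of wrapping like `as u32`.
    let optin = u32::try_from(optin_attr).ok()?;
    // Static and dynamic shared memory share the same per-block budget.
    optin.checked_sub(static_bytes)
}

fn main() {
    let raw = 99 * 1024; // illustrative attribute value
    match max_dynamic_shared_bytes(raw, 2 * 1024) {
        Some(bytes) => println!("request up to {} bytes of dynamic shared memory", bytes),
        None => println!("kernel does not fit within the opt-in limit"),
    }
}
```

The resulting value is what would be passed both to the function attribute setter and as the launch's shared memory size.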
For more information, see the CUDA Driver API documentation for cuFuncSetAttribute and the Volta tuning guide.