
Add tuning option for shared memory size

Open LutzCle opened this issue 3 years ago • 0 comments

Pascal and newer devices support more than 48 KiB of shared memory per thread block. This is an opt-in feature, introduced in CUDA 9.0: the desired size is requested by setting a function attribute on the kernel before it is launched.

The attributes needed to opt in are:

  • CU_FUNC_ATTRIBUTE_MAX_DYNAMIC_SHARED_SIZE_BYTES
  • CU_FUNC_ATTRIBUTE_PREFERRED_SHARED_MEMORY_CARVEOUT
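
To illustrate the second attribute: the CUDA driver documentation describes the carveout as a hint, given as a percentage of the total shared memory. The sketch below shows one way such a percentage might be derived from a desired size in bytes. The function name `carveout_percent` and its parameters are hypothetical and not part of RustaCUDA; the total shared memory figure has to come from the device.

```rust
/// Hypothetical helper (not part of RustaCUDA): converts a desired shared
/// memory size into a carveout percentage, rounding up so that the hint
/// never asks for less than `desired_bytes`.
///
/// Returns `None` if `total_bytes` is zero or the request exceeds the total.
fn carveout_percent(desired_bytes: u32, total_bytes: u32) -> Option<i32> {
    if total_bytes == 0 || desired_bytes > total_bytes {
        return None;
    }
    // Use u64 arithmetic so that `desired_bytes * 100` cannot overflow.
    let desired = u64::from(desired_bytes);
    let total = u64::from(total_bytes);
    // Ceiling division: (a + b - 1) / b.
    let percent = (desired * 100 + total - 1) / total;
    Some(percent as i32)
}
```

For instance, `carveout_percent(64 * 1024, 96 * 1024)` rounds two thirds up to a whole percentage; the 96 KiB total here is only an illustrative figure, not a statement about any particular device.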

This PR adds tuning support to RustaCUDA as in this example:

// Get the maximum shared memory size
let max_shared_mem_bytes = device.get_attribute(DeviceAttribute::MaxSharedMemoryPerBlockOptin)? as u32;

// Set the function attribute
let function_name = std::ffi::CString::new(...).unwrap();
let mut function = module.get_function(&function_name)?;
function.set_max_dynamic_shared_size_bytes(max_shared_mem_bytes)?;

// Launch the kernel
unsafe { launch!(function<<<grid, block, max_shared_mem_bytes, stream>>>()).unwrap() };
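
A caller may want to check the requested size before touching the function attribute. The following is a minimal sketch of such a check, assuming the 48 KiB default limit mentioned above and ignoring any statically allocated shared memory. `SharedMemPlan` and `plan_dynamic_shared` are hypothetical names used for illustration and are not part of RustaCUDA.

```rust
/// Default per-block limit that applies without opting in (48 KiB).
const DEFAULT_LIMIT_BYTES: u32 = 48 * 1024;

/// Hypothetical result type: what the caller has to do before launching.
#[derive(Debug, PartialEq)]
enum SharedMemPlan {
    /// The request fits within the default limit; no attribute is needed.
    Default,
    /// The request exceeds the default limit; the function attribute must
    /// be set to at least this many bytes before the launch.
    OptIn(u32),
}

/// Hypothetical helper: decides whether a kernel launch needs the opt-in.
///
/// `optin_max_bytes` is assumed to be the value reported by the device for
/// `DeviceAttribute::MaxSharedMemoryPerBlockOptin`.
fn plan_dynamic_shared(
    requested_bytes: u32,
    optin_max_bytes: u32,
) -> Result<SharedMemPlan, String> {
    if requested_bytes <= DEFAULT_LIMIT_BYTES {
        Ok(SharedMemPlan::Default)
    } else if requested_bytes <= optin_max_bytes {
        Ok(SharedMemPlan::OptIn(requested_bytes))
    } else {
        Err(format!(
            "requested {} bytes of shared memory, but the device allows at most {}",
            requested_bytes, optin_max_bytes
        ))
    }
}
```

With a plan of `SharedMemPlan::OptIn(bytes)`, the caller would pass `bytes` to `set_max_dynamic_shared_size_bytes` and use the same value as the shared memory argument of `launch!`.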

For more information, see the CUDA documentation and the Pascal tuning guide.

LutzCle, Oct 15 '21 20:10