SYCLomatic
deviceProp.maxThreadsPerMultiProcessor != deviceProp.get_max_work_items_per_compute_unit() ?
On an NVIDIA GPU, deviceProp.maxThreadsPerMultiProcessor is 2048 in the CUDA program, but deviceProp.get_max_work_items_per_compute_unit() returns 1024 in the migrated code.
dpct version 16.0.0. Codebase:(536eeb8014b1570a8b65aee511cbe2ba664e3962)
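// Query the properties of device 0
cudaDeviceProp deviceProp;
cudaGetDeviceProperties(&deviceProp, 0);
// Maximum resident threads per streaming multiprocessor (2048 here)
const int mTpSM = deviceProp.maxThreadsPerMultiProcessor;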
@zjin-lcf We plan to sync with the compiler team to determine the root cause.
okay
@zjin-lcf We have reported this issue to the compiler team, and https://github.com/intel/llvm/issues/7997 tracks its status. No further action is needed in SYCLomatic, so I am closing this issue.