Rob Armstrong

21 comments by Rob Armstrong

cudaDeviceSynchronize() is not a device-side API, and it wouldn't make sense to call it from within a device function - by definition, if the function is waiting for a...

The `cuda_driver_api.h` header was removed from the CUDA Toolkit a long time ago, and its content was refactored into other headers - I'm not 100% certain which release offhand, but I...

I would ask in a forum relating to Ollama. CUDA hasn't provided that header in some time. They may have a similarly-named header, but I'm not able to speak to...

As a quick note: with the update to CMake, this should hopefully work better in the 12.8 release and newer - please let us know how it goes.

Given the age of this issue, will close as resolved.

Given the age of this issue, will close as resolved.

Sorry, should have been more clear. I'm doing a cleanup pass through the repo and took note of the issue to get it fixed, but since the last note was...

Hi @Zeyu-W, thanks for reporting this issue. I agree multiplying by K here is incorrect. However, `SHMEM_STRIDE` is already defined as `N * BLOCK_ROW_TILES`; I think also multiplying by M...

Thanks for your reply - let me take another look at it. This isn't originally my code so I may have misread it when I was initially looking at it.