Matthew Nicely

113 comments by Matthew Nicely

A generic `__global__` kernel is used [here](https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/device_kernel.h). The type definition passed as its argument is unique to each operation. All the kernel arguments are packed into a...

@qingyunqu were you able to determine the issue?

Thanks @lebedov for the update. If there's anything we (NVIDIA) can do to help please don't hesitate to ask :smile:

@znmeb Do you mind setting `export CUDA_VISIBLE_DEVICES=0` and rerunning *build.sh*?

A much easier workaround would be to allocate with CuPy's managed memory allocator (https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.ManagedMemory.html#cupy.cuda.ManagedMemory and https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.malloc_managed.html). This will allow the driver to migrate data back and forth between system and device memory...

What would be the SDDMM use cases?
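For reference, SDDMM (sampled dense-dense matrix multiplication) evaluates the dense product A·B only at the nonzero positions of a sparse matrix S; in the variant sketched here, each sampled value is also scaled by the matching entry of S, i.e. S ⊙ (A·B). Library definitions differ (some only use the sparsity pattern of S, some use Bᵀ). This plain-Python illustration uses COO storage; the function name `sddmm_coo` and its input layout are hypothetical choices for this example, not a CUTLASS API.

```python
def sddmm_coo(rows, cols, vals, A, B):
    """Sampled dense-dense matmul over the COO nonzeros of S.

    A is m x k and B is k x n, both given as lists of rows.
    For each nonzero (r, c, v) of S, returns v * dot(A[r, :], B[:, c]).
    Only the sampled entries of A @ B are ever computed.
    """
    k = len(B)
    out = []
    for r, c, v in zip(rows, cols, vals):
        dot = sum(A[r][p] * B[p][c] for p in range(k))
        out.append(v * dot)
    return out


A = [[1, 2], [3, 4]]
B = [[5, 6], [7, 8]]
# S has nonzeros at (0, 1) with value 1.0 and (1, 0) with value 2.0.
print(sddmm_coo([0, 1], [1, 0], [1.0, 2.0], A, B))  # [22.0, 86.0]
```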

@yuxgis did you figure out your issue?

@zhanggefan were your questions resolved with @hwu36's response?

Not a CUTLASS bug; fixed in the latest CUDA release.