Matthew Nicely
There is a generic `__global__` kernel defined [here](https://github.com/NVIDIA/cutlass/blob/master/include/cutlass/device_kernel.h). The operator type passed as its template argument is unique to each operation. All of the kernel arguments are packed into a...
@qingyunqu were you able to determine the issue?
Thanks @lebedov for the update. If there's anything we (NVIDIA) can do to help please don't hesitate to ask :smile:
@znmeb Do you mind setting `export CUDA_VISIBLE_DEVICES=0` and rerunning *build.sh*?
A much easier workaround would be to allocate with CuPy's managed memory allocator (https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.ManagedMemory.html#cupy.cuda.ManagedMemory & https://docs.cupy.dev/en/stable/reference/generated/cupy.cuda.malloc_managed.html). This will allow the driver to migrate data back and forth between system and device memory...
What would be the SDDMM use cases?
@rkindi has your issue been solved?
@yuxgis did you figure out your issue?
@zhanggefan were your questions resolved with @hwu36's response?
This is not a CUTLASS bug; it has been fixed in the latest CUDA release.