Danial Javady
**Describe the bug** It takes around 6-7 minutes before `./build.sh libcudf tests` will begin building targets. There is a lot of preparation work that is done. This can be unproductive...
On my 3080, **BEFORE**: the line `int n = npq_offset / (p_ * q_);` translates to [before_first_line_sass.txt](https://github.com/NVIDIA/cutlass/files/14826990/before_first_line_sass.txt), and the line `int residual = npq_offset % (p_ * q_);` translates to [before_second_line_sass.txt](https://github.com/NVIDIA/cutlass/files/14826999/before_second_line_sass.txt) (I'll omit...
https://github.com/NVIDIA/cutlass/blob/main/include/cutlass/epilogue/threadblock/predicated_tile_iterator_strided_dgrad.h#L315-L318 This piece of code can be replaced by using fast divmod. The same can be applied to the store function below.
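To illustrate the idea behind fast divmod, here is a minimal, self-contained sketch in plain C++. It is not the CUTLASS implementation: the name `FastDivmodSketch` and its members are hypothetical, and it assumes a positive 32-bit divisor and a non-negative 32-bit dividend. The divisor is fixed once, a multiplier and shift are precomputed, and each later division becomes one 64-bit multiply and one shift, with the remainder recovered by a multiply and subtract.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper for illustration only (not the CUTLASS class).
// Precomputes a "magic" multiplier and shift for a fixed divisor so that
// n / divisor and n % divisor need no hardware divide.
struct FastDivmodSketch {
  std::uint32_t divisor;
  std::uint64_t multiplier;  // ceil(2^shift / divisor), fits in 32 bits
  unsigned shift;            // 31 + ceil(log2(divisor))

  explicit FastDivmodSketch(std::int32_t d)
      : divisor(static_cast<std::uint32_t>(d)), multiplier(0), shift(0) {
    assert(d > 0);  // assumption: the divisor is strictly positive
    unsigned log2_ceil = 0;
    while ((std::uint64_t{1} << log2_ceil) < divisor) {
      ++log2_ceil;
    }
    shift = 31 + log2_ceil;
    multiplier = ((std::uint64_t{1} << shift) + divisor - 1) / divisor;
  }

  // Computes quotient = n / divisor and remainder = n % divisor.
  void operator()(std::int32_t& quotient, std::int32_t& remainder,
                  std::int32_t n) const {
    assert(n >= 0);  // assumption: the dividend is non-negative
    quotient = static_cast<std::int32_t>(
        (static_cast<std::uint64_t>(n) * multiplier) >> shift);
    remainder = n - quotient * static_cast<std::int32_t>(divisor);
  }
};
```

In a kernel, an object like this would typically be constructed once on the host (for example from a product such as `p_ * q_`) and passed in as a parameter, so the per-thread cost is only the multiply, shift, and subtract.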
Fixes #121965 This PR hopes to add support for complex numbers in the scatter/gather related kernels. For brevity, I will only include `complex` for now as `complex`, for example, will be...
Fixes #117122 This PR adds logic so that, for rank-deficient matrices, batched mode can fall back to an SVD backend. A big thank you...
Many of the APIs currently used in the dnn module were deprecated in cuDNN 8 and have been removed in cuDNN 9. This PR updates said code accordingly...
While trying to understand Thrust's `complex`, I noticed a bunch of useless, outdated macros that can be removed.
Fixes #111824 Currently, if the user specifies NHWC format for their group normalization, PyTorch will default to NCHW tensors and convert. This conversion...
Summary: 1) Only the `insert` and `contains` functions are added for now. 2) The data structure is placed in a temporary `experimental` namespace to avoid having to change more areas of the code.
Attempting to build CCCL without sccache installed leads to `Notice: sccache is not available. Skipping...`. I personally ran into trouble with devcontainers, which is how I discovered this...