Osayamen Aimuyo

Results 13 comments of Osayamen Aimuyo

Best to use templates. The below works fine. Of course, you can be flexible with how you instantiate your class; I only give one of many approaches below. Note that...

@thakkarV Closing the loop on #1631, let me know if this is not actually needed and I will close this instead.

If needed, I can also register float8 and half for `is_floating_point`

cuBLASdx 0.3.0 addresses this. See [here](https://github.com/NVIDIA/CUDALibrarySamples/issues/233) and see sample code in example/cublasdx from the official [release](https://developer.nvidia.com/cublasdx-downloads).

@yzhaiustc To motivate this feature, see the attached SASS source analysis for the current conversion on sm80. Note that the conversion that would take a single `cvt` instruction currently takes...

Every other test works fine. Results are attached below.

gdrcopy_sanity
```Bash
Total: 28, Passed: 28, Failed: 0, Waived: 0
```

gdrcopy_copybw
```Bash
GPU id:0; name: Tesla V100-SXM2-32GB; Bus id: 0001:00:00...
```

Hey @pakmarkthub thanks for the quick response! I installed gdrcopy using the deb packages, so I am not using `make` to run the tests; I run the installed binary. That...

1. Running `gdrcopy_test` returns `gdrcopy_test: command not found`
2. Running the CUDA example you suggested works fine, see results below. I also want to mention, this may not be a...

Just in case you need these.

nvcc --version
```Bash
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2024 NVIDIA Corporation
Built on Wed_Apr_17_19:19:55_PDT_2024
Cuda compilation tools, release 12.5, V12.5.40
Build...
```