Ajay Nayak
Tested on: Titan RTX, CUDA 11.0, Driver 455.51.05. The system has abundant RAM (>100 GB) and an Intel Xeon processor. I have been trying to run the tests provided in...
I am trying to run the [nodeSplitKernel](https://github.com/rapidsai/cuml/blob/branch-21.08/cpp/src/decisiontree/batched-levelalgo/kernels.cuh#L169) GPU function. I found the test suite [here](https://github.com/rapidsai/cuml/blob/branch-21.08/cpp/test/sg/decisiontree_batchedlevel_unittest.cu#L166). However, the given [test cases](https://github.com/rapidsai/cuml/blob/branch-21.08/cpp/test/sg/decisiontree_batchedlevel_unittest.cu#L219) make the kernel run for quite a short duration (<...
I am interested in exploring [CUDA device graph launch](https://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#device-launch). I was able to get through the documentation, but I could not find any examples here. Can you add a sample...
I wanted to run the test `scheduler_tests.py`. I believe that, for a given trace, this test writes the schedule to the file `/tmp/simple.output`. The traces used seem to be...
Hi, after lowering a CUDA program to MLIR using cgeist, I wanted to run some analysis on memory operations. I want to distinguish memory operations that are 'volatile.' ```...