Vedant Roy
CuDNN version: `9.1.0`. nvidia-smi: ``` +---------------------------------------------------------------------------------------+ | NVIDIA-SMI 535.183.01 Driver Version: 535.183.01 CUDA Version: 12.2 | |-----------------------------------------+----------------------+----------------------+ | GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC | |...
[Archive.zip](https://github.com/user-attachments/files/16607253/Archive.zip) I've attached both the benchmarking code and the cuDNN wrapper code to this post. I suspect the benchmarking code is off, so I'll switch to something simpler (like the PyTorch profiler),...
Update: I tried to force everything to be float16, by taking the float value `ratio` that is an input to the kernel and storing it in a float16 tensor, but...
I don't see a `tl.full` method in the documentation and running: `python3 -c 'import triton.language as tl; tl.full()'` gives "module 'triton.language' has no attribute 'full'". Also, I'm not sure how...
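For anyone hitting the same error, here is a minimal sketch of how one might check whether an installed build exposes a given attribute before calling it. The helper name `module_has_attr` is hypothetical (it is not part of Triton); it only uses the standard library, and it assumes a missing attribute simply means the installed build predates it.

```python
import importlib


def module_has_attr(module_name: str, attr: str):
    """Return True/False if the module imports, or None if it is not installed.

    Hypothetical helper for illustration; not part of any library.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # The package (or submodule) is not available in this environment.
        return None
    return hasattr(module, attr)


if __name__ == "__main__":
    result = module_has_attr("triton.language", "full")
    if result is None:
        print("triton is not installed in this environment")
    elif result:
        print("triton.language.full is available")
    else:
        print("triton.language.full is missing; this build may predate it")
```

Unlike calling `tl.full()` directly, this only reports whether the name exists, so it does not depend on knowing the function's arguments.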
> Please install triton master and retry. See `semantic.py`

Sounds good. From looking at `semantic.py`, I'm guessing `full` creates a tensor filled with a given value. I see there's a...
Sounds good. Once I can build master successfully, I will try it out and close this issue.