rasmith
I am getting a runtime error when trying to run on a MacBook Pro with just the CPU. It ends up being something like this: ``` /opt/anaconda3/envs/relighting_video_capture/lib/python3.7/site-packages/torch/utils/checkpoint.py:25: UserWarning: None of...
This PR changes the get_amd_offload_arch_flag() function to match all offload-arch types that have alphanumeric names. For example, on MI250, the offload-arch is gfx90a. On the MI250a, the function did...
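To illustrate the difference between matching only numeric suffixes and matching alphanumeric ones, here is a minimal sketch. It does not reproduce get_amd_offload_arch_flag(); the helpers `find_offload_archs` and `offload_arch_flags` and the regex are hypothetical. A digits-only pattern such as `gfx[0-9]+` would truncate `gfx90a` to `gfx90`, while allowing letters in the suffix keeps the full name.

```python
import re
from typing import List

# Hypothetical pattern: "gfx" followed by lowercase letters or digits, so
# names like gfx90a and gfx1100 match in full, as do all-digit names like gfx908.
OFFLOAD_ARCH_RE = re.compile(r"\bgfx[0-9a-z]+\b")


def find_offload_archs(text: str) -> List[str]:
    """Return unique gfx names in the order they first appear in `text`."""
    seen: List[str] = []
    for match in OFFLOAD_ARCH_RE.findall(text):
        if match not in seen:
            seen.append(match)
    return seen


def offload_arch_flags(text: str) -> List[str]:
    """Build one --offload-arch flag per detected architecture."""
    return [f"--offload-arch={arch}" for arch in find_offload_archs(text)]
```

For example, feeding it a line such as `Name: gfx90a:sramecc+:xnack-` yields `["gfx90a"]`, since the colon after the name ends the match.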
This PR adds awq_gemm_triton, a Triton implementation of awq_gemm (which is implemented in CUDA), to serve as a fallback when the CUDA awq_gemm cannot be used. PR created after discussing with...
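The fallback idea can be sketched as a selection step: prefer the compiled CUDA kernel when it imports, otherwise use the alternative. This is a hypothetical illustration, not the PR's dispatch code. The module `_hypothetical_cuda_ext`, the helpers `fallback_gemm` and `select_gemm`, and the plain float-matrix signature are all assumptions; the real kernels also consume quantized weights and quantization parameters, which this sketch omits.

```python
from typing import Callable, List, Optional, Tuple

Matrix = List[List[float]]
Gemm = Callable[[Matrix, Matrix], Matrix]

# Hypothetical compiled extension: the import fails on machines without it,
# which is exactly the situation a fallback exists for.
try:
    from _hypothetical_cuda_ext import awq_gemm as cuda_awq_gemm
except ImportError:
    cuda_awq_gemm = None


def fallback_gemm(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matmul standing in for a Triton kernel (placeholder)."""
    return [
        [sum(a_ik * b[k][j] for k, a_ik in enumerate(row)) for j in range(len(b[0]))]
        for row in a
    ]


def select_gemm(cuda_gemm: Optional[Gemm], fallback: Gemm = fallback_gemm) -> Tuple[str, Gemm]:
    """Prefer the CUDA kernel; return the fallback when it is unavailable (None)."""
    if cuda_gemm is not None:
        return "cuda", cuda_gemm
    return "fallback", fallback


backend, gemm = select_gemm(cuda_awq_gemm)
```

Resolving the choice once, at import time, keeps the per-call path free of availability checks.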
This PR adds awq_dequantize_triton, a Triton implementation of awq_dequantize (already implemented in CUDA), to serve as a fallback when the CUDA implementation cannot be used. PR created per request of...
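As a rough picture of what group-wise 4-bit dequantization computes, the sketch below unpacks eight 4-bit values from each 32-bit word and applies `(q - zero) * scale`, with one row of zeros and scales shared by every `group_size` rows. It is a simplified illustration, not the awq_dequantize kernel: it assumes sequential nibble packing (real AWQ interleaves the nibbles) and unpacked integer zeros, and `unpack_int4` and `dequantize` are hypothetical names.

```python
from typing import List


def unpack_int4(word: int) -> List[int]:
    """Split a 32-bit word into eight 4-bit values, lowest nibble first.

    ASSUMPTION: sequential packing; real AWQ uses an interleaved order.
    """
    return [(word >> (4 * i)) & 0xF for i in range(8)]


def dequantize(qweight: List[List[int]], zeros: List[List[int]],
               scales: List[List[float]], group_size: int) -> List[List[float]]:
    """Return (q - zero) * scale for every unpacked element.

    qweight[r] holds the packed words of row r; zeros and scales are
    indexed [group][column], where group = r // group_size.
    """
    out = []
    for r, packed_row in enumerate(qweight):
        g = r // group_size  # rows in the same group share zeros[g] and scales[g]
        q_row = [q for word in packed_row for q in unpack_int4(word)]
        out.append([(q - zeros[g][c]) * scales[g][c] for c, q in enumerate(q_row)])
    return out
```

A GPU kernel does the same arithmetic, but each program instance handles a tile of words in parallel instead of looping row by row.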
The following script can be used to benchmark awq_triton kernels. It works if CUDA is available or if only HIP is available. It will compare against the CUDA implementation of AWQ, if CUDA AWQ...
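A benchmark of this kind usually needs warmup runs, repeated timed calls, and a device synchronization so asynchronous kernel launches are actually measured. The harness below is a generic sketch, not the benchmark script referenced above; `time_fn`, `compare`, and the implementation labels are hypothetical. When timing GPU work you would pass `torch.cuda.synchronize` as `sync` (on ROCm builds of PyTorch the `torch.cuda` namespace also targets HIP devices). Entries set to `None` are skipped, so an implementation that is unavailable on the current machine can simply be left out of the comparison.

```python
import statistics
import time
from typing import Callable, Dict, Optional


def time_fn(fn: Callable[[], object], warmup: int = 3, iters: int = 10,
            sync: Callable[[], None] = lambda: None) -> float:
    """Median wall-clock seconds per call of fn, after `warmup` untimed calls.

    GPU kernels launch asynchronously, so pass a device sync function
    (for example torch.cuda.synchronize) to make the timings meaningful.
    """
    for _ in range(warmup):
        fn()
    sync()  # drain warmup work before the first timed call
    samples = []
    for _ in range(iters):
        start = time.perf_counter()
        fn()
        sync()  # wait for queued kernels so the sample covers the real work
        samples.append(time.perf_counter() - start)
    return statistics.median(samples)


def compare(impls: Dict[str, Optional[Callable[[], object]]], **kwargs) -> Dict[str, float]:
    """Time every available implementation; entries set to None are skipped."""
    return {name: time_fn(fn, **kwargs) for name, fn in impls.items() if fn is not None}
```

The median is used rather than the mean so that a single slow outlier (for example, a one-off compilation or cache miss) does not dominate the result.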