cccpr

Results: 67 comments by cccpr

@rainkert Hi, have you run any performance tests for deepseek-v3-w4a16-marlin? Can you share the results?

@Jokeren Triton version 3.0.0, with `os.environ["TRITON_INTERPRET"] = "0"` or `os.environ["TRITON_INTERPRET"] = "1"`. The following code gives me different behavior depending on whether TRITON_INTERPRET is 0 or 1. ``` import os os.environ["TRITON_INTERPRET"] = "1" import...
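For context, here is a minimal, self-contained sketch (my own illustration, not the truncated snippet above) of how the interpreter is toggled: `TRITON_INTERPRET` has to be set before `triton` is imported, and the same kernel then runs either through the interpreter or through the normal compiled path. The `add_kernel` vector-add below is a hypothetical example used only to show the setup.

```
import os
os.environ["TRITON_INTERPRET"] = "1"  # set to "0" for the normal compiled path; must be set before importing triton

import torch
import triton
import triton.language as tl


@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)


x = torch.randn(1024, device="cuda")
y = torch.randn(1024, device="cuda")
out = torch.empty_like(x)
grid = (triton.cdiv(x.numel(), 256),)
add_kernel[grid](x, y, out, x.numel(), BLOCK_SIZE=256)
```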

@Jokeren They produce different results. You can give it a try.

@Jokeren I simplified the code. With interpret: ``` tensor([[0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], [0, 0, 0, ..., 0, 0, 0], ...,...

How do I use triton/main? Like this? @Jokeren
```
git clone https://github.com/triton-lang/triton.git
cd triton
pip install ninja cmake wheel pybind11  # build-time dependencies
pip install -e python
```

@Jokeren With or without interpret, **will the running speed vary a lot?** I am using an RTX 4060 to run this; without interpret it is quite fast, but with interpret the code...

In this w8a8, is the **a8** dynamically quantized per token per group, at the same granularity as the original fp8 quantization? @laixinn
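To spell out what I mean by dynamic per-token-per-group activation quantization, here is a minimal sketch (my own illustration, not the repo's implementation; the group size of 128 is an assumption): each token row is split into fixed-size groups, and each group gets its own scale computed on the fly from its absolute maximum before rounding to int8.

```
import torch


def quantize_a8_per_token_group(x: torch.Tensor, group_size: int = 128):
    # x: (num_tokens, hidden_dim); hidden_dim assumed divisible by group_size.
    num_tokens, hidden_dim = x.shape
    xg = x.view(num_tokens, hidden_dim // group_size, group_size)
    # One dynamic scale per (token, group), derived from that group's absmax.
    scales = xg.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(xg / scales), -128, 127).to(torch.int8)
    return q.view(num_tokens, hidden_dim), scales.squeeze(-1)


x = torch.randn(4, 512)
q, s = quantize_a8_per_token_group(x, group_size=128)  # q: int8 activations, s: (num_tokens, num_groups) scales
```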