Wuwei Lin
The per-tensor quantization that was added recently is for fp8. So far we have tested it on Mixtral and Llama, and more work, such as calibration scale, is in progress.
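For reference, a rough sketch of what per-tensor fp8 quantization with a calibration scale means, assuming a symmetric abs-max scheme and the e4m3 range; the helper names are hypothetical and this is not the actual implementation:

```python
import numpy as np

FP8_E4M3_MAX = 448.0  # largest magnitude representable in fp8 e4m3

def per_tensor_scale(w: np.ndarray) -> float:
    # One scale for the whole tensor, mapping the largest magnitude
    # onto the fp8 range. A calibration pass would refine this from
    # observed activation statistics rather than plain abs-max.
    return float(np.abs(w).max()) / FP8_E4M3_MAX

def quantize_per_tensor(w: np.ndarray) -> tuple[np.ndarray, float]:
    scale = per_tensor_scale(w)
    q = np.clip(w / scale, -FP8_E4M3_MAX, FP8_E4M3_MAX)
    # The cast to an actual fp8 storage dtype is omitted here;
    # dequantize with q * scale.
    return q, scale

w = np.random.randn(4096, 4096).astype("float32")
q, scale = quantize_per_tensor(w)
```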
It’s supported. Are you requesting a precompiled package?
Do we need to update the cuBLAS codegen or runtime to support the cast?
If you call a pass directly (instead of using `Sequential`), it will bypass the checks for `opt_level`, `required_pass`, etc.
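A minimal sketch of the difference, using a trivial TIR module and `Simplify` purely for illustration:

```python
import tvm
from tvm.script import tir as T

@tvm.script.ir_module
class Mod:
    @T.prim_func
    def main(A: T.Buffer((16,), "float32")):
        for i in range(16):
            A[i] = A[i] * T.float32(2)

# Direct call: the pass runs unconditionally, bypassing the
# opt_level / required_pass checks that Sequential performs.
mod_direct = tvm.tir.transform.Simplify()(Mod)

# Through Sequential: the PassContext is consulted, so a pass whose
# registered opt_level exceeds the context's can be skipped.
seq = tvm.transform.Sequential([tvm.tir.transform.Simplify()])
with tvm.transform.PassContext(opt_level=0):
    mod_seq = seq(Mod)
```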
I’ll send a new PR, but this might be a real issue. In the past I have already rerun CI multiple times.
@Jiawei-Shao In this case, we can do `sch.vectorize(ax1)` to convert the loop to a vectorized one. https://github.com/apache/tvm/blob/main/src/target/spirv/spirv_utils.cc#L123 will rewrite a buffer with vectorized access to `int8x4` as long as both read...
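A minimal TensorIR sketch of that scheduling step (the buffer shapes here are made up for illustration):

```python
import tvm
from tvm.script import tir as T

@T.prim_func
def copy(A: T.Buffer((128, 4), "int8"), B: T.Buffer((128, 4), "int8")):
    for i, j in T.grid(128, 4):
        with T.block("copy"):
            vi, vj = T.axis.remap("SS", [i, j])
            B[vi, vj] = A[vi, vj]

sch = tvm.tir.Schedule(copy)
ax0, ax1 = sch.get_loops(sch.get_block("copy"))
sch.vectorize(ax1)  # the inner int8 loop of extent 4 becomes a vector access
print(sch.mod)
```

After this, the loads and stores in the inner loop are vectorized accesses that the linked SPIR-V utility can map to `int8x4`.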
We are still using C++17; the error is because C++20 is not enabled.
Did you check out the updated TVM submodule? You also need to recompile the model.
@MasterJH5574 It seems the submodule already contains the fix for the missing function for the TIR KV cache. Is anything missing?
The performance issue might be caused by https://github.com/apache/tvm/pull/17326, though it is not expected to change the original prefill behavior.