Changheon Lee
2 comments
I downloaded your vLLM w8a8 branch, but I ran into the error below. Should I add Int8LlamaForCausalLM to SmoothQuant? ValueError: Model architectures ['Int8LlamaForCausalLM'] are not supported for now. Supported...
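For context, this ValueError is raised when vLLM looks up the checkpoint's architecture string and finds no matching entry in its model registry. A minimal, self-contained sketch of that lookup mechanism (the registry dict and placeholder classes here are illustrative stand-ins, not vLLM's actual internals):

```python
# Illustrative sketch of an architecture registry like the one vLLM
# consults at load time; the classes below are hypothetical placeholders.

class LlamaForCausalLM:          # stand-in for the stock FP16 model class
    pass

class Int8LlamaForCausalLM:      # stand-in for the w8a8 (int8) variant
    pass

# The registry maps the architecture string from the HF config.json
# to the model class that implements it.
MODEL_REGISTRY = {
    "LlamaForCausalLM": LlamaForCausalLM,
}

def load_model(architecture: str):
    """Mimics the failing lookup: unknown architectures raise ValueError."""
    if architecture not in MODEL_REGISTRY:
        raise ValueError(
            f"Model architectures ['{architecture}'] are not supported for now."
        )
    return MODEL_REGISTRY[architecture]()

# The fix the comment is asking about: register the int8 architecture
# so the lookup succeeds instead of raising.
MODEL_REGISTRY["Int8LlamaForCausalLM"] = Int8LlamaForCausalLM
model = load_model("Int8LlamaForCausalLM")
```

In the real branch the equivalent step would be adding the new class to the model registry in vLLM's model-loading code.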
Thanks for your answer. Did you apply partial quantization, meaning the down_proj layer remains in FP16 because of its large activation range? As you know, there is a comment...
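The partial quantization being asked about, keeping down_proj in FP16 while quantizing the other projections to int8, can be sketched as a skip list applied during weight conversion. A minimal illustration under assumed names (the layer names and the plain symmetric per-tensor int8 scheme here are illustrative, not the branch's actual implementation):

```python
import numpy as np

def quantize_int8(w: np.ndarray):
    """Symmetric per-tensor int8 quantization; returns (int8 weights, scale)."""
    scale = float(np.abs(w).max()) / 127.0
    q = np.clip(np.round(w / scale), -127, 127).astype(np.int8)
    return q, scale

# Layers left in FP16 because their activation range is too large
# for int8 (the down_proj case discussed in the comment).
SKIP_LAYERS = {"down_proj"}

def partially_quantize(layers: dict):
    """layers: mapping of layer name -> float weight array.

    Quantizes everything to int8 except names matching SKIP_LAYERS,
    which are kept in FP16.
    """
    out = {}
    for name, w in layers.items():
        if any(skip in name for skip in SKIP_LAYERS):
            out[name] = ("fp16", w.astype(np.float16))
        else:
            out[name] = ("int8", quantize_int8(w))
    return out
```

Usage: feeding in `{"q_proj": ..., "down_proj": ...}` yields int8 weights plus a scale for q_proj, while down_proj passes through unchanged as FP16.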