Alexander Matveev

12 comments by Alexander Matveev

@davidgxue We have initial correctness for 8-bit Marlin. We will run some perf checks and more testing, and will put up a PR in a couple of days.

@davidgxue 8-bit support is added here: https://github.com/vllm-project/vllm/pull/4533

Benchmark results on an A100 for the Yi-34B Chat model with marlin_24 serialized weights (the actual weight values are not real yet). This is just to show preliminary results to...

@pcmoritz This is a good idea. I changed the API to return `str` or `None` and moved the GPTQ-specific override logic into the override funcs.
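
For readers following along, here is a minimal sketch of the shape of that change, not the actual vLLM code. All names (`QuantConfig`, `override_method`, `resolve_method`) and the supported bit widths are assumptions made for illustration. The idea is that each config's override hook returns a method name (`str`) or `None`, so the caller holds no GPTQ-specific logic.

```python
from typing import Any, Dict, List, Optional, Type


class QuantConfig:
    """Hypothetical base class (illustrative, not vLLM's real interface)."""

    name = "base"

    @classmethod
    def override_method(
        cls, hf_cfg: Dict[str, Any], requested: Optional[str]
    ) -> Optional[str]:
        # Default: this config never overrides the detected method.
        return None


class GPTQMarlinConfig(QuantConfig):
    name = "gptq_marlin"

    @classmethod
    def override_method(
        cls, hf_cfg: Dict[str, Any], requested: Optional[str]
    ) -> Optional[str]:
        # GPTQ-specific checks live here rather than in the caller.
        is_gptq = hf_cfg.get("quant_method") == "gptq"
        # Assumption for this sketch: 4-bit and 8-bit are supported.
        bits_ok = hf_cfg.get("bits") in (4, 8)
        # Respect an explicit user choice of a different method.
        user_ok = requested is None or requested == cls.name
        return cls.name if (is_gptq and bits_ok and user_ok) else None


def resolve_method(
    hf_cfg: Dict[str, Any],
    requested: Optional[str],
    configs: List[Type[QuantConfig]],
) -> Optional[str]:
    """Return the first override offered, else the checkpoint's own method."""
    for cfg in configs:
        override = cfg.override_method(hf_cfg, requested)
        if override is not None:
            return override
    return hf_cfg.get("quant_method")
```

With this layout the caller only checks for `None`, and adding another override means adding another config class rather than another branch in the caller.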

Cool, fixed the nit and some other little things.

@bnellnm Could you do a quick pass on the template changes?

@jinzhen-lin I think your code is in a good state to land once the last comments are addressed.

@jinzhen-lin Thanks for adding the tests and addressing all the comments. @robertgshaw2-neuralmagic This looks good to me to proceed.