Results 4 comments of xiaonans

Hi @nv-dlasalle, was your test result of "Avg epoch time: 6.580887830257415" obtained on a DGX-A100?

> @xiaonans For a group_size of, say, 128, every 128 4-bit weights share one fp16 zero_point. The memory ratio of zp to weight is 16 / (128 * 4). That's...
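
To make the arithmetic in the quoted comment concrete, here is a rough sketch. The function name and its default arguments are illustrative assumptions, not part of any library; it only evaluates the ratio stated above, zero-point bits divided by the weight bits of one group.

```python
def zero_point_overhead(group_size: int, weight_bits: int = 4, zp_bits: int = 16) -> float:
    """Ratio of zero-point storage to weight storage for one quantization group.

    Hypothetical helper: assumes one zp_bits-wide zero point per group of
    group_size weights, each weight_bits wide, as described in the quote.
    """
    return zp_bits / (group_size * weight_bits)


# group_size = 128, 4-bit weights, fp16 zero point: 16 / (128 * 4)
ratio = zero_point_overhead(128)
print(f"{ratio:.5f} ({ratio:.3%})")  # 0.03125 (3.125%)
```

Under these assumptions, halving the group size to 64 doubles the overhead to 6.25%, since the same 16-bit zero point is shared by half as many weight bits.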

> Are you using the 2.x or 3.x API? In 3.x you should just be able to set your epilogue stride to whatever you want and it should just work.

Thanks...

> What is your data type and hardware? If fp16 or bf16, A can be any layout on Ampere.

My data type is fp16, and the hardware is an A100-80G. I want...