Marcin Gajzler
AFAIK, BF16 is not supported/accelerated by AVX512_VNNI on 3rd Gen Intel Xeon (codename: Ice Lake); on that generation, VNNI accelerates INT8 only. Please confirm: does the ["--quant-with-amp" parameter](https://github.com/intel/intel-extension-for-pytorch/tree/v2.2.0%2Bcpu/examples/cpu/inference/python/llm#:~:text=enable%20quantization%20with%20Automatic%20Mixed%20Precision%20inference%20(non%2Dquantized%20OPs%20run%20with%20bf16%20dtype%2C%20which%20may%20affect%20the%20accuracy).) require AMX? (which...
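For reference, one way to see which of these ISA features a given Xeon actually reports is to read the CPU flags from `/proc/cpuinfo` on Linux; a minimal sketch (the flag names `avx512_vnni`, `avx512_bf16`, `amx_bf16`, `amx_int8` are the ones the kernel exposes):

```python
# Minimal sketch: check which ISA features the CPU advertises
# by parsing the "flags" line of /proc/cpuinfo (Linux only).
def cpu_flags(path="/proc/cpuinfo"):
    with open(path) as f:
        for line in f:
            if line.startswith("flags"):
                # "flags : fpu vme ... avx512_vnni ..."
                return set(line.split(":", 1)[1].split())
    return set()

if __name__ == "__main__":
    flags = cpu_flags()
    for feature in ("avx512_vnni", "avx512_bf16", "amx_bf16", "amx_int8"):
        print(f"{feature}: {'yes' if feature in flags else 'no'}")
```

On Ice Lake I would expect `avx512_vnni: yes` but `amx_*: no`; AMX flags should appear only from 4th Gen (Sapphire Rapids) onward.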
Let's skip single-instance scenarios for now, as "distributed" inference via DeepSpeed is more interesting for dual-socket systems. On a virtualized Linux VM (details below) on top of VMware...