
Add support for phi-3-mini-128k model

Open bil-ash opened this issue 1 year ago • 2 comments

Please add support for the phi-3-mini-128k (128k context length) model in neural-speed.

bil-ash avatar Apr 30 '24 00:04 bil-ash

It's in our plans. Thanks.

kevinintel avatar May 01 '24 15:05 kevinintel

Thanks. Since Phi-3 support has been merged, I will close this issue. I do have another question, though, and don't want to create a separate issue, so I'm asking it here.

According to https://github.com/intel/neural-speed/tree/main/neural_speed/core#fastest-configuration-for-cpus , int8 is the fastest configuration for ISAs both newer than AVX512F and older than it (AVX2), but for AVX512F itself fp32 is the fastest. Why is that? Also, does int8 compute lead to lower memory usage than fp32, or is the memory usage the same for the same weight quantization?

bil-ash avatar May 10 '24 02:05 bil-ash
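For context, the "fastest configuration" table being referenced can be summarized roughly as the mapping below. This is only a sketch based on the description in this thread, not the table verbatim, so treat the exact entries as assumptions and consult the linked README for the authoritative values.

```python
# Rough summary of the per-ISA "fastest compute dtype" described in this thread
# (sketch only; the linked core README is authoritative).
fastest_compute_dtype = {
    "AVX512_VNNI": "int8",  # newer than plain AVX512F
    "AVX_VNNI":    "int8",
    "AVX512F":     "fp32",  # no VNNI, see the maintainer's answer below
    "AVX2":        "int8",  # older than AVX512F
}
```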

@bil-ash Hi, AVX512F here means devices without AVX512_VNNI, and I didn't implement u8s8 and s8s8 kernels for plain AVX512F, so it's better to use fp32 for computation there. AVX2 devices without AVX_VNNI do have u8s8 and s8s8 kernels as a fallback.

luoyu-intel avatar May 17 '24 02:05 luoyu-intel
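Putting the reply above into practice: on an AVX512F-only CPU (no AVX512_VNNI) one would keep int4 weights but request fp32 compute, while VNNI/AVX2 machines can use int8 compute. Below is a minimal sketch following the Python usage shown in the neural-speed README (`Model().init(...)` with `weight_dtype`/`compute_dtype`); the Hugging Face model id and the choice of `compute_dtype` here are assumptions for illustration, not project recommendations.

```python
from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

# Assumed model id for the phi-3-mini-128k request in this issue.
model_name = "microsoft/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time,", return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = Model()
# On AVX512F-only CPUs (no AVX512_VNNI) the u8s8/s8s8 int8 kernels are not
# available, so fp32 compute is the faster choice per the comment above;
# on VNNI-capable or AVX2 machines, compute_dtype="int8" would be used instead.
model.init(model_name, weight_dtype="int4", compute_dtype="fp32")
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=128)
```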

Okay, understood

bil-ash avatar May 20 '24 02:05 bil-ash