
Add support for phi-3-mini-128k model

Open bil-ash opened this issue 1 year ago • 2 comments

Please add support for the phi-3-mini-128k (128k context length) model in neural-speed.

bil-ash avatar Apr 30 '24 00:04 bil-ash

It's in our plans. Thanks.

kevinintel avatar May 01 '24 15:05 kevinintel

Thanks. Since Phi-3 support has been merged, I will close this issue. I do have another question, though, and don't want to create a separate issue, so I'm asking it here.

According to https://github.com/intel/neural-speed/tree/main/neural_speed/core#fastest-configuration-for-cpus , int8 is the fastest configuration for ISAs both newer than AVX512F and older than it (AVX2), but for AVX512F itself fp32 is the fastest. Why is that? Also, does int8 compute lead to lower memory usage than fp32, or is the memory usage the same for the same weight quantization?

bil-ash avatar May 10 '24 02:05 bil-ash
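For context, the "fastest configuration" table being referenced can be summarized roughly as the mapping below. This is only a sketch based on the description in this thread, not the table verbatim, so treat the exact entries as assumptions and consult the linked README for the authoritative values.

```python
# Rough summary of the per-ISA "fastest compute dtype" described in this thread
# (sketch only; the linked core README is authoritative).
fastest_compute_dtype = {
    "AVX512_VNNI": "int8",  # newer than plain AVX512F
    "AVX_VNNI":    "int8",
    "AVX512F":     "fp32",  # no VNNI, see the maintainer's answer below
    "AVX2":        "int8",  # older than AVX512F
}
```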

@bil-ash Hi, AVX512F here means devices without AVX512_VNNI, and I didn't implement u8s8 and s8s8 kernels for plain AVX512F, so it's better to use fp32 for computation there. AVX2 devices without AVX_VNNI do have u8s8 and s8s8 kernels as a fallback.

luoyu-intel avatar May 17 '24 02:05 luoyu-intel
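Putting the reply above into practice: on an AVX512F-only CPU (no AVX512_VNNI) one would keep int4 weights but request fp32 compute, while VNNI/AVX2 machines can use int8 compute. Below is a minimal sketch following the Python usage shown in the neural-speed README (`Model().init(...)` with `weight_dtype`/`compute_dtype`); the Hugging Face model id and the choice of `compute_dtype` here are assumptions for illustration, not project recommendations.

```python
from transformers import AutoTokenizer, TextStreamer
from neural_speed import Model

# Assumed model id for the phi-3-mini-128k request in this issue.
model_name = "microsoft/Phi-3-mini-128k-instruct"

tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=True)
inputs = tokenizer("Once upon a time,", return_tensors="pt").input_ids
streamer = TextStreamer(tokenizer)

model = Model()
# On AVX512F-only CPUs (no AVX512_VNNI) the u8s8/s8s8 int8 kernels are not
# available, so fp32 compute is the faster choice per the comment above;
# on VNNI-capable or AVX2 machines, compute_dtype="int8" would be used instead.
model.init(model_name, weight_dtype="int4", compute_dtype="fp32")
outputs = model.generate(inputs, streamer=streamer, max_new_tokens=128)
```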

Okay, understood

bil-ash avatar May 20 '24 02:05 bil-ash