
Confused about quality target

Open sgwhat opened this issue 1 year ago • 3 comments

Hey guys,

I have some questions about the quality target. Based on my understanding, taking GPT-J text summarization as an example, the FP32 result is (rouge1=42.9865, rouge2=20.1235, rougeL=29.9881), and other precisions, including FP16 and INT8, must achieve 99% of the FP32 scores. Why 99%? Is there any reference basis for this standard?

Looking forward to your response.

sgwhat avatar May 07 '24 06:05 sgwhat

The 99% target is used because the same value is used in the older benchmarks. There is also a stricter 99.9% variant, and both targets have been achieved with lower-precision models.

Do you see any problem with this target requirement for GPT-J?

arjunsuresh avatar May 07 '24 11:05 arjunsuresh
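For clarity, the target described above can be sketched as a simple check. This is only an illustration, not MLPerf's actual accuracy script: the FP32 reference values are the ones quoted in the question, the per-metric "each score must reach the threshold" interpretation is an assumption, and the `int8_scores` values are made up.

```python
# FP32 reference ROUGE scores for GPT-J summarization, as quoted above.
FP32_REFERENCE = {"rouge1": 42.9865, "rouge2": 20.1235, "rougeL": 29.9881}

def meets_target(scores, reference=FP32_REFERENCE, target=0.99):
    """Return True if every metric reaches `target` * its FP32 reference.

    Assumes the target is applied per metric; `target=0.99` is the
    standard variant, `target=0.999` the stricter one.
    """
    return all(scores[m] >= target * ref for m, ref in reference.items())

# Hypothetical scores from an INT8 run, for illustration only.
int8_scores = {"rouge1": 42.60, "rouge2": 19.95, "rougeL": 29.75}

print(meets_target(int8_scores))                 # passes the 99% target
print(meets_target(int8_scores, target=0.999))   # fails the 99.9% variant
```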

I just cannot figure out why 99% and 99.9% of FP32 are the quality target. 😂

sgwhat avatar May 07 '24 13:05 sgwhat

AFAIK they are just a chosen metric, meant to be representative of "good accuracy". For some models, such as Stable Diffusion for text-to-image generation, the accuracy metric is different.

arjunsuresh avatar May 07 '24 13:05 arjunsuresh

WG agrees to close

mrmhodak avatar May 21 '24 16:05 mrmhodak