Confused about quality target
Hey guys,
I have some questions about the quality target. As I understand it, taking GPT-J text summarization as an example, the FP32 result is (rouge1=42.9865, rouge2=20.1235, rougeL=29.9881), and other precisions, including FP16 and INT8, must achieve 99% of the FP32 scores. Why 99%? Is there any reference or basis for this standard?
Looking forward to your response.
The 99% target is used because the same value was used for the older benchmarks. There is also a stricter 99.9% variant, and both targets have been achieved with lower-precision models.
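For concreteness, here is a minimal sketch of how such a threshold check might look, using the FP32 reference ROUGE scores quoted above. The function name and structure are illustrative only, not the actual benchmark harness code:

```python
# Minimal sketch (not the actual benchmark harness): check whether a
# lower-precision run meets the 99% / 99.9% quality targets relative
# to the FP32 reference ROUGE scores.

FP32_REFERENCE = {"rouge1": 42.9865, "rouge2": 20.1235, "rougeL": 29.9881}

def meets_quality_target(measured: dict, target_ratio: float = 0.99) -> bool:
    """Return True if every measured ROUGE score is at least
    target_ratio (e.g. 0.99 for the 99% target) of the FP32 reference."""
    return all(
        measured[metric] >= target_ratio * reference
        for metric, reference in FP32_REFERENCE.items()
    )

# Example with hypothetical INT8 scores (made up for illustration):
int8_scores = {"rouge1": 42.70, "rouge2": 19.95, "rougeL": 29.75}
print(meets_quality_target(int8_scores, 0.99))   # True: passes the 99% target
print(meets_quality_target(int8_scores, 0.999))  # False: fails the stricter 99.9%
```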
Do you see any problem with this target requirement for GPT-J?
I just cannot figure out why 99% and 99.9% of FP32 are the quality targets. 😂
AFAIK they are just chosen thresholds meant to be representative of "good accuracy". But for some models, like Stable Diffusion for text-to-image generation, the accuracy metric itself is different.
WG agrees to close