Keep LayerNorm accumulator at FP32
When a model is quantized to FP16, LayerNorm is quantized along with the rest of the graph, which leads to an accuracy problem. Make the code changes needed so that LayerNorm always uses FP32 accumulation.
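For illustration, here is a minimal NumPy sketch (not MIGraphX code; the shapes and data are arbitrary, the last dimension just echoes the encoder_hidden_states input below) of how accumulating LayerNorm's mean/variance in FP16 degrades accuracy compared to FP32 accumulation:

```python
# Illustrative only: LayerNorm with FP16 vs FP32 accumulation of the reductions.
import numpy as np

def layernorm(x, acc_dtype, eps=1e-5):
    # numpy uses the `dtype` argument as the accumulator type for the reductions
    mean = x.mean(axis=-1, keepdims=True, dtype=acc_dtype)
    var = np.square(x - mean).mean(axis=-1, keepdims=True, dtype=acc_dtype)
    return ((x - mean) / np.sqrt(var + eps)).astype(x.dtype)

x = np.random.randn(2, 77, 2048).astype(np.float16)

ref  = layernorm(x.astype(np.float64), np.float64)  # high-precision reference
fp32 = layernorm(x, np.float32)                      # FP32 accumulation (desired behaviour)
fp16 = layernorm(x, np.float16)                      # FP16 accumulation (the reported problem)

print("max |error|, FP32 accumulation:", np.abs(fp32 - ref).max())
print("max |error|, FP16 accumulation:", np.abs(fp16 - ref).max())
```

On data like this the FP16-accumulation error is typically orders of magnitude larger than the FP32-accumulation error, which is the effect the FP32 accumulator is meant to avoid.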
Then test the SDXL model with something similar to:

```
$ migraphx-driver compile stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-base/unetxl/model.onnx --input-dim @sample 2 4 128 128 @timestep 1 @encoder_hidden_states 2 77 2048 --fp16 --exhaustive-tune -o unet_base16.mxr
$ migraphx-driver perf unet_base16.mxr
```
Then verify accuracy using the txt2img.py script for SDXL.
https://github.com/ROCm/AMDMIGraphX/tree/sdxl_perf/examples/diffusion/python_stable_diffusion_xl
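As one hedged way to quantify the accuracy check, the image produced by the FP16 pipeline can be compared pixel-wise against an FP32 reference image generated with the same prompt and seed. The file names below are placeholders, not output names produced by txt2img.py; see the linked README for how the script is actually invoked.

```python
# Hypothetical comparison of two txt2img.py outputs: one from an FP32 build,
# one from the FP16 build under test, generated with identical prompt and seed.
import numpy as np
from PIL import Image

a = np.asarray(Image.open("sdxl_fp32.png"), dtype=np.float64)  # placeholder file name
b = np.asarray(Image.open("sdxl_fp16.png"), dtype=np.float64)  # placeholder file name

mse = np.mean((a - b) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
print(f"MSE: {mse:.3f}  PSNR: {psnr:.2f} dB")
```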