Keep LayerNorm accumulator at FP32
When a model is quantized to FP16, LayerNorm is quantized along with the rest of the graph, which leads to an accuracy problem. Make the code changes needed so that LayerNorm always uses FP32 accumulation.
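For illustration, here is a minimal NumPy sketch (not MIGraphX code; the shapes and data are arbitrary, the last dimension just echoes the encoder_hidden_states input below) of how accumulating LayerNorm's mean/variance in FP16 degrades accuracy compared to FP32 accumulation:

```python
# Illustrative only: LayerNorm with FP16 vs FP32 accumulation of the reductions.
import numpy as np

def layernorm(x, acc_dtype, eps=1e-5):
    # numpy uses the `dtype` argument as the accumulator type for the reductions
    mean = x.mean(axis=-1, keepdims=True, dtype=acc_dtype)
    var = np.square(x - mean).mean(axis=-1, keepdims=True, dtype=acc_dtype)
    return ((x - mean) / np.sqrt(var + eps)).astype(x.dtype)

x = np.random.randn(2, 77, 2048).astype(np.float16)

ref  = layernorm(x.astype(np.float64), np.float64)  # high-precision reference
fp32 = layernorm(x, np.float32)                      # FP32 accumulation (desired behaviour)
fp16 = layernorm(x, np.float16)                      # FP16 accumulation (the reported problem)

print("max |error|, FP32 accumulation:", np.abs(fp32 - ref).max())
print("max |error|, FP16 accumulation:", np.abs(fp16 - ref).max())
```

On data like this the FP16-accumulation error is typically orders of magnitude larger than the FP32-accumulation error, which is the effect the FP32 accumulator is meant to avoid.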
Then test the SDXL model with something similar to:

```
$ migraphx-driver compile stable-diffusion-xl-1.0-tensorrt/sdxl-1.0-base/unetxl/model.onnx --input-dim @sample 2 4 128 128 @timestep 1 @encoder_hidden_states 2 77 2048 --fp16 --exhaustive-tune -o unet_base16.mxr
$ migraphx-driver perf unet_base16.mxr
```
Then verify accuracy using the txt2img.py script for SDXL.
https://github.com/ROCm/AMDMIGraphX/tree/sdxl_perf/examples/diffusion/python_stable_diffusion_xl
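As one hedged way to quantify the accuracy check, the image produced by the FP16 pipeline can be compared pixel-wise against an FP32 reference image generated with the same prompt and seed. The file names below are placeholders, not output names produced by txt2img.py; see the linked README for how the script is actually invoked.

```python
# Hypothetical comparison of two txt2img.py outputs: one from an FP32 build,
# one from the FP16 build under test, generated with identical prompt and seed.
import numpy as np
from PIL import Image

a = np.asarray(Image.open("sdxl_fp32.png"), dtype=np.float64)  # placeholder file name
b = np.asarray(Image.open("sdxl_fp16.png"), dtype=np.float64)  # placeholder file name

mse = np.mean((a - b) ** 2)
psnr = 10 * np.log10(255.0 ** 2 / mse) if mse > 0 else float("inf")
print(f"MSE: {mse:.3f}  PSNR: {psnr:.2f} dB")
```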