AMDMIGraphX
AMD's graph optimization engine.
Add a compile pass for hipBLASLt, similar to how MIOpen does it.
The FP8 header is now available from HIP, so we should be using those types and the provided intrinsic.
### Problem Description

I'm running (MIGraphX develop branch):

```bash
python3 ../tools/accuracy/accuracy_checker.py --onnx ~/mlir-dev/AMDMIGraphX/build/bert_large_uncased_1_fp16_gpu.onnx --fill1 --input-dim input_ids:1,384 --disable-fast-math --tolerance 0.02 --verbose
```

And I get the error:

```
Max Difference: 0.13617822527885437...
```
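For readers unfamiliar with this kind of check, here is a minimal sketch of how a reported maximum difference might be compared against a tolerance. It assumes the check is a plain maximum absolute element-wise difference; the actual rule in `accuracy_checker.py` may differ (for example, it could combine relative and absolute tolerances). The function names and sample values are hypothetical and are not taken from the issue.

```python
def max_abs_difference(reference, actual):
    """Largest absolute element-wise difference between two equal-length sequences."""
    if len(reference) != len(actual):
        raise ValueError("outputs must have the same number of elements")
    return max(abs(r - a) for r, a in zip(reference, actual))


def within_tolerance(reference, actual, tolerance):
    """Hypothetical pass/fail rule: every element differs by at most `tolerance`."""
    return max_abs_difference(reference, actual) <= tolerance


# Illustrative values only, not taken from the issue.
reference = [0.50, -1.25, 2.00]
actual = [0.51, -1.25, 2.14]

diff = max_abs_difference(reference, actual)
passed = within_tolerance(reference, actual, tolerance=0.02)
print(f"Max Difference: {diff:.3f}, passed: {passed}")
```

Under this assumed rule, a single element that is off by more than the tolerance is enough to fail the whole comparison, which is why one outlier in a large output tensor can dominate the report.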
### Notes

- Restarting from a stage is impossible, since we have parallel stages, not sequential ones. Alternatively, we can make a RUN_STAGE_NAME parameter for each stage so that unnecessary...
In SD clip, there is an opportunity to fuse all the add kernels:

```
@15 = gpu::code_object[code_object=7632,symbol_name=mlir_dot_add,global=133632,local=256,](@13,@12,@5,@14) -> half_type, {24, 77, 2304}, {177408, 2304, 1}: 0.0934304ms, 2%
@16 = hip::hip_copy_literal[id=main:@literal:78]...
```