AMDMIGraphX issues

Add a compile pass for hipblaslt, similar to how miopen does it.

ahsan-ca

Improve driver verify through bisection

richagadgil

Change auto contiguous to always insert contiguous

3

Reopened to allow runs in CI

kahmed10

enhancement

Use GPU intrinsics and HIP types for FP8 for MIGX JIT kernels

3

The FP8 header is now available from HIP, so we should use those types and the provided intrinsics.

CharlieL7

enhancement

FP8

[Issue]: accuracy_checker.py is not within tolerance for bert_large_uncased_1_fp16_gpu.onnx

### Problem Description

I'm running (MIGraphX develop branch):

```bash
python3 ../tools/accuracy/accuracy_checker.py --onnx ~/mlir-dev/AMDMIGraphX/build/bert_large_uncased_1_fp16_gpu.onnx --fill1 --input-dim input_ids:1,384 --disable-fast-math --tolerance 0.02 --verbose
```

And I get the error:

```
Max Difference: 0.13617822527885437...
```

dhernandez0

Optimize docker image building in Jenkinsfile

4

### Notes

- Restarting from a stage is impossible, since we have parallel stages, not sequential ones. Alternatively, we can make a RUN_STAGE_NAME parameter for each stage so that unnecessary...

NISHIY-EKSDEE

high priority

Continous Integration

Improve `simplify_algebra` to find more horizontal fusion opportunities

In SD clip, there is an opportunity to fuse all the add kernels:

```
@15 = gpu::code_object[code_object=7632,symbol_name=mlir_dot_add,global=133632,local=256,](@13,@12,@5,@14) -> half_type, {24, 77, 2304}, {177408, 2304, 1}: 0.0934304ms, 2%
@16 = hip::hip_copy_literal[id=main:@literal:78]...
```

kahmed10


Add llama2 with KV-cache example

Add llama2 with KV-cache to DLM

Compare llama2 performance with/without KV-cache

