Charlie Lin comments

Results: 35 comments of Charlie Lin

Preload tiles into LDS to improve performance of pointwise transposes

Seems to be failing `test_spacetodepth_example_cpu` from ONNX backend. The test says CPU but the compiled program looks to be using the GPU...

Use GPU intrinsics and HIP types for FP8 for MIGX JIT kernels

No updates right now. This is sitting in our long backlog.

Dangling quantizelinear from horizontal fusion, BERT and DistilGPT2

Still seeing this dangling quantizelinear after the FP8 OCP->FNUZ changes on MI300, but now it's merged with the elementwise kernels from the OCP->FNUZ conversion:
```
@26 = gpu::code_object[code_object=6592,symbol_name=quantizelinear_bit_cast_equal_where_equal_equal_logical_or_where│@207 = gpu::gemm[alpha=1,beta=0,compute_fp32=1,trans_batch=0,solution_idx=0](@204,@206,@205) ->...
```

Dangling quantizelinear from horizontal fusion, BERT and DistilGPT2

Given how the current performance reports for fp8 and int8 on MI300 look, this is currently a marginal effect compared to the time taken by the fp8/int8 GEMMs. Would be better...

Horizontally Fuse Multiplication when horizontally fusing convolution (Yolov5, YoloX)

Here's a picture of the situation in ONNX before and the issue: ![horiz_fusion_issue](https://github.com/user-attachments/assets/2fcc1706-8ef6-42c2-b141-592d5c2de204) See this internal discussion for more elaboration: https://github.com/ROCm/AMDMIGraphX-internal/discussions/81

Horizontally Fuse Multiplication when horizontally fusing convolution (Yolov5, YoloX)

Resolved by https://github.com/ROCm/AMDMIGraphX/pull/3920.

Bitwise_and operator

> CI hit a failure, not sure why it is not showing up....
>
> [2024-07-30T21:21:39.312Z] [ RUN ] test_bitwise_andmigraphx::shape::bool_type
> [2024-07-30T21:21:39.312Z]
> [2024-07-30T21:21:39.312Z] module: "main"
> [2024-07-30T21:21:39.312Z] y = @param:y -> bool_type,...

Add ONNX parsing for SimplifiedLayerNormalization

Remove rocmlir unsupported reduce types

Adding MIGRAPHX_TIME_MATCHERS