Ahsan Saghir
Working on higher priority DLM issues and JIRA tickets so could not work on this one.
> Working on higher priority DLM issues and JIRA tickets so could not work on this one.

Same as before, did not get a chance to work on this. Now...
I worked on this a bit during the last sprint and got past the verification error I was seeing. I will continue to...
Was able to work on this last week:
- Added tests covering the following operators: reduce_max, reduce_sum, avgpool, maxpool, sine, cosine, sqrt, division, multiplication.
- Addressed comments to modify existing...
PR to update parse gemm and the gemm tests, so that I can address the [comments](https://github.com/ROCm/AMDMIGraphX/pull/3041#discussion_r1592515370) on the gemm_fp8 test: https://github.com/ROCm/AMDMIGraphX/pull/3158
Talked to Jimmy from the hipBLASLt team about the issue where hipBLASLt does not find a solution for FP8 inputs. ``` There are only few solutions for TT F8 input,F8output. Those...
Did some performance runs for bert_base_cased_1 to determine performance of rocblas vs hipblaslt: [Performance_hipBLASLt_bert_base_cased.xlsx](https://github.com/user-attachments/files/16472959/Performance_hipBLASLt_bert_base_cased.xlsx) ~The numbers are quite similar with the worst degradation for batch size 1 at 8%~. Edit:...
Looking at `make check` failures for hipblaslt branch...

```
The following tests FAILED:
	 94 - test_gpu_gemm_tune (Failed)
	348 - test_verify_general (Failed)
	350 - test_verify_conv (Failed)
	351 - test_verify_gemm (Failed)
```
...
Addressing comments on the PR.
Did some more performance runs for bert_base_cased_1 to determine performance of rocblas vs hipblaslt for FP32 and FP16. FP16 numbers show a perf improvement of ~19% to 47% depending on...