Daine McNiven comments

Results: 22 comments of Daine McNiven

question about mixed precision dot

Hi again @jinz2014, [hipblasDotEx(...)](https://rocm.docs.amd.com/projects/hipBLAS/en/docs-6.2.0/functions.html#hipblasdotex-batched-stridedbatched) should support mixed-precision dot with 32-bit float input and 64-bit double output/compute with the rocBLAS backend now in ROCm 6.2. You can also take a look...

[Bug]: rocblas_gemm_ex with m==1 fp16 inputs/outputs f32 compute slower than a quite naive gemv kernel on MI100

Hi @Epliz, thanks for bringing this up. Yes, the disparity between gemm with m == 1/n == 1 and gemv has been brought up in the past as noted by...

[Bug]: rocblas_gemm_ex with m==1 fp16 inputs/outputs f32 compute slower than a quite naive gemv kernel on MI100

Hi @Epliz and @IMbackK, sorry for the delay. Looking at my past notes, it looks like the areas of most concern were where the incx parameter is large (with various...

[Bug]: rocblas_gemm_ex with m==1 fp16 inputs/outputs f32 compute slower than a quite naive gemv kernel on MI100

Hi @IMbackK, yes, it's good to keep this topic up to date since it's been delayed for so long; thanks for your reminder. There have been no decisions made on a way...

[Bug]: rocblas_gemm_ex with m==1 fp16 inputs/outputs f32 compute slower than a quite naive gemv kernel on MI100

Hi @IMbackK, I have a pull request open which redirects some calls to rocblas_gemm_ex() to use our internal gemv kernels rather than gemm kernels from Tensile where I found our...

[Bug]: rocblas_gemm_ex with m==1 fp16 inputs/outputs f32 compute slower than a quite naive gemv kernel on MI100

Hi all, I'm sorry again for the extended delay in implementing this. The first-pass implementation has now been merged at https://github.com/ROCm/rocBLAS/commit/1ac1e23057a04ae280a85005f15bd8085bdd11ed and will be included in a future ROCm release....


Add ROCM-Examples

[Feature]: about 7900xtx benchmark.

[Issue]: hipBLAS install script does not write into /opt/rocm