Hong Bo PENG
@ggerganov could you please review this PR or assign reviewers? I work for IBM and we would like to have this optimization on ppc64le. Thank you.
@ggerganov is this your preferred place for PRs with enhancements or optimizations? Or should I open the PR in [llama.cpp](https://github.com/ggerganov/llama.cpp) so that you can then sync it to this repository?
Here are the vec_dot_q float32 throughput speedups compared to the current master branch on RHEL9.2 (Power10 machine), measured with `test-quantizer-perf -i 10000`. The code was compiled with gcc-12.2.1. |...
We use IBM OpenXL C/C++ 17.1.2 and OpenXL Fortran 17.1.2 on AIX, with no failures. I also tried GCC 11.3.1 on RHEL9.2, with only 1 failure in ssvd.out (SBD, M=30, N=40, type=10,...
Yes, we have started looking at the lapack-test failures on the Power platform. We worked on Power7 first because it is an older platform, so changes there may not have much impact. You are correct...
As @RajalakshmiSR suggested, I restored Makefile.power for P6. Now only the KERNEL for Power6 is changed.
@martin-frbg Sorry for the confusion. The lapack test results on Power7 may not be a "minor failure", because one of the failures is about 0.839E+07. I think some of the EigenValue outputs...
Closing this PR to rework the GEMV/GEMM assembly kernels.