Sivagnanam Namasivayamurthy
Did anyone find a solution to this problem?
@CNugteren Thank you for your time. I'm getting a memory error while building the latest master with -DTUNERS=ON and -DCLIENTS=ON. ``` root@linaro:/home/linaro/CLBlast/build# make -- Building CLBlast with OpenCL API (default) --...
@CNugteren > First test performance with the latest master branch for reference, e.g. ./clblast_client_xgemm -m 256 -n 256 -k 256 -num_steps 4 -step 256 ``` | | | | m;...
@wang-jinwei Thank you very much for sharing the GFLOPS details. | GPU | Theoretical GFLOPS | CLBlast | | ------------- | ------------- | ------------- | | Adreno 330 | 166.5...
Using the current `adreno_tryout` branch, I tried to tune GEMM for a custom MNK size (results shown below). The best tuner configuration gave 115 GFLOPS at 1.51 ms. ``` ./clblast_tuner_xgemm -m 16...
@CNugteren For your reference, I found a sample GEMM example in Qualcomm's official Adreno SDK ([SDK link](https://developer.qualcomm.com/software/adreno-gpu-sdk/tools)). [AdrenoExampleKernels.zip](https://github.com/CNugteren/CLBlast/files/1586935/AdrenoExampleKernels.zip)
@CNugteren @wang-jinwei I came across this ([moskewcz/boda#13](https://github.com/moskewcz/boda/issues/13)) GEMM implementation that gave 70 GFLOPS on Adreno 530. ``` typedef unsigned uint32_t; __constant uint32_t const U32_MAX = 0xffffffff; typedef int int32_t; #72...
> "GEMM tuner ==> 115 GFLOPS > GEMM client ==> 0.1 GFLOPS" > Can you share code of GEMM where you can achieve 115GFLOPS? @roserg I got those results while...
@roserg I'm using an Adreno 330 device with Linaro OS for my experiments. I couldn't use the Snapdragon Profiler with my device (no Android OS). All the matrices that I'm dealing with...
@roserg 578 MHz, but as @CNugteren suggested, I wouldn't trust the 115 GFLOPS figure. The `adreno_tryout` branch isn't fully implemented; it's still at the trial-and-error stage. So we have to wait...
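
For anyone who wants to sanity-check GFLOPS numbers like the ones in this thread, here is a rough sketch. It assumes the usual convention of counting 2·m·n·k floating-point operations per GEMM; whether the CLBlast tuner and client count exactly this way is an assumption on my part. The helper names are made up for illustration, and the 1 ms timing in the last call is hypothetical, not a measurement.

```python
def gemm_gflops(m, n, k, seconds):
    """GFLOPS for one GEMM, assuming 2*m*n*k floating-point operations."""
    return 2.0 * m * n * k / seconds / 1e9


def implied_flops(gflops, seconds):
    """Number of floating-point operations implied by a GFLOPS figure and a runtime."""
    return gflops * 1e9 * seconds


# Figures quoted in this thread.
PEAK_GFLOPS = 166.5    # theoretical Adreno 330 number from the table above
TUNER_GFLOPS = 115.0   # best tuner configuration
TUNER_SECONDS = 1.51e-3

# How much work the tuner result implies, and how close it is to the peak.
flops = implied_flops(TUNER_GFLOPS, TUNER_SECONDS)
fraction_of_peak = TUNER_GFLOPS / PEAK_GFLOPS

print(f"implied work: {flops / 1e6:.2f} MFLOP per run")
print(f"fraction of theoretical peak: {fraction_of_peak:.1%}")

# Hypothetical example: a 256x256x256 GEMM that took 1 ms (timing is made up).
print(f"256^3 GEMM in 1 ms: {gemm_gflops(256, 256, 256, 1e-3):.2f} GFLOPS")
```

If the implied work does not match 2·m·n·k for the sizes actually passed to the tuner, or the fraction of peak looks too good for a kernel that is still experimental, that is a hint the reported number should not be trusted.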