Pradeep Garigipati
@UmashankarTriforce Sorry for the late response. @UmashankarTriforce @rahul-ar Yes, the CUDA/OpenCL implementations of join can still use this performance improvement.
@rahul-ar We have a [Slack channel][1] for quick questions, but it would be better to iron out the approach/design here before going ahead with the implementation. Given that you are...
@willyborn Thank you, will have a look soon and let you know.
@willyborn Sorry about the delay on my end, I just returned to work after a long break. I see that you have already raised PRs #3144 and #3145. Do they...
@anates Sorry about the delay. Which BLAS library did you use to build ArrayFire's CPU backend?
If you are using Intel MKL, please go through the conversation in the issue https://github.com/arrayfire/arrayfire/issues/3042 since it might help to check if some MKL variable can be set to resolve...
Can you try setting [`MKL_NUM_THREADS`](https://software.intel.com/content/www/us/en/develop/documentation/mkl-linux-developer-guide/top/managing-performance-and-memory/improving-performance-with-threading/using-additional-threading-control/intel-mkl-specific-environment-variables-for-openmp-threading-control.html) and check if that changes the behavior?
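For anyone trying this, here is a minimal sketch of one way to set the variable for a single run without touching the shell profile. It assumes the program is started from a Python launcher; `run_with_mkl_threads` and the stand-in child command are hypothetical names used only for illustration, and the child here just echoes the variable instead of running an ArrayFire program. The variable has to be in the environment before the process loads MKL, which is why it is set on the child process.

```python
import os
import subprocess
import sys


def run_with_mkl_threads(num_threads, argv):
    """Run `argv` as a child process with MKL_NUM_THREADS set (hypothetical helper)."""
    env = dict(os.environ)                      # copy, so the parent environment is untouched
    env["MKL_NUM_THREADS"] = str(num_threads)   # environment values must be strings
    return subprocess.run(argv, env=env, capture_output=True, text=True, check=True)


if __name__ == "__main__":
    # Stand-in child that only reports what it sees; replace it with the real
    # command line of the program under test (assumption: it is a separate executable).
    child = [sys.executable, "-c", "import os; print(os.environ['MKL_NUM_THREADS'])"]
    result = run_with_mkl_threads(4, child)
    print("child sees MKL_NUM_THREADS =", result.stdout.strip())
```

Comparing runtimes or CPU utilisation for a few values (for example 1 and the number of physical cores) should show whether the setting has any effect on the build in question.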
> Can you try setting [`MKL_NUM_THREADS`](https://software.intel.com/content/www/us/en/develop/documentation/mkl-linux-developer-guide/top/managing-performance-and-memory/improving-performance-with-threading/using-additional-threading-control/intel-mkl-specific-environment-variables-for-openmp-threading-control.html) and check if that changes the behavior?

@anates Any change? Or is it still using a single thread?
@anates If you have resolved the issue through any changes to your development setup, please share the details here for future users who may face a similar issue.
@willyborn I can't recollect why I avoided hashing the JIT code, perhaps to avoid the extra computation time. A fixed-version solution might not work in a couple of corner cases such as...