yaksa
Yaksa: High-performance Noncontiguous Data Management
A user reported being unable to build MPICH when `clang++` is used as the CUDA compiler for building Yaksa (pmodels/mpich#6954). Noting down the issues encountered when trying to build this way: 1....
## Pull Request Description

These flags were ignored when the user specified a compiler other than the nvcc included in the CUDA installation. Make sure to include them for consistency....
The error

```
Caught signal 11 (Segmentation fault: address not mapped to object at address 0x170)
==== backtrace (tid:1229674) ====
0 /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libucs.so.0(ucs_debug_print_backtrace+0x33) [0x7fea6f92bcad]
1 /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libucs.so.0(ucs_handle_error+0x77) [0x7fea6f92ce0f]
2 /home/ac.jfaibussowitsch/petsc/arch-cuda-debug/lib/libucs.so.0(+0x37bca) [0x7fea6f92cbca]
3...
```
Coverity's static analysis does not like this pattern in the CUDA generated code. From `yaksuri_cudai_kernel_pack_SUM_hvector_hindexed_int16_t`:

```
[...]
uintptr_t x3;
for (intptr_t i = 0; i < md->u.hvector.child->u.hindexed.count; i++) {
    uintptr_t...
```
```
summary_junit_xml.test/pack/pack -datatype c_complex -count 17 -seed 5 -iters 32768 -segments 1 -ordering normal -overlap none -oplist complex | 3.9 sec | 4
summary_junit_xml.test/pack/pack -datatype c_double_complex -count 17 -seed 6...
```
The current backend defines all hooks as function pointers. Some hooks are accessed on the fast path or multiple times within a single ipack/iunpack call. The compiler cannot optimize much across function-pointer calls....
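To illustrate the concern, here is a minimal sketch contrasting an indirect call through a hook table with a direct call. All names (`hook_table_s`, `int32_size`, `total_indirect`, `total_direct`, `example_hooks`) are hypothetical and do not come from the Yaksa source; the sketch only shows the general pattern, not the backend's actual hooks.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical hook: bytes needed for `count` 4-byte elements. */
static size_t int32_size(size_t count)
{
    return count * 4;
}

/* Hypothetical hook table, as a backend might fill in at init time. */
typedef struct {
    size_t (*get_size)(size_t count);
} hook_table_s;

/* Example table used for illustration only. */
static const hook_table_s example_hooks = { int32_size };

/* Variant A: the hook is reached through a pointer on every iteration.
 * When the table is populated at runtime, the compiler generally cannot
 * inline the callee or hoist the call out of the loop. */
static size_t total_indirect(const hook_table_s *hooks, size_t count, int iters)
{
    assert(hooks != NULL && hooks->get_size != NULL);
    size_t total = 0;
    for (int i = 0; i < iters; i++)
        total += hooks->get_size(count);
    return total;
}

/* Variant B: the callee is known at compile time, so the compiler is
 * free to inline it and simplify the loop. */
static size_t total_direct(size_t count, int iters)
{
    size_t total = 0;
    for (int i = 0; i < iters; i++)
        total += int32_size(count);
    return total;
}
```

Both variants compute the same result; the difference is only in how much the optimizer can see. Whether this matters in practice would need to be measured on the actual ipack/iunpack path.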
As yaksa is pulled in as a submodule by mpich 3.4/3.4.1, it would be nice to have a yaksa release that can be properly packaged by itself for distros (SUSE in...
GPU testing on Jenkins intermittently shows CUDA memory errors. For example, one of the nightly GPU tests (https://jenkins-pmrs.cels.anl.gov/view/yaksa/job/yaksa-nightly-gpu/lastCompletedBuild/testReport/):

```
test/pack/pack -datatype int -count 17 -seed 73 -iters 32768 -segments 1...
```
Info hints have become essential to using yaksa effectively. Currently, users have to read the code to find out how they are used. We need to document their usage and the allowed set of hints.
We need to investigate and study the best strategy for performance tuning in the CUDA backend. One knob is the thread block size vs number of blocks.
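As a rough illustration of that knob, the sketch below computes how many blocks a launch needs for a given thread block size, and how many threads in the last block would sit idle. It assumes one thread per element, which is a simplification for illustration and not necessarily how the Yaksa CUDA backend maps work to threads; the function names `num_blocks` and `idle_threads` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Blocks needed to cover n_elems elements with block_size threads per
 * block, assuming one thread per element (ceiling division). */
static uint64_t num_blocks(uint64_t n_elems, uint32_t block_size)
{
    assert(block_size > 0);
    return (n_elems + block_size - 1) / block_size;
}

/* Threads that are launched but have no element to process. */
static uint64_t idle_threads(uint64_t n_elems, uint32_t block_size)
{
    return num_blocks(n_elems, block_size) * block_size - n_elems;
}
```

For example, with 17 elements and a block size of 256, a single block is launched and 239 of its threads are idle, while 1000 elements need 8 blocks of 128 threads or 4 blocks of 256 threads. Smaller blocks waste fewer threads on small inputs but produce more blocks to schedule, which is one reason the best setting probably depends on the datatype and count.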