Balint Joo
Propagate changes to feature/moderner-cmake2 -> QUDAVersion.cmake -> QUDAConfigVersion.cmake -> Bump QMP version to 2.5.3 (both download source and `find_package`) -> Cherry pick QDP-JIT mods into a branch off GK and...
Currently `struct Aggregates` in `restrictor.cu` and `struct OrthoAggregates` in `block_orthogonalize.cu` are essentially identical. They could be unified using a Tag template. E.g. in `include/powers_of_two_array.h` ``` template struct TaggedAggregate { /*...
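A minimal sketch of the tag idea, in case it helps the discussion. The tag names (`RestrictorTag`, `OrthoTag`) and the `n_block` member are hypothetical placeholders and are not taken from the QUDA sources; only the pattern of one template body with two distinct instantiations is the point.

```cpp
#include <type_traits>

// Hypothetical empty tag types: they carry no data and only make the
// two instantiations below distinct types.
struct RestrictorTag {};
struct OrthoTag {};

// One shared definition. The member is a placeholder for whatever the
// two existing structs actually have in common.
template <typename Tag>
struct TaggedAggregate {
  int n_block;  // hypothetical payload
};

// The two names stay available, but now share a single body.
using Aggregates = TaggedAggregate<RestrictorTag>;
using OrthoAggregates = TaggedAggregate<OrthoTag>;
```

Because the tags differ, `Aggregates` and `OrthoAggregates` remain different types, so any overloads or template specializations that key on them would still be told apart.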
There appears to be a suspicious generator statement in `lib/CMakeLists.txt` in [feature/generic_kernel] which looks like so: ```bash # set up QUDA compile options target_compile_definitions( quda PRIVATE $ $ $ $...
HIP-CPU supplies ``` ./share/hip_cpu_rt/cmake/hip_cpu_rtConfig.cmake ./share/hip_cpu_rt/cmake/hip_cpu_rtTargets.cmake ./share/hip_cpu_rt/cmake/hip_cpu_rtConfigVersion.cmake ``` but some ecosystem libraries (e.g. hipCUB) look for `FindHIP.cmake`, `HIPConfig.cmake` or `hip-config.cmake`. Can one do a direct symlink of e.g. `hip_cpu_rtConfig.cmake` to `hip-config.cmake`...
CTest test 23: `legacy_async_memcpy` appears to hang on macOS when built with gcc-10 from Homebrew -- macOS Catalina (10.15.7) - similar behaviour is seen on macOS CI where this test...
Hi All, Here are the additions for vector lane permute. Intrinsics for AVX (Single Prec), AVX512 (single prec and double prec), Generic for other CPUs, __shfl_sync for CUDA, __shfl for...
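For reference, a rough sketch of what a generic (non-intrinsic) lane permute could look like. This is not the code from the PR: the function name `permute_lanes`, the use of `std::array`, and the wrap-around handling of out-of-range indices are all assumptions made for illustration.

```cpp
#include <array>
#include <cstddef>

// Hypothetical generic fallback: lane i of the result takes its value
// from lane idx[i] of the input.
template <typename T, std::size_t N>
std::array<T, N> permute_lanes(const std::array<T, N>& in,
                               const std::array<std::size_t, N>& idx) {
  std::array<T, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    // Wrapping out-of-range indices is an assumption of this sketch.
    out[i] = in[idx[i] % N];
  }
  return out;
}
```

With indices `{3, 2, 1, 0}` this reverses a four-lane vector, which is the same gather-style semantics that a shuffle intrinsic provides on hardware that has one.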
# A patch to allow setting the pool size for cudaMallocAsync ## Implementation The first time `cudaMallocAsync` is called (in `void* impl_alloc_common()` in `Kokkos_CudaSpace.cpp`) we check the environment variable `KOKKOS_CUDA_MEMPOOL_SIZE`...
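A minimal sketch of the environment-variable check, under stated assumptions. The helper names `parse_byte_count` and `pool_size_from_env` are hypothetical, and treating the value as a plain decimal byte count is an assumption; the actual patch may accept a different format. No CUDA calls are made here.

```cpp
#include <cctype>
#include <cerrno>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: parse a non-negative decimal byte count.
// Returns false for null, empty, signed, suffixed, or overflowing input.
bool parse_byte_count(const char* raw, std::uint64_t& out) {
  if (raw == nullptr || !std::isdigit(static_cast<unsigned char>(raw[0]))) {
    return false;
  }
  errno = 0;
  char* end = nullptr;
  const unsigned long long v = std::strtoull(raw, &end, 10);
  if (errno == ERANGE || *end != '\0') {
    return false;  // overflow, or trailing characters such as "MB"
  }
  out = static_cast<std::uint64_t>(v);
  return true;
}

// Hypothetical wrapper: read the variable named in the patch description.
// The byte unit is an assumption of this sketch.
bool pool_size_from_env(std::uint64_t& out) {
  return parse_byte_count(std::getenv("KOKKOS_CUDA_MEMPOOL_SIZE"), out);
}
```

Rejecting malformed values outright, rather than falling back silently to a partial parse, makes a mistyped setting easier to notice.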
Mods to allow copy-in of full (both-parity) QDP-JIT fields. This should allow the use of unpreconditioned actions from Chroma (allowing QUDA to take care of source creation and...
It would be helpful (although not necessary for performance or other reasons) if QUDA could show some 'sign-of-life' while tuning (e.g. the Unix-style / - \ | / - \...