Balint Joo

Results 9 issues of Balint Joo

Propagate changes to feature/moderner-cmake2 -> QUDAVersion.cmake -> QUDAConfigVersion.cmake -> Bump QMP version to 2.5.3 (Both download source and `find-package` -> Cherry pick QDP-JIT mods into a branch off GK and...

Currently `struct Aggregates` in `restrictor.cu` and `struct OrthoAggregates` in `block_orthogonalize.cu` are essentially identical. They could be unified using a Tag template. E.g. in `include/powers_of_two_array.h` ``` template struct TaggedAggregate { /*...

clean-up
Target_HIP
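
A minimal sketch of the tag-template idea from the issue above, not the code in QUDA. The tag names (`RestrictorTag`, `BlockOrthoTag`) and the placeholder members are assumptions made for illustration; the real contents of `TaggedAggregate` are elided in the snippet and are not reproduced here.

```cpp
#include <type_traits>

// Hypothetical tag types, one per use site (names are illustrative only).
struct RestrictorTag {};
struct BlockOrthoTag {};

// One shared definition. The Tag parameter carries no data; it only makes
// each instantiation a distinct type.
template <typename Tag>
struct TaggedAggregate {
  // Placeholder payload standing in for the elided aggregate data.
  static constexpr int size() { return 4; }
  static constexpr int get(int i) { return 2 << i; }  // 2, 4, 8, 16
};

// The two formerly duplicated structs become aliases of the same template.
using Aggregates = TaggedAggregate<RestrictorTag>;
using OrthoAggregates = TaggedAggregate<BlockOrthoTag>;

// Same body, but still two separate types as far as the compiler is concerned.
static_assert(!std::is_same<Aggregates, OrthoAggregates>::value,
              "tags keep the two instantiations distinct");
```

Keeping the types distinct means each translation unit can still specialise or overload on its own aggregate while the definition lives in one header.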

There appears to be a suspicious generator statement in `lib/CMakeLists.txt` in [feature/generic_kernel] which looks like so: ```bash # set up QUDA compile options target_compile_definitions( quda PRIVATE $ $ $ $...

bug
build

HIP-CPU supplies ``` ./share/hip_cpu_rt/cmake/hip_cpu_rtConfig.cmake ./share/hip_cpu_rt/cmake/hip_cpu_rtTargets.cmake ./share/hip_cpu_rt/cmake/hip_cpu_rtConfigVersion.cmake ``` but some ecosystem libraries (e.g. hipCUB) look for `FindHIP.cmake`, `HIPConfig.cmake` or `hip-config.cmake`. Can one do a direct symlink of e.g. `hip_cpu_rtConfig.cmake` to `hip-config.cmake`...

enhancement
question

CTest test 23, `legacy_async_memcpy`, appears to hang on macOS when built with gcc-10 from Homebrew -- macOS Catalina (10.15.7) - similar behaviour is seen on macOS CI where this test...

question

Hi All, Here are the additions for vector lane permute. Intrinsics for AVX (Single Prec), AVX512 (single prec and double prec), Generic for other CPUs, __shfl_sync for CUDA, __shfl for...

# A patch to allow setting the pool size for cudaMallocAsync ## Implementation The first time cudaMallocAsync is called (in void* impl_alloc_common() in Kokkos_CudaSpace.cpp) we check the environment variable KOKKOS_CUDA_MEMPOOL_SIZE...
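
The environment-variable check described above could look roughly like the following. This is a sketch under stated assumptions, not the patch itself: the helper names `parse_pool_size` and `pool_size_from_env` are hypothetical, and treating the value as a plain decimal byte count is an assumption the truncated snippet does not confirm. How the size is then applied to the CUDA memory pool is not shown.

```cpp
#include <cstddef>
#include <cstdlib>
#include <limits>

// Hypothetical helper: parse a non-negative decimal byte count.
// Returns `fallback` for null, empty, non-numeric, or overflowing input.
constexpr std::size_t parse_pool_size(const char* raw, std::size_t fallback) {
  if (raw == nullptr || *raw == '\0') return fallback;
  std::size_t value = 0;
  for (const char* p = raw; *p != '\0'; ++p) {
    if (*p < '0' || *p > '9') return fallback;  // reject signs, spaces, suffixes
    const std::size_t digit = static_cast<std::size_t>(*p - '0');
    // Reject values that would overflow std::size_t.
    if (value > (std::numeric_limits<std::size_t>::max() - digit) / 10) return fallback;
    value = value * 10 + digit;
  }
  return value;
}

// Hypothetical helper: read the variable named in the issue, falling back
// to a caller-supplied default when it is unset or malformed.
inline std::size_t pool_size_from_env(std::size_t fallback) {
  return parse_pool_size(std::getenv("KOKKOS_CUDA_MEMPOOL_SIZE"), fallback);
}
```

Separating the parsing from the `std::getenv` call keeps the validation testable without touching the process environment.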

Mods to allow copy-in of full (both parities) QDP-JIT fields. This should allow the use of unpreconditioned actions from Chroma (allowing QUDA to take care of source creation and...

It would be helpful (although not necessary for performance or other reasons) if QUDA could show some 'sign-of-life' while tuning (e.g. the unix. / - \ | / - \...

feature
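
A sketch of the kind of 'sign-of-life' spinner requested above. The function names `spinner_frame` and `show_sign_of_life` are hypothetical, and nothing here reflects how QUDA's tuner is structured; it only illustrates the `| / - \` rotation.

```cpp
#include <cstddef>
#include <cstdio>

// Hypothetical helper: pick the spinner character for a given tick.
// Cycles through | / - \ and then repeats.
constexpr char spinner_frame(std::size_t tick) {
  return "|/-\\"[tick % 4];
}

// Hypothetical helper: overwrite the current terminal line with the next
// frame. The carriage return moves the cursor back to the start of the
// line without advancing to a new one.
inline void show_sign_of_life(std::size_t tick, std::FILE* out = stderr) {
  std::fprintf(out, "\r%c", spinner_frame(tick));
  std::fflush(out);  // make the frame visible immediately
}
```

A caller would invoke `show_sign_of_life(i)` once per tuning step with an increasing counter `i`; writing to `stderr` by default keeps the spinner out of redirected output.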