Peter Heywood

Results 157 comments of Peter Heywood

When using CMake 3.21 and nvhpc 21.7 with `nvc++` as the host compiler, rapidjson cannot be compiled due to use of `uint128`. This needs further investigation, as the section of...

#977 has been opened to split nvc++ host compiler support out from this issue, which is to make sure that nvcc installed via nvhpc is viable for building FLAME...

With: + CMake 3.22 + NVCC 11.7 from nvhpc 22.7 + GCC 11.3 via (on a box I've set up the module files for): ``` module rm CUDA module load nvhpc...

`-isystem=/usr/local/cuda-11.8/include -isystem=/usr/local/cuda-11.8/include/../include` (or the equivalent for different CUDA versions) is what is present in build commands for regular nvcc based builds, which implicitly suppresses the warnings. This comes from `src/CMakelists.txt`,...

CMake 3.18 doesn't configure successfully with the nvcc installed by nvhpc 22.7: ``` CMake Error at /home/ptheywood/.venvs/cmake-318/lib/python3.10/site-packages/cmake/data/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:165 (message): Could NOT find CUDAToolkit (missing: CUDAToolkit_INCLUDE_DIR CUDA_CUDART) (found version "11.7.64") ``` CMake 3.20 is...

NVHPC 20.11, which ships with CUDA 11.1, configures and builds flamegpu2 with CMake 3.20 as well. This is the second oldest release, and the oldest which is readily installable on...

The root cause of these warnings has been fixed in a future CUB release by https://github.com/NVIDIA/cub/pull/582, but that won't be useful to us immediately.

This would probably be beneficial to any scatter operation (if feasible). Uncoalesced writes are more expensive than uncoalesced reads, so in a 1:1 situation coalescing writes is preferable as long...
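To illustrate the trade-off, here is a minimal host-side C++ sketch (not FLAME GPU code) of turning a 1:1 scatter into an equivalent gather, so that the writes become the contiguous access and the reads become the scattered one. The names `scatter`, `gather` and `invert` are hypothetical, and the loop index stands in for a GPU thread index. Building the inverse mapping has its own cost, which is part of the "if feasible" caveat.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Scatter: "thread" i reads in[i] (contiguous) and writes out[dest[i]] (scattered).
std::vector<int> scatter(const std::vector<int>& in,
                         const std::vector<std::size_t>& dest) {
    assert(in.size() == dest.size());
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {
        out[dest[i]] = in[i];
    }
    return out;
}

// Invert a 1:1 mapping, so that src[dest[i]] == i for every i.
std::vector<std::size_t> invert(const std::vector<std::size_t>& dest) {
    std::vector<std::size_t> src(dest.size());
    for (std::size_t i = 0; i < dest.size(); ++i) {
        assert(dest[i] < dest.size());
        src[dest[i]] = i;
    }
    return src;
}

// Gather: "thread" j reads in[src[j]] (scattered) and writes out[j] (contiguous).
std::vector<int> gather(const std::vector<int>& in,
                        const std::vector<std::size_t>& src) {
    assert(in.size() == src.size());
    std::vector<int> out(in.size());
    for (std::size_t j = 0; j < in.size(); ++j) {
        out[j] = in[src[j]];
    }
    return out;
}
```

For a 1:1 mapping, `gather(in, invert(dest))` produces the same output as `scatter(in, dest)`; only the side on which the irregular access falls changes.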

It may be worth ensuring that at least one block is launched per SM to maximise performance. This may come as a side effect of choosing the smallest block size which achieves...
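One possible way to enforce this explicitly, sketched as plain C++ under stated assumptions: start from a preferred block size (for example, one suggested by `cudaOccupancyMaxPotentialBlockSize`) and shrink it one warp at a time until the grid has at least as many blocks as the device has SMs (for example, from `cudaDeviceProp::multiProcessorCount`). The helper names `blocksFor` and `pickBlockSize` are hypothetical, and this is not necessarily the approach the comment had in mind.

```cpp
#include <cassert>

// Number of blocks needed to cover n threads with the given block size.
unsigned long long blocksFor(unsigned long long n, unsigned int blockSize) {
    assert(blockSize > 0);
    return (n + blockSize - 1) / blockSize;
}

// Shrink `preferred` in warp-sized steps until at least `smCount` blocks
// would be launched, or until the block is a single warp.
unsigned int pickBlockSize(unsigned long long n,
                           unsigned int preferred,
                           unsigned int smCount) {
    const unsigned int warpSize = 32;
    assert(preferred >= warpSize && preferred % warpSize == 0);
    unsigned int block = preferred;
    while (block > warpSize && blocksFor(n, block) < smCount) {
        block -= warpSize;
    }
    return block;
}
```

For very small launches the loop bottoms out at one warp per block, in which case not every SM can be given work regardless of block size.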

As an extra note worth considering in the future: Most GPU architectures can have `32` or `64` active warps per SM, and therefore `1024` or `2048` threads per SM. Threads...
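The arithmetic behind those figures, as a small C++ sketch: with 32 threads per warp, the warp limit fixes the thread limit per SM, and also bounds how many blocks of a given size can be resident at once. The function names are hypothetical, and the block count here considers only the warp limit; registers, shared memory and the per-SM block limit can lower it further.

```cpp
#include <cassert>

constexpr unsigned int kWarpSize = 32;

// Maximum resident threads per SM implied by the warp limit.
constexpr unsigned int maxThreadsPerSM(unsigned int maxWarpsPerSM) {
    return maxWarpsPerSM * kWarpSize;
}

// Upper bound on resident blocks per SM from the warp limit alone.
// Warps are allocated whole, so a partial warp still occupies a full slot.
unsigned int blocksPerSMByWarpLimit(unsigned int maxWarpsPerSM,
                                    unsigned int blockSize) {
    assert(blockSize > 0);
    const unsigned int warpsPerBlock = (blockSize + kWarpSize - 1) / kWarpSize;
    return maxWarpsPerSM / warpsPerBlock;
}

static_assert(maxThreadsPerSM(32) == 1024, "32 warps -> 1024 threads");
static_assert(maxThreadsPerSM(64) == 2048, "64 warps -> 2048 threads");
```

For example, with 64 warps per SM a block size of 1024 allows at most two resident blocks per SM, while with 32 warps per SM it allows only one.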