Peter Heywood comments

Can't build with Nvidia HPC SDK

When using CMake 3.21 and nvhpc 21.7 with `nvc++` as the host compiler, rapidjson cannot be compiled due to its use of `uint128`. This needs further investigation, as the section of...

Can't build with Nvidia HPC SDK

#977 has been opened to split `nvc++` host compiler support out of this issue, which is to make sure that the `nvcc` installed via nvhpc is viable for building FLAME...

Can't build with Nvidia HPC SDK

With:

+ CMake 3.22
+ NVCC 11.7 from nvhpc 22.7
+ GCC 11.3

via (on a box I've set up the module files for):

```
module rm CUDA
module load nvhpc...
```

Can't build with Nvidia HPC SDK

`-isystem=/usr/local/cuda-11.8/include -isystem=/usr/local/cuda-11.8/include/../include` (or the equivalent for other CUDA versions) is what is present in the build commands for regular nvcc-based builds, which implicitly suppresses the warnings. This comes from `src/CMakeLists.txt`,...

Can't build with Nvidia HPC SDK

CMake 3.18 doesn't configure successfully with the nvcc installed by nvhpc 22.7:

```
CMake Error at /home/ptheywood/.venvs/cmake-318/lib/python3.10/site-packages/cmake/data/share/cmake-3.18/Modules/FindPackageHandleStandardArgs.cmake:165 (message):
  Could NOT find CUDAToolkit (missing: CUDAToolkit_INCLUDE_DIR CUDA_CUDART)
  (found version "11.7.64")
```

CMake 3.20 is...

Can't build with Nvidia HPC SDK

NVHPC 20.11, which ships with CUDA 11.1, also configures and builds flamegpu2 with CMake 3.20. This is the second-oldest release, and the oldest which is readily installable on...

Can't build with Nvidia HPC SDK

The root cause of these warnings has been fixed in a future CUB release by https://github.com/NVIDIA/cub/pull/582, but that won't be useful to us immediately.

Inverted Device Scatter

This would probably be beneficial to any scatter operation (if feasible). Uncoalesced writes are more expensive than uncoalesced reads, so in a 1:1 situation coalescing writes is preferable as long...
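A rough host-side illustration of the difference, assuming a 1:1 mapping where `dest[i]` is the output index of input element `i` and the inverse map `inv` (with `inv[dest[i]] == i`) is already available. Each loop iteration stands in for one GPU thread; the function names are hypothetical and this is not FLAME GPU code. In `scatter` consecutive threads read consecutive inputs but write to scattered outputs; in `inverted_scatter` (effectively a gather) consecutive threads write consecutive outputs and the scattered accesses move to the reads.

```cpp
#include <cstddef>
#include <vector>

// Scatter: "thread" i reads in[i] (coalesced) and writes out[dest[i]] (uncoalesced).
std::vector<int> scatter(const std::vector<int>& in, const std::vector<std::size_t>& dest) {
    std::vector<int> out(in.size());
    for (std::size_t i = 0; i < in.size(); ++i) {  // one iteration per GPU thread
        out[dest[i]] = in[i];
    }
    return out;
}

// Inverted scatter (a gather): "thread" j writes out[j] (coalesced)
// and reads in[inv[j]] (uncoalesced).
// Assumes a 1:1 mapping and a precomputed inverse map: inv[dest[i]] == i.
std::vector<int> inverted_scatter(const std::vector<int>& in, const std::vector<std::size_t>& inv) {
    std::vector<int> out(in.size());
    for (std::size_t j = 0; j < in.size(); ++j) {  // one iteration per GPU thread
        out[j] = in[inv[j]];
    }
    return out;
}
```

For example, with `in = {10, 20, 30, 40}` and `dest = {2, 0, 3, 1}`, the inverse map is `inv = {1, 3, 0, 2}` and both functions produce `{20, 40, 10, 30}`. Building `inv` from `dest` is itself a scatter, so the inversion only helps if the inverse mapping can be obtained cheaply or reused.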

Deciding optimal block size

It may be worth ensuring that at least one block is launched per SM to maximise performance. This may come as a side effect of choosing the smallest block size which achieves...
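A simplified host-side sketch of that heuristic, assuming occupancy is limited only by the per-SM thread and resident-block limits (register and shared memory usage are ignored, unlike the real CUDA occupancy calculator). `SmLimits`, `occupancy`, `smallestMaxOccupancyBlockSize`, `gridSize` and `coversAllSMs` are hypothetical names, and the limit values are inputs rather than queried from a device.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical, simplified SM description (only thread and block residency limits).
struct SmLimits {
    int maxThreadsPerSM;     // e.g. 2048 (64 warps) or 1024 (32 warps)
    int maxBlocksPerSM;      // architecture-dependent resident block limit
    int maxThreadsPerBlock;  // upper bound on the block size
};

// Fraction of an SM's thread slots that can be resident for a given block size.
double occupancy(int blockSize, const SmLimits& sm) {
    int resident = std::min(sm.maxThreadsPerSM / blockSize, sm.maxBlocksPerSM);
    return static_cast<double>(resident * blockSize) / sm.maxThreadsPerSM;
}

// Smallest multiple of the warp size (32) that reaches the best modelled occupancy.
int smallestMaxOccupancyBlockSize(const SmLimits& sm) {
    int best = 32;
    double bestOcc = occupancy(32, sm);
    for (int b = 64; b <= sm.maxThreadsPerBlock; b += 32) {
        double occ = occupancy(b, sm);
        if (occ > bestOcc) {  // strictly greater, so ties keep the smaller block size
            best = b;
            bestOcc = occ;
        }
    }
    return best;
}

// Number of blocks needed to cover n threads.
std::int64_t gridSize(std::int64_t n, int blockSize) {
    return (n + blockSize - 1) / blockSize;
}

// True if the launch provides at least one block for every SM.
bool coversAllSMs(std::int64_t n, int blockSize, int numSMs) {
    return gridSize(n, blockSize) >= numSMs;
}
```

With `SmLimits{2048, 32, 1024}` the smallest block size reaching full modelled occupancy is 64. For a hypothetical GPU with 108 SMs, 10000 threads in blocks of 64 give 157 blocks (every SM gets work), while 4096 threads give only 64 blocks, so some SMs would sit idle. Preferring the smallest block size that still reaches maximum occupancy maximises the block count for a given problem size, which is why covering every SM can fall out as a side effect.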

Deciding optimal block size

As an extra note worth considering in the future: most GPU architectures can have up to `32` or `64` active warps per SM, and therefore `1024` or `2048` threads per SM. Threads...
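The warp-to-thread arithmetic can be made concrete with a small sketch. It considers only the per-SM thread limit (register, shared memory and resident-block limits are ignored), and the helper names are hypothetical. It computes how many thread slots are left unused when an SM is filled with whole blocks, and which block sizes leave none on both 32-warp and 64-warp SMs.

```cpp
#include <vector>

constexpr int kWarpSize = 32;

// Threads an SM can hold, derived from its resident warp limit (e.g. 32 or 64 warps).
int threadsPerSM(int maxWarpsPerSM) { return maxWarpsPerSM * kWarpSize; }

// Thread slots left unused on one SM when it is filled with whole blocks of blockSize threads.
// Only the thread limit is modelled here.
int idleThreadSlots(int blockSize, int maxWarpsPerSM) {
    int capacity = threadsPerSM(maxWarpsPerSM);
    int residentBlocks = capacity / blockSize;  // whole blocks only
    return capacity - residentBlocks * blockSize;
}

// Block sizes (multiples of the warp size, up to maxBlockSize) that leave no idle
// thread slots on any of the given SM configurations.
std::vector<int> portableBlockSizes(const std::vector<int>& warpsPerSM, int maxBlockSize) {
    std::vector<int> sizes;
    for (int b = kWarpSize; b <= maxBlockSize; b += kWarpSize) {
        bool fits = true;
        for (int w : warpsPerSM) {
            if (idleThreadSlots(b, w) != 0) {
                fits = false;
                break;
            }
        }
        if (fits) sizes.push_back(b);
    }
    return sizes;
}
```

For instance, a block size of 768 fits twice on a 2048-thread SM (512 slots idle) and once on a 1024-thread SM (256 slots idle), so it caps modelled occupancy at 75% on both. Only 32, 64, 128, 256, 512 and 1024 leave no idle slots on both configurations, though in practice the resident-block limit also rules out very small blocks.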