Peter Heywood comments

Results: 157 comments by Peter Heywood

Re-introduce Clang host compiler support

Updated the list of supported clang versions per CUDA release, based on the Linux release notes. Before CUDA 12.2 a single value was listed, which may have been the upper bound, although it...

Re-introduce Clang host compiler support

Some clang CI builds are now in a good state, others not so much, for a mix of reasons: + stdlib errors, probably from differences between the gcc stdlib not matching...

Re-introduce Clang host compiler support

CUDA 12+ with clang 11-14 (all that's easily installable on Ubuntu 22.04) are all happy enough on CI now for vis and non-vis builds. CUDA 11.x is unhappy for...

Re-introduce Clang host compiler support

Clang 7 does not build, with nvcc reporting it does not understand C++17 (although clang itself claims 5+ does). Clang 8 and clang 9 without vis build on CI OK...

Python / NVRTC performance (CUDA 12.2+)

This looks like an nvrtc perf regression within CUDA 12.2. Using `python_rtc/boids_spatial3D_bounded/boids_spatial3D.py` with `-t -v -s 1`, purging the jitify cache between runs: | Wheel CUDA | loaded CUDA (.so's)...

Python / NVRTC performance (CUDA 12.2+)

A CUDA 12.3 build with 12.3 at runtime had an RTC processing time of 20.773s, with driver 545.23.06, so it's still painful but not quite as bad. With 545.23.06 and python...

Python / NVRTC performance (CUDA 12.2+)

Confirmed this is not hardware specific, running on a Titan V, compiled with CUDA 12.0 and driver 545.23.06 ```bash module load CUDA/12.0 cmake .. -DCMAKE_CUDA_ARCHITECTURES="70" -DFLAMEGPU_RTC_DISK_CACHE=OFF cmake --build . --target...

Python / NVRTC performance (CUDA 12.2+)

Google Colab has now updated to CUDA 12.2, which makes this issue more prominent to potential FLAME GPU 2 users, with the `run_simulation` cell now taking ~3-5 minutes for the...

Test suite compilation issues with multiple cuda architectures

@Robadob has encountered heap-size-related compilation errors when building a debug configuration of the test suite under Windows. Splitting the C++ test suite and orchestrating via ctest would be...

MultiThreadDeviceTest temperamental failures

The first test runs the same simulation of 10k agents 3 times in separate `std::thread`s. The models include an agent function reading an environment property 10k times. A 980m has...