lettuce Write a custom PyTorch extension for Stream+Collide

The computational bottleneck in LBM simulations is memory access, which the current code framework handles very naively. To boost performance, the collision and streaming routines can be combined, as is done, for example, in the esoteric twist method. This would require going beyond the standard routines offered by PyTorch and writing custom C++/CUDA PyTorch extensions: https://pytorch.org/tutorials/advanced/cpp_extension.html
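To make the memory argument concrete, here is a rough back-of-the-envelope estimate of the traffic per lattice node for separate versus combined stream and collide kernels. It is only a sketch under stated assumptions: a D3Q19 stencil, double precision, every population read once and written once per kernel with no cache reuse, and a nominal 900 GB/s for the V100. The helper names are made up for illustration and are not part of lettuce.

```python
# Rough estimate of main-memory traffic per lattice node and time step.
# Assumption: each kernel that touches the populations reads all of them
# once and writes all of them once, with no reuse from cache.

def bytes_per_node(q: int, bytes_per_value: int, passes: int) -> int:
    """Bytes moved per node when `passes` kernels each read and write q populations."""
    return passes * 2 * q * bytes_per_value


def bandwidth_bound_mlups(bandwidth_gb_s: float, traffic_bytes: int) -> float:
    """Upper bound on million lattice updates per second for a given bandwidth."""
    return bandwidth_gb_s * 1e9 / traffic_bytes / 1e6


Q = 19              # D3Q19 stencil (assumed)
DOUBLE = 8          # bytes per float64 value
BANDWIDTH = 900.0   # GB/s, nominal V100 figure (assumed)

separate = bytes_per_node(Q, DOUBLE, passes=2)  # stream kernel + collide kernel
fused = bytes_per_node(Q, DOUBLE, passes=1)     # one combined kernel

print("bytes per node, separate:", separate)
print("bytes per node, fused:   ", fused)
print("bound (MLUPS), separate: ", bandwidth_bound_mlups(BANDWIDTH, separate))
print("bound (MLUPS), fused:    ", bandwidth_bound_mlups(BANDWIDTH, fused))
```

Under these assumptions the combined kernel moves half the data. A chain of standard PyTorch operations will usually move more than the two-pass estimate, because intermediate tensors are also written to and read back from memory.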

Sep 23 '20 09:09 Olllom

The current performance depends strongly on the number of steps. The following figure shows a benchmark of the 2D and 3D Taylor-Green vortex (TGV2D/3D) on a V100. The performance drops sharply if more than 8-10 steps are executed.
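When comparing runs with different step counts on a GPU, keep in mind that CUDA kernel launches are asynchronous, so the timer should only be read after the queued work has finished. Below is a minimal sketch of such a measurement. It is not the code used for the figure; `measure_mlups`, `step` and `sync` are hypothetical names, and on a CUDA device `sync` could be `torch.cuda.synchronize`.

```python
import time
from typing import Callable, Optional


def measure_mlups(step: Callable[[], None], n_nodes: int, n_steps: int,
                  sync: Optional[Callable[[], None]] = None) -> float:
    """Run `step` n_steps times and return million lattice updates per second."""
    if sync is not None:
        sync()  # finish any pending work before starting the timer
    start = time.perf_counter()
    for _ in range(n_steps):
        step()
    if sync is not None:
        sync()  # wait for queued kernels before stopping the timer
    elapsed = time.perf_counter() - start
    return n_nodes * n_steps / elapsed / 1e6


# Placeholder workload; a real run would call the simulation's
# stream+collide step here instead.
def dummy_step() -> None:
    sum(range(1000))


for n in (1, 10, 100):
    print(n, measure_mlups(dummy_step, n_nodes=256 * 256, n_steps=n))
```

Without the second `sync` call, short runs may only time the kernel launches, which can make the result look dependent on the step count.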

Sep 23 '20 10:09 McBs

Here is some additional data. As expected, the performance of the RTX 2060 Super (8 GB) is lower than that of the V100. However, the performance collapse can be seen here as well. This behavior does not occur on the CPU (see figure).

Sep 23 '20 11:09 McBs