Koyaanisqatsi
Some simple optimizations and Swing image display speedup (setRGB is synchronized, so use the int[] raster; much faster)
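For readers unfamiliar with the raster trick mentioned in the commit title, here is a minimal sketch of the general technique, not the code from this pull request. It assumes an int-typed image (`TYPE_INT_RGB` or `TYPE_INT_ARGB`), so that the backing buffer is a `DataBufferInt`; the class name `RasterFill`, its method names, and the `cells` array are hypothetical.

```java
import java.awt.image.BufferedImage;
import java.awt.image.DataBufferInt;

// Hypothetical helper: writes pixels through the image's backing int[]
// instead of calling setRGB once per pixel.
final class RasterFill {

    // Returns the int[] that backs the image. The cast only works for
    // int-typed images such as TYPE_INT_RGB or TYPE_INT_ARGB (assumption).
    static int[] pixelsOf(BufferedImage img) {
        return ((DataBufferInt) img.getRaster().getDataBuffer()).getData();
    }

    // Paints one pixel per cell; cells is row-major and must hold
    // width * height entries.
    static void paint(BufferedImage img, boolean[] cells, int aliveRgb, int deadRgb) {
        int[] pixels = pixelsOf(img);
        for (int i = 0; i < pixels.length; i++) {
            pixels[i] = cells[i] ? aliveRgb : deadRgb;
        }
    }
}
```

The array can be fetched once and reused across frames, since it is the live backing store rather than a copy. One trade-off to keep in mind: accessing the backing array directly may prevent Java 2D from hardware-accelerating that image.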
Thanks for the effort. I would accept a small change around 'imgData'.
yes that is the most important part
btw it might be interesting to compare it with this
In my taxonomy, the aparapi approach falls into the "synchronous parallel" category. While GPU hardware parallelism provides a significant performance boost, the need to synchronize on a barrier between generations is still there, and my point is that it eventually kills scalability. My implementation is really just a proof of concept. I'm not positioning it as a faster way to run Life (there are better approaches to that), and that's why I'm not interested in trading off core algorithm optimizations for even more complexity. However, from the scalability perspective it already looks promising (though demonstrating that requires more CPUs/cores than a typical laptop has).
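To make the "synchronous parallel" category concrete, here is a minimal Java sketch of it, assuming a toroidal (wrap-around) grid stored row-major in a `boolean[]`. It is neither the aparapi code nor my implementation; the class `SyncLife` and its method names are hypothetical. Rows are computed in parallel, but no row of generation n+1 can start until every row of generation n has finished.

```java
import java.util.stream.IntStream;

// Hypothetical illustration of a barrier-per-generation Life step.
final class SyncLife {

    // Computes the next generation of a w-by-h toroidal grid.
    static boolean[] step(boolean[] cur, int w, int h) {
        boolean[] next = new boolean[cur.length];
        // Each row is an independent task: it reads only cur and writes
        // only its own slice of next.
        IntStream.range(0, h).parallel().forEach(y -> {
            for (int x = 0; x < w; x++) {
                int n = 0;
                for (int dy = -1; dy <= 1; dy++) {
                    for (int dx = -1; dx <= 1; dx++) {
                        if (dx == 0 && dy == 0) continue;
                        int yy = (y + dy + h) % h; // wrap-around is an assumption
                        int xx = (x + dx + w) % w;
                        if (cur[yy * w + xx]) n++;
                    }
                }
                boolean alive = cur[y * w + x];
                next[y * w + x] = n == 3 || (alive && n == 2);
            }
        });
        // forEach returns only after all rows are done: this is the
        // implicit barrier between generations.
        return next;
    }

    // Runs several generations; every iteration waits on the barrier above.
    static boolean[] run(boolean[] grid, int w, int h, int generations) {
        for (int g = 0; g < generations; g++) {
            grid = step(grid, w, h);
        }
        return grid;
    }
}
```

However many workers are available, each generation takes as long as its slowest row, which is the scalability limit I'm referring to.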
You may be right; I'm not sure whether OpenCL fully supports async compute. In OpenCL, this might be the closest option: http://aparapi.com/documentation/explicit-buffer-handling.html
But a friend told me that Vulkan is designed for async cases. I'll have to look into that.
"Asynchronous" is an overloaded term. I explain what I mean by it in the context of Life implementations. In the GPU context, my expectation is that asynchronous computation means that submitting tasks to the GPU is disentangled from their execution, but the developer still has to synchronize tasks if there is a data dependency, which means there will have to be a barrier between generations anyway. A more complicated implementation might be possible, but the straightforward one won't benefit from asynchronicity.
I slightly changed aparapicellular to measure sustained fps when switching between GPU and CPU. On my Mac with 4 CPU cores and 384 GPU cores I'm getting ~90 fps in CPU mode and ~610 fps in GPU mode. Assuming CPU mode scales linearly across the 4 cores, that's approximately 610/(90/4) ≈ 27 times faster than single-threaded execution. I understand it's not exactly apples-to-apples, but I observed a speedup of ~300x for my implementation on some serious hardware (256 cores + HT). It processed 1850 generations per second, but with 1.6 times fewer cells.
could you try running aparapi cellular in CPU mode (native OpenCL, not JVM) on the 256 core hardware?
I can't say for sure whether my alife implementations in aparapi cellular are anywhere near optimal, since it was my first experiment with aparapi. So there are potentially better ways of doing it, as well as potential improvements in aparapi and the drivers.
That was a headless server-class machine, not even Intel, and with no GPU (but running Java).
it could probably still work if there exist cpu opencl drivers for the architecture
for example POCL has cpu-only support and looks relatively portable http://pocl.sourceforge.net/docs/html/install.html#requirements