cubiomes
WIP: Proof of concept for GPU-accelerated genArea
Hello, and thanks for this awesome library.
This PR is a step toward #18 and implements generation of areas using OpenCL.
Missing features
- Layers past L_SHORE_16
- Version support
Performance
When generating 64 seeds per routine, I observed a 30x speedup. When generating 1 seed per routine, the speedup is only 5x.
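For reference, the batching looks roughly like this on the host side. This is a simplified sketch rather than the actual ocl_test.c: the kernel body is a placeholder, error checking and resource releases are omitted, and the kernel name, buffer layout and 64-seed batch size are only illustrative. The point is that a single enqueue covers the whole batch, so launch and transfer overhead is amortized across all 64 seeds.

```c
#define CL_TARGET_OPENCL_VERSION 120
#include <CL/cl.h>
#include <stdio.h>
#include <stdlib.h>

#define SEEDS 64   /* seeds per launch; fixed overhead is shared by the batch */
#define W 256
#define H 256

/* Placeholder kernel: a real one would run the layer stack per cell. */
static const char *src =
"__kernel void genAreaBatch(__global const long *seeds, __global int *out) {\n"
"    size_t i = get_global_id(0), j = get_global_id(1), s = get_global_id(2);\n"
"    size_t w = get_global_size(0), h = get_global_size(1);\n"
"    out[s*w*h + j*w + i] = (int)seeds[s];\n"
"}\n";

int main(void)
{
    cl_platform_id plat; cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_GPU, 1, &dev, NULL);
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, NULL);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, NULL);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, NULL);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel kern = clCreateKernel(prog, "genAreaBatch", NULL);

    cl_long seeds[SEEDS];
    for (int s = 0; s < SEEDS; s++) seeds[s] = 1000 + s;

    cl_mem seedBuf = clCreateBuffer(ctx, CL_MEM_READ_ONLY | CL_MEM_COPY_HOST_PTR,
                                    sizeof(seeds), seeds, NULL);
    cl_mem outBuf = clCreateBuffer(ctx, CL_MEM_WRITE_ONLY,
                                   sizeof(cl_int) * W * H * SEEDS, NULL, NULL);

    clSetKernelArg(kern, 0, sizeof(cl_mem), &seedBuf);
    clSetKernelArg(kern, 1, sizeof(cl_mem), &outBuf);

    /* one work item per cell per seed: the whole 64-seed batch is one enqueue */
    size_t gws[3] = { W, H, SEEDS };
    clEnqueueNDRangeKernel(q, kern, 3, NULL, gws, NULL, 0, NULL, NULL);

    int *out = malloc(sizeof(int) * W * H * SEEDS);
    clEnqueueReadBuffer(q, outBuf, CL_TRUE, 0, sizeof(cl_int) * W * H * SEEDS,
                        out, 0, NULL, NULL);

    printf("first cell of seed 0: %d\n", out[0]);
    free(out);
    return 0;
}
```

With only one seed per launch, the same fixed costs are paid for far less work, which is consistent with the smaller 5x speedup.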
Terribly sorry for dumping such a large chunk of code in a single PR, but I needed to see whether my approach for avoiding recomputing the same layer multiple times works before submitting this.
Thanks for the interest. I was always a little sceptical about performance with a GPU. Generating giant areas in one go might work reasonably well on a GPU, but the code is highly reliant on branching, which is like poison to a GPU and to SSE instructions. Also, I find myself needing small areas much more often than large ones, which makes this problem much worse. So I always leaned towards distributing the workload over CPU cores instead. That said, I'm quite interested to see what the performance would actually be using a GPU in different scenarios.
While checking out your branch I found a bug in the cubiomes library that caused allocCache to allocate too little memory when the entry point was one of the first few layers. That should be fixed now.
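For context, this is the kind of usage that triggered it: allocCache has to size the buffer for the chosen entry layer and all of its parents, and the sizing came up short when the entry was near the start of the stack. The sketch below assumes the layered API of this era of the library; exact setup calls and version constants have shifted a bit between revisions.

```c
#include "generator.h"
#include "layers.h"
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    initBiomes();

    /* set up the layered generator and apply a world seed */
    LayerStack g = setupGenerator(MC_1_14);
    applySeed(&g, 12345);

    /* entry point at one of the first layers instead of the final 1:1 layer */
    Layer *entry = &g.layers[L_ISLAND_4096];

    /* allocCache must size the buffer for this entry and all of its parents;
     * this is the allocation that came up short for early entry points */
    int *cache = allocCache(entry, 16, 16);
    genArea(entry, cache, 0, 0, 16, 16);

    printf("value at (0,0): %d\n", cache[0]);
    free(cache);
    freeGenerator(g);
    return 0;
}
```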
I found a couple of issues with the draft. I think at ocl_test.c:47 it should be bufferA[i + j*W] without the + s*W*H, and it does not seem to work for area sizes below 32x32.
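In other words, assuming bufferA holds the CPU reference for a single seed (regenerated each iteration) while the GPU output buffer is batched over all seeds, the comparison would look roughly like this; the helper and its names are illustrative, not the actual ocl_test.c:

```c
#include <stdio.h>

/* Compare the CPU reference for one seed against the matching slice of a
 * batched GPU output buffer. bufferA holds a single W*H area (regenerated
 * per seed), gpuOut holds all seeds back to back. */
static int compareSeedSlice(const int *bufferA, const int *gpuOut,
                            int s, int W, int H)
{
    int mismatches = 0;
    for (int j = 0; j < H; j++)
    {
        for (int i = 0; i < W; i++)
        {
            int cpu = bufferA[i + j*W];         /* per-seed buffer: no seed offset */
            int gpu = gpuOut[i + j*W + s*W*H];  /* batched buffer: offset by seed */
            if (cpu != gpu)
            {
                printf("mismatch: seed %d, (%d,%d): %d != %d\n",
                       s, i, j, cpu, gpu);
                mismatches++;
            }
        }
    }
    return mismatches;
}
```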