WIP. Proof of concept for GPU accelerated genArea

Open hukumka opened this issue 4 years ago • 1 comments

Hello, and thanks for this awesome library.

This PR is a step toward #18 and implements area generation using OpenCL.

Lacking features

  • Layers past L_SHORE_16
  • Version support

Performance

When generating 64 seeds per routine, I observed a 30x speedup. When generating 1 seed per routine, the speedup is only 5x.

Terribly sorry for dumping such a large chunk of code in a single PR, but I needed to see whether my approach for avoiding recomputing the same layer multiple times works before submitting this.

hukumka avatar Sep 30 '20 13:09 hukumka

Thanks for the interest. I was always a little sceptical about performance with a GPU. Generating giant areas in one go might work reasonably well on a GPU, but the code is highly reliant on branching, which is like poison to a GPU and to SSE instructions. Also, I find myself needing small areas much more often than large ones, which makes this problem much worse. So I always leaned towards distributing the workload across CPU cores instead. That said, I'm quite interested to see what the performance would actually be using a GPU in different scenarios.
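To make the branching concern concrete, here is a minimal sketch of a data-dependent selection written once with a branch and once without. Both function names are hypothetical and neither is taken from cubiomes or from the PR; the sketch only illustrates the general idea that a per-cell `if` can sometimes be replaced by mask arithmetic so that all lanes or work-items execute the same instructions.

```c
#include <stdint.h>

/* Hypothetical illustration only; not cubiomes code. */

/* Branchy version: lanes that disagree on `cond` take different paths. */
static int32_t pick_branchy(int32_t cond, int32_t a, int32_t b)
{
    if (cond)
        return a;
    return b;
}

/* Branchless version: build an all-ones or all-zeros mask from `cond`
 * and blend the two candidates with bitwise operations. */
static int32_t pick_branchless(int32_t cond, int32_t a, int32_t b)
{
    int32_t mask = -(int32_t)(cond != 0); /* -1 if cond is nonzero, else 0 */
    return (a & mask) | (b & ~mask);
}
```

This kind of rewrite only covers simple selections, and compilers often emit a select instruction for such cases on their own. Layers with deeply nested conditions would likely need more restructuring than this.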

While checking out your branch I found a bug in the cubiomes library that caused allocCache to allocate too little memory when the entry point was one of the first few layers. That should be fixed now.

I found a couple of issues with the draft. I think at ocl_test.c:47 it should be bufferA[i + j*W] without the + s*W*H, and it does not seem to work for area sizes below 32x32.
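The two index expressions mentioned above can be sketched as follows. This assumes a row-major W x H area and, for the batched case, that the areas of consecutive seeds are stored back to back; that layout is inferred from the expression quoted in the comment and has not been checked against the branch. The helper names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helpers for illustration; not part of cubiomes or the PR. */

/* Offset of cell (i, j) inside a single row-major W x H area. */
static size_t area_index(size_t i, size_t j, size_t W, size_t H)
{
    assert(i < W && j < H); /* guard against reading outside the area */
    return i + j * W;
}

/* Offset of cell (i, j) for seed number s, assuming the areas of
 * all seeds are stored one after another in the same buffer. */
static size_t batched_index(size_t i, size_t j, size_t s, size_t W, size_t H)
{
    return s * W * H + area_index(i, j, W, H);
}
```

If a buffer holds only one area, applying the batched offset with s > 0 reads past its end, which would be consistent with the first issue reported above.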

Cubitect avatar Oct 02 '20 21:10 Cubitect