
Is it possible to achieve arbitrary precision in my kernel?

Open psykovski-extended opened this issue 2 years ago • 2 comments

I am currently learning React.js, and as a practice project I wanted to build a Mandelbrot calculator for the browser with arbitrary precision. My current working kernel looks like this:

const recalculateMandelbrot = gpu.createKernel(function(x_start, x_end, y_start, y_end, iters){
    // Map this thread's pixel coordinates to a point c in the complex plane.
    let c_re = x_start + (x_end - x_start) * this.thread.x / 1024;
    let c_im = y_start + (y_end - y_start) * this.thread.y / 1024;
    let z_re = 0, z_im = 0;
    let z_re_prev = 0;

    for(let i = 0; i < iters; i++) {
        // z = z^2 + c; keep the old real part so the imaginary update
        // (2 * z_re * z_im + c_im) uses the pre-update value of z_re.
        z_re_prev = z_re;
        z_re = z_re * z_re - z_im * z_im + c_re;
        z_im = z_re_prev * z_im + z_re_prev * z_im + c_im;

        // Escape test: |z|^2 >= 4 means the orbit diverges.
        if ((z_re * z_re + z_im * z_im) >= 4) {
            return i;
        }
    }
    return iters;
}).setOutput([1024, 1024]).setPrecision('single');
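
For reference, the compiled kernel is then called like an ordinary function. A minimal usage sketch (the view coordinates and iteration count here are illustrative, not taken from the issue):

// Classic full view of the Mandelbrot set; returns a 1024x1024
// array of escape-iteration counts, one per pixel.
const counts = recalculateMandelbrot(-2.0, 1.0, -1.5, 1.5, 500);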

And it works perfectly fine until I zoom to around 1e-7 on the number scale. Below that, the output image gets blurry, indicating that float precision has reached its limits. To let the user zoom as deep as they want, I need arbitrary precision. I know that I could do this on the CPU as well, but I would like to combine the speed of the GPU with the precision of big.js - that would be awesome! Thanks in advance for any good tips!

psykovski-extended • Jun 05 '22

Internally the GPU works with mostly 32-bit hardware, so any way to increase precision will slow things down a lot (possibly to the point where the CPU actually becomes faster again).
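
To make that trade-off concrete: one common way to buy extra precision on 32-bit hardware is float-float ("double-single") arithmetic, where each value is carried as an unevaluated sum of two floats. The following is an illustrative plain-JavaScript sketch of the technique, not code from this issue or from GPU.js; Math.fround forces each intermediate result to 32-bit precision, the way a float32 GPU would round.

// Two-sum (Knuth): split a + b into the rounded sum s and the exact
// rounding error e, so that s + e equals a + b exactly.
function twoSum(a, b) {
    const s = Math.fround(a + b);
    const bb = Math.fround(s - a);
    const e = Math.fround(Math.fround(a - Math.fround(s - bb)) +
                          Math.fround(b - bb));
    return [s, e];
}

// Add two float-float numbers aHi + aLo and bHi + bLo. One logical
// addition costs roughly ten float operations, which is exactly the
// slowdown described above.
function ffAdd(aHi, aLo, bHi, bLo) {
    const [s, e0] = twoSum(aHi, bHi);
    const e = Math.fround(e0 + Math.fround(aLo + bLo));
    const hi = Math.fround(s + e);
    const lo = Math.fround(e - Math.fround(hi - s));
    return [hi, lo];
}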

end-me-please • Nov 22 '22

You can use 24 bits of a float as an int24, or fewer bits per limb if you need headroom for carrying digits during multiplies and adds. You can operate on four of those at once, which compiles to a vec4 in GLSL (or an array of Float32Array(4)s if output), and you could derive bigger numbers from that. I've seen GPU.js do around a teraflop in the browser, but only about 20 gflops for the simplest kind of matrix multiply, because that's IO-bottlenecked. These extra calculations would not add any extra IO between the GPU cores, so they shouldn't slow things down much.
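
A plain-JavaScript sketch of that limb idea (the 12-bit limb width and the helper name limbAdd are illustrative assumptions, chosen so that limb products and carries stay exactly representable in a float32 mantissa):

// Integers up to 2^24 are exact in a float32. With 12-bit limbs, a
// 12x12-bit product (24 bits) plus carries still fits, so the
// arithmetic below would also be exact on float32 GPU hardware.
const LIMB_BITS = 12;
const LIMB_BASE = 1 << LIMB_BITS; // 4096

// Add two multi-limb magnitudes (least-significant limb first),
// propagating carries between limbs.
function limbAdd(a, b) {
    const n = Math.max(a.length, b.length);
    const out = new Float32Array(n + 1);
    let carry = 0;
    for (let i = 0; i < n; i++) {
        const t = (a[i] || 0) + (b[i] || 0) + carry;
        out[i] = t % LIMB_BASE;
        carry = Math.floor(t / LIMB_BASE);
    }
    out[n] = carry;
    return out;
}

Four such limbs returned from a kernel would map onto a vec4 in the generated GLSL, as the comment above describes.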

benrayfield • Jan 21 '23