WebGL-City-SSAO

nvidia compiler crashes libnvidia-glcore.so.337.25

Open manuelbua opened this issue 10 years ago • 33 comments

I'm using the gpg8 shader for my game and noticed that the latest nvidia driver hangs while compiling the shader source. I think I was able to pinpoint the problem: the compiler seems to choke on inv_rot, the uniform containing the inverse rotation matrix, and crashes without any error or warning message. Also, for some odd reason, skipping the sample_offset *= inv_rot operation does NOT change the result. What am I missing?

I'm reporting this here since I've seen that the webgl demo fails to start, just like my game. Do you have any idea what could cause this issue? Simply not using the inv_rot parameter makes things work fine again.

manuelbua avatar Jun 14 '14 11:06 manuelbua

Can you isolate the shader that crashes your driver with and without the offending statement? Could you paste it here?

pyalot avatar Jun 15 '14 08:06 pyalot

Ok, I was able to get the demo running by commenting out this line in gpg8.shader. Some errors pop up in the console, but the demo runs fine. As soon as I uncomment that line, everything starts crashing: Firefox closes without notice, and Chrome offers to restart WebGL, but then only black pages load and I need to close and reopen the browser.

I don't get what's happening. I'm running nVidia/ArchLinux 64bit with nVidia drivers 337.25, and the previous drivers (334.21) were fine. I'm quite sure it's an internal GLSL compiler error, but I'm not sure how to report that to nVidia.

manuelbua avatar Jun 15 '14 12:06 manuelbua

You've got the machine that's crashing, so you're the only person currently able to isolate the behavior. Does it crash at shader compile or at draw? Can you take out more of the shader and still retain the crashing behavior? How much can you take out?

The way to get to the bottom of it is to reduce the shader to the barest minimum (it doesn't even have to do anything sensible) that still crashes. Once that's reduced, reduce the entire program to a minimal test case that crashes.

Once that is done, I can help with the proper course of reporting that error.
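
As a rough illustration of the reduction loop described above, here is a minimal sketch that greedily drops one source line at a time and keeps the removal whenever the failure persists. Both `reduceLines` and the `stillFails` predicate are hypothetical names, not part of this project. Since a driver crash usually takes the whole process down, `stillFails` is assumed to test each candidate somewhere safe, for example in a fresh browser tab or a child process.

```javascript
// Hypothetical helper: greedy line-by-line reduction of a failing source.
// `lines` is the shader source split into lines; `stillFails(candidate)`
// must return true when the candidate still reproduces the problem.
function reduceLines(lines, stillFails) {
  var current = lines.slice();
  var changed = true;
  // Repeat full passes until one pass removes nothing.
  while (changed) {
    changed = false;
    for (var i = 0; i < current.length; i++) {
      // Candidate = current source without line i.
      var candidate = current.slice(0, i).concat(current.slice(i + 1));
      if (stillFails(candidate)) {
        current = candidate;
        changed = true;
        i--; // re-test the line that slid into position i
      }
    }
  }
  return current;
}
```

Removing single lines can produce candidates that fail for a different reason (for example a syntax error), so the predicate should check for the specific crash and not for any failure.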

pyalot avatar Jun 15 '14 12:06 pyalot

Yes, that's exactly what I did: starting in main and commenting out as much as I could, the offending function at compile time is always testOcclusion. Inside it, I can comment out everything and return arbitrary test values and it runs fine, but as soon as I let the shader use the inv_rot uniform, everything crashes.

I can't see the compile stage crashing in the browser (Chromium doesn't crash but Firefox closes), but I'm using the same shader in my game, and there I can see a Java crash dump pointing at libnvidia-glcore.so.337.25.

manuelbua avatar Jun 15 '14 12:06 manuelbua

I can try to work out a minimal crashing example, but something tells me the shader size is also somehow a trigger of the problem. Let's see what I can come up with.

manuelbua avatar Jun 15 '14 12:06 manuelbua

Ok, this is the bare minimum snippet that will trigger the bug (from gpg8.shader):

vertex:
    attribute vec3 position;
    attribute vec2 texcoord;
    varying vec2 uv;

    void main(void) {
        gl_Position = vec4(position, 1.0);
        uv = texcoord;
    }

fragment:
    #define sample_count 16
    #define pattern_size 4.0

    uniform sampler2D normaldepth, random_field;
    uniform vec2 viewport;
    uniform float near, far, radius, epsilon, full_occlusion_treshold, no_occlusion_treshold, occlusion_power, random_size, power;
    uniform mat4 proj, inv_proj;
    uniform mat3 inv_rot;
    uniform vec3 samples[8];
    varying vec2 uv;

    float testOcclusion(vec3 eye_normal, vec3 eye_pos, vec3 sample_offset){
        sample_offset *= inv_rot;
        return 0.0;
    }

    void main(void){
        vec3 eye_ray = vec3(0.0,0.0,0.0);
        vec4 eye_data = vec4(1.0,1.0,1.0,0.0);
        vec3 eye_normal = vec3(0.0,0.0,0.0);
        float eye_depth = 0.0;
        vec3 eye_pos = eye_depth * eye_ray;
        float result = 0.0;

        for(int i=0; i<sample_count; i++){
            vec3 sample_offset = vec3(0,0,0);
            result += testOcclusion(eye_normal, eye_pos, sample_offset);
        }
        result /= float(sample_count);
        gl_FragColor = vec4(pow(1.0-result, power));
    }

Commenting out the line sample_offset *= inv_rot; in testOcclusion will make it compile fine instead.

manuelbua avatar Jun 15 '14 12:06 manuelbua

So this crashes at gl.compileShader (as opposed to linkProgram) right?

Does it still crash if you take these out?

  • #define pattern_size 4.0
  • uniform sampler2D normaldepth, random_field;
  • uniform vec2 viewport;
  • near, far, radius, epsilon, full_occlusion_treshold, no_occlusion_treshold, occlusion_power, random_size
  • uniform mat4 proj, inv_proj;
  • uniform vec3 samples[8];
  • varying vec2 uv;

pyalot avatar Jun 15 '14 13:06 pyalot

Uhm, now it gets interesting: i patched glee.js to log before/after gl.compileShader and gl.linkProgram, removed every other uniform/varying and, as opposed to my desktop app, it looks like it finishes compilation and linking, but then an error occurs:

compiling normaldepth/normal.shader shader.js:55
compiled normaldepth/normal.shader shader.js:57
linking normaldepth/normal.shader shader.js:68
linked normaldepth/normal.shader shader.js:70
compiling ssao/gpg8.shader shader.js:55
compiled ssao/gpg8.shader shader.js:57
compiling ssao/gpg8.shader shader.js:55
compiled ssao/gpg8.shader shader.js:57
linking ssao/gpg8.shader shader.js:68
linked ssao/gpg8.shader shader.js:70
Object {type: "program link", error: "", path: "ssao/gpg8.shader"} glee.js:31
Uncaught #<Object> glee.js:33
handleError glee.js:33
Shader.compile shader.js:73
glee.Shader shader.js:20
Glee.xhr.success resources.js:76
request.onreadystatechange resources.js:14
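
For reference, the kind of logging described above can be sketched as a small wrapper around the standard WebGL calls. This is not the actual glee.js patch: `buildProgramLogged` and its parameters are hypothetical names. The `gl.isContextLost()` check is an assumption about why the link error carries an empty message; it may help tell a lost context apart from an ordinary compile or link error.

```javascript
// Hypothetical helper (not glee.js code): logs before and after each GL call
// so the last message printed shows how far the driver got.
function buildProgramLogged(gl, label, vertexSrc, fragmentSrc, log) {
  log = log || function (msg) { console.log(msg); };

  // An empty info log together with a lost context points at the driver
  // rather than at the shader source.
  function reason(infoLog) {
    if (gl.isContextLost()) return "context lost";
    return infoLog || "empty info log";
  }

  function compile(type, src) {
    var shader = gl.createShader(type);
    gl.shaderSource(shader, src);
    log("compiling " + label);
    gl.compileShader(shader);
    log("compiled " + label);
    if (!gl.getShaderParameter(shader, gl.COMPILE_STATUS)) {
      throw new Error("shader compile: " + reason(gl.getShaderInfoLog(shader)));
    }
    return shader;
  }

  var program = gl.createProgram();
  gl.attachShader(program, compile(gl.VERTEX_SHADER, vertexSrc));
  gl.attachShader(program, compile(gl.FRAGMENT_SHADER, fragmentSrc));
  log("linking " + label);
  gl.linkProgram(program);
  log("linked " + label);
  if (!gl.getProgramParameter(program, gl.LINK_STATUS)) {
    throw new Error("program link: " + reason(gl.getProgramInfoLog(program)));
  }
  return program;
}
```

In a page this would be called with a real context, e.g. one obtained from `canvas.getContext("webgl")`, and the two shader source strings.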

This is the updated minimal example:

vertex:
    attribute vec3 position;
    attribute vec2 texcoord;
    varying vec2 uv;

    void main(void) {
        gl_Position = vec4(position, 1.0);
        uv = texcoord;
    }

fragment:
    #define sample_count 16
    uniform mat3 inv_rot;

    float testOcclusion(vec3 eye_normal, vec3 eye_pos, vec3 sample_offset){
        sample_offset *= inv_rot;
        return 0.0;
    }

    void main(void){
        vec3 eye_ray = vec3(0.0,0.0,0.0);
        vec4 eye_data = vec4(1.0,1.0,1.0,0.0);
        vec3 eye_normal = vec3(0.0,0.0,0.0);
        float eye_depth = 0.0;
        vec3 eye_pos = eye_depth * eye_ray;
        float result = 0.0;

        for(int i=0; i<sample_count; i++){
            vec3 sample_offset = vec3(0,0,0);
            result += testOcclusion(eye_normal, eye_pos, sample_offset);
        }
        result /= float(sample_count);
        gl_FragColor = vec4(pow(1.0-result, 1.0));
    }

manuelbua avatar Jun 15 '14 13:06 manuelbua

So it still crashes, but just after it's thrown the link error?

Does it retain the crashing behavior if you do these?

  • strip the vertex shader down to void main(){ gl_Position = vec4(0.0); }
  • inline the body of testOcclusion into the loop
  • remove vec4 eye_data = vec4(1.0,1.0,1.0,0.0);
  • reduce sample count to below 16 (like 8, 4, 2 or 1)

pyalot avatar Jun 15 '14 13:06 pyalot

I'm going to test that too. In the meantime I was also able to get Chromium to log the other shaders: as soon as gpg8.shader fails, every following shader fails as well:

linking ssao/gpg8.shader shader.js:68
linked ssao/gpg8.shader shader.js:70
Object {type: "program link", error: "", path: "ssao/gpg8.shader"} glee.js:31
    Uncaught #<Object> glee.js:33
    handleError glee.js:33
    Shader.compile shader.js:73
    glee.Shader shader.js:20
    Glee.xhr.success resources.js:76
    request.onreadystatechange resources.js:14
compiling ssao/gd.shader shader.js:55
compiled ssao/gd.shader shader.js:57
Object {type: "shader compile", error: "", path: "ssao/gd.shader"} glee.js:31
    Uncaught #<Object> glee.js:33
    handleError glee.js:33
    Shader.compile shader.js:60
    glee.Shader shader.js:20
    Glee.xhr.success resources.js:76
    request.onreadystatechange resources.js:14
compiling ssao/gp1.shader shader.js:55
compiled ssao/gp1.shader shader.js:57
Object {type: "shader compile", error: "", path: "ssao/gp1.shader"} glee.js:31
    Uncaught #<Object> glee.js:33
    handleError glee.js:33
    Shader.compile shader.js:60
    glee.Shader shader.js:20
    Glee.xhr.success resources.js:76
    request.onreadystatechange resources.js:14
...

manuelbua avatar Jun 15 '14 13:06 manuelbua

Ok so, simplifying the vertex shader, reducing the sample count and removing eye_data didn't help (it still crashes). Inlining testOcclusion did work, so I also tried passing inv_rot to the function as a parameter, but that crashes as well:

vertex:
    attribute vec3 position;
    attribute vec2 texcoord;

    void main(void) {
        gl_Position = vec4(0.0);
    }

fragment:
    #define sample_count 1
    uniform mat3 inv_rot;

    //float testOcclusion(vec3 eye_normal, vec3 eye_pos, vec3 sample_offset){
        //sample_offset *= inv_rot;
        //return 0.0;
    //}

    float testOcclusion(vec3 eye_normal, vec3 eye_pos, vec3 sample_offset, mat3 invrot){
        sample_offset *= invrot;
        return 0.0;
    }

    void main(void){
        vec3 eye_ray = vec3(0.0,0.0,0.0);
        vec3 eye_normal = vec3(0.0,0.0,0.0);
        float eye_depth = 0.0;
        vec3 eye_pos = eye_depth * eye_ray;
        float result = 0.0;

        for(int i=0; i<sample_count; i++){
            vec3 sample_offset = vec3(0,0,0);

            // this crashes
            //result += testOcclusion(eye_normal, eye_pos, sample_offset);

            // this also crashes
            //result += testOcclusion(eye_normal, eye_pos, sample_offset, inv_rot);

            // this works
            sample_offset *= inv_rot;
            result += 0.0;
        }
        result /= float(sample_count);
        gl_FragColor = vec4(pow(1.0-result, 1.0));
    }

Also, this is what i can see sometimes, when the page doesn't go completely black http://imgur.com/zOHHTtl

manuelbua avatar Jun 15 '14 13:06 manuelbua

Oh well, I may have found the culprit: at this point I suspected that math operations were being scrambled in some way by the compiler/linker, so I tried to avoid temporary values in shortcut (compound assignment) operations, so:

from this:

sample_offset *= inv_rot;
sample_offset *= sign(dot(eye_normal, sample_offset));

to this:

sample_offset = sample_offset * inv_rot;
sample_offset = sample_offset * sign(dot(eye_normal, sample_offset));

..and indeed it works fine o_O
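
If many shaders are affected, the same rewrite could in principle be applied to the source string before it is handed to gl.shaderSource. The sketch below is only an illustration: `expandMulAssign` is a hypothetical helper, it only handles statements of the form `name *= expression;` that start a line, and it has not been tested against the affected driver.

```javascript
// Hypothetical workaround: rewrite "x *= expr;" as "x = x * expr;" in GLSL
// source text. Only simple identifiers at the start of a line are handled;
// swizzles, array elements and commented-out lines are left untouched.
function expandMulAssign(src) {
  var pattern = /^([ \t]*)([A-Za-z_]\w*)[ \t]*\*=[ \t]*([^;\n]+);/gm;
  return src.replace(pattern, function (match, indent, name, rhs) {
    var expr = rhs.trim();
    // Parenthesize anything that is not a bare identifier so that
    // "a *= b + c" keeps its meaning as "a = a * (b + c)".
    var simple = /^[A-Za-z_]\w*$/.test(expr);
    return indent + name + " = " + name + " * " + (simple ? expr : "(" + expr + ")") + ";";
  });
}
```

For the two lines above this yields `sample_offset = sample_offset * inv_rot;` and `sample_offset = sample_offset * (sign(dot(eye_normal, sample_offset)));`.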

manuelbua avatar Jun 15 '14 14:06 manuelbua

Here is a standalone example, completely unrelated to the SSAO shader, that will trigger the bug:

vertex:
    attribute vec3 position;
    attribute vec2 texcoord;

    void main(void) {
        gl_Position = vec4(0.0);
    }

fragment:
    uniform mat3 inv_rot;
    vec3 test = vec3(1.0,1.0,1.0);

    float thisCrashes() {
        test *= inv_rot;
        return 0.0;
    }

    float thisWorks() {
        test = test * inv_rot;
        return 0.0;
    }

    void main(void){
        gl_FragColor = vec4(thisCrashes());
    }

manuelbua avatar Jun 15 '14 14:06 manuelbua

So, a first attempt at defining the bug in a descriptive manner could be:

It looks like the GLSL compiler/linker hangs, crashes or produces unexpected output,
leading to unexpected behavior in the application, whenever shortcut math operations
inside function bodies multiply a vec3 on the left by a mat3 on the right: while the normal
form "vec3 = vec3 * mat3" works fine, the shortcut version "vec3 *= mat3" triggers the bug
consistently.

I'd rather not be so specific about the vector/matrix dimensions, but this is the only
case where I can reproduce the error consistently.

Does that sound right to you? Thank you very much for your help!

manuelbua avatar Jun 15 '14 14:06 manuelbua

So does this one crash for you? http://codeflow.org/issues/driver-bug/nvidia-linux-337.25.html

pyalot avatar Jun 15 '14 15:06 pyalot

Yes, this brings the entire browser to its knees, just like my other test case unrelated to webgl-city. I found this page (http://nvidia-submit.custhelp.com/app/ask#) for bug reporting, is it the right one? Does nVidia have a bug bounty program for this type of stuff?

manuelbua avatar Jun 15 '14 15:06 manuelbua

I'd suggest you file the bug at nvidia and paste me the URL of it, and I'd file the bugs for chrome and firefox. Usually when the ticket guys see my name on the browser bug, they know what to do :)

Can you paste your machine details (OS/version, GPU/model, Driver/version, Uname, etc.)?

pyalot avatar Jun 15 '14 15:06 pyalot

Ok fine, but is the page I posted above the right place to do it, or does a public bug tracker exist? Also let me know the browser bug links, so that I can endorse them!

manuelbua avatar Jun 15 '14 15:06 manuelbua

I don't know of a public bug tracker for nvidia.

Will paste the browser bugs here.

pyalot avatar Jun 15 '14 15:06 pyalot

Here's the link to the nVidia bug, you'll probably need to login btw..

I specified my hardware/software details, here they are:

I'm on ArchLinux/64bit, GPU is GTX660/2GB, driver is 337.25.

$ uname -a
Linux moon 3.14.6-1-ARCH #1 SMP PREEMPT Sun Jun 8 10:08:38 CEST 2014 x86_64 GNU/Linux

$ pacaur -Ss nvidia|grep installed
extra/libcl 1.1-3 [installed]
extra/libvdpau 0.7-1 [installed]
extra/nvidia 337.25-1 [installed]
extra/nvidia-libgl 337.25-1 [installed]
extra/nvidia-utils 337.25-1 [installed]
community/libxnvctrl 337.12-1 [installed]
community/nvidia-cg-toolkit 3.1-2 [installed]
multilib/lib32-libcl 1.1-1 [installed]
multilib/lib32-libvdpau 0.7-2 [installed]
multilib/lib32-nvidia-libgl 337.25-1 [installed]
multilib/lib32-nvidia-utils 337.25-1 [installed]

manuelbua avatar Jun 15 '14 15:06 manuelbua

Can you post the output of this script?

echo "## Nvidia driver info (cat /proc/driver/nvidia/version) ##"
cat /proc/driver/nvidia/version

echo
echo "## GPU info (glxinfo | grep -i \"opengl renderer string\") ##"
glxinfo | grep -i "opengl renderer string"

echo
echo "## CPU info (cat /proc/cpuinfo | grep -i \"model name\" | head -1) ##"
cat /proc/cpuinfo | grep -i "model name" | head -1

echo
echo "## distribution info (lsb_release -a) ##"
lsb_release -a

echo
echo "## Kernel info (uname -r) ##"
uname -r

pyalot avatar Jun 15 '14 15:06 pyalot

## Nvidia driver info (cat /proc/driver/nvidia/version) ##
NVRM version: NVIDIA UNIX x86_64 Kernel Module  337.25  Tue May 27 11:05:28 PDT 2014
GCC version:  gcc version 4.9.0 20140521 (prerelease) (GCC) 

## GPU info (glxinfo | grep -i "opengl renderer string") ##
OpenGL renderer string: GeForce GTX 660/PCIe/SSE2

## CPU info (cat /proc/cpuinfo | grep -i "model name" | head -1) ##
model name  : Intel(R) Core(TM) i7 CPU         920  @ 2.67GHz

## distribution info (lsb_release -a) ##
LSB Version:    1.4
Distributor ID: Arch
Description:    Arch Linux
Release:    rolling
Codename:   n/a

## Kernel info (uname -r) ##
3.14.6-1-ARCH

manuelbua avatar Jun 15 '14 15:06 manuelbua

Cool, thanks, I'll get around to filing the browser bugs tomorrow.

pyalot avatar Jun 15 '14 15:06 pyalot

Great, thanks much Florian, it has been so kind of you!

manuelbua avatar Jun 15 '14 15:06 manuelbua

Oh, one thing I forgot, also the chrome and firefox version.

pyalot avatar Jun 15 '14 16:06 pyalot

Oh you are right: Chromium is Version 35.0.1916.153 (274914), Firefox is v30.0

manuelbua avatar Jun 15 '14 16:06 manuelbua

Here's the Google Chrome ticket: https://code.google.com/p/chromium/issues/detail?id=384847

pyalot avatar Jun 15 '14 16:06 pyalot

Starred and endorsed, also do you think it is a good idea to link to the nvidia bug in the browser tickets so that they can correlate the two?

manuelbua avatar Jun 15 '14 16:06 manuelbua

Mah.. given the amount of information I put in that report, they will hopefully be able to get in touch with nVidia in some way, so let's see how everything works out.

manuelbua avatar Jun 15 '14 18:06 manuelbua

Firefox ticket: https://bugzilla.mozilla.org/show_bug.cgi?id=1025676

pyalot avatar Jun 15 '14 20:06 pyalot

Great, thanks, endorsed that as well!

manuelbua avatar Jun 15 '14 21:06 manuelbua

Does this conformance test crash for you? http://www.khronos.org/registry/webgl/sdk/tests/conformance/glsl/bugs/multiplication-assignment.html

pyalot avatar Jul 01 '14 07:07 pyalot

Yes it does. I also noticed an odd context thing where hitting the "reload webgl" button won't really reload anything, so I need to restart the browser completely. I updated the google bug report as well, but luckily nvidia has already fixed the bug in their working branch.

manuelbua avatar Jul 01 '14 19:07 manuelbua