bve-reborn icon indicating copy to clipboard operation
bve-reborn copied to clipboard

Freezes on Metal

Open cwfitzgerald opened this issue 5 years ago • 8 comments

Reports of the current builds freezing up Macs. The entire machine freezes, likely due to a graphics issue.

cwfitzgerald avatar May 20 '20 03:05 cwfitzgerald

More info:

validateComputeFunctionArguments:771: failed assertion `Compute Function(main0): argument _80[0] from buffer(0) with offset(0) and length(136) has space for 136 bytes, but argument has a length(144).'

This might be the metal shader that is created:

    #pragma clang diagnostic ignored "-Wmissing-prototypes"
    #pragma clang diagnostic ignored "-Wmissing-braces"
    
    #include <metal_stdlib>
    #include <simd/simd.h>
    
    using namespace metal;
    
    template<typename T, size_t Num>
    struct spvUnsafeArray
    {
        T elements[Num ? Num : 1];
        
        thread T& operator [] (size_t pos) thread
        {
            return elements[pos];
        }
        constexpr const thread T& operator [] (size_t pos) const thread
        {
            return elements[pos];
        }
        
        device T& operator [] (size_t pos) device
        {
            return elements[pos];
        }
        constexpr const device T& operator [] (size_t pos) const device
        {
            return elements[pos];
        }
        
        constexpr const constant T& operator [] (size_t pos) const constant
        {
            return elements[pos];
        }
        
        threadgroup T& operator [] (size_t pos) threadgroup
        {
            return elements[pos];
        }
        constexpr const threadgroup T& operator [] (size_t pos) const threadgroup
        {
            return elements[pos];
        }
    };
    
    struct _74
    {
        packed_float3 _m0;
        float _m1;
    };
    
    struct _77
    {
        _74 _m0[4];
    };
    
    struct _78
    {
        float4x4 _m0;
        _77 _m1;
        uint2 _m2;
    };
    
    struct _234
    {
        packed_float3 _m0;
        float _m1;
    };
    
    struct _236
    {
        _234 _m0[4];
    };
    
    struct _238
    {
        _236 _m0[1];
    };
    
    constant uint3 gl_WorkGroupSize [[maybe_unused]] = uint3(8u, 8u, 1u);
    
    kernel void main0(constant _78& _80 [[buffer(0)]], device _238& _240 [[buffer(1)]], uint3 gl_GlobalInvocationID [[thread_position_in_grid]])
    {
        float2 _86 = float2(_80._m2);
        float2 _87 = float2(float3(gl_GlobalInvocationID).xy) / _86;
        float2 _99 = float2(float3(gl_GlobalInvocationID + uint3(1u)).xy) / _86;
        float _108 = mix(-1.0, 1.0, _87.x);
        float _111 = mix(-1.0, 1.0, _87.y);
        spvUnsafeArray<float4, 4> _102;
        _102[0] = float4(_108, _111, 1.0, 1.0);
        float _118 = mix(-1.0, 1.0, _99.x);
        _102[1] = float4(_118, _111, 1.0, 1.0);
        float _129 = mix(-1.0, 1.0, _99.y);
        _102[2] = float4(_108, _129, 1.0, 1.0);
        _102[3] = float4(_118, _129, 1.0, 1.0);
        spvUnsafeArray<float3, 4> _171;
        for (int _431 = 0; _431 < 4; )
        {
            float4 _159 = _80._m0 * _102[_431];
            _171[_431] = (_159.xyz / float3(_159.w)).xyz;
            _431++;
            continue;
        }
        uint _373 = (gl_GlobalInvocationID.y * _80._m2.x) + gl_GlobalInvocationID.x;
        _240._m0[_373]._m0[0]._m0 = -normalize(cross(_171[0], _171[2]));
        _240._m0[_373]._m0[0]._m1 = 0.0;
        _240._m0[_373]._m0[1]._m0 = -normalize(cross(_171[3], _171[1]));
        _240._m0[_373]._m0[1]._m1 = 0.0;
        _240._m0[_373]._m0[2]._m0 = -normalize(cross(_171[1], _171[0]));
        _240._m0[_373]._m0[2]._m1 = 0.0;
        _240._m0[_373]._m0[3]._m0 = -normalize(cross(_171[2], _171[3]));
        _240._m0[_373]._m0[3]._m1 = 0.0;
    }

StarArawn avatar May 29 '20 23:05 StarArawn

I think it's saying that the shader expects a uniform struct of size 144, but your buffer binding only has space for 136. This means that accessing ._m2 fields is going outside of bounds, which is a good reason for the HW to freeze.

kvark avatar May 30 '20 00:05 kvark

I think this is possibly related to https://github.com/gfx-rs/wgpu/issues/185

cwfitzgerald avatar Jun 12 '20 21:06 cwfitzgerald

Can't you easily test if that's the case (by providing more space to the buffers), or you have no platform to reproduce this?

kvark avatar Jun 12 '20 21:06 kvark

I don't actually have access to a mac with xcode on it to really test thoroughly, though my friend does, so when I put some effort into solving it, I can bug them. I am probably also running afowl of alignment requirements that 0.5 didn't test for, and I've yet to update this codebase fully to master.

cwfitzgerald avatar Jun 12 '20 21:06 cwfitzgerald

I found out what the issue was on metal, and thought I should document it here.

In froxels.cs.glsl:

layout(set = 0, binding = 0) uniform Uniforms {
    mat4 inv_proj;
    Frustum frustum;
    uvec2 froxel_count;
};

gets turned into:

struct _78
{
    float4x4 _m0;
    _77 _m1;
    uint2 _m2;
};

Metal expects that the alignment matches perfectly even for uniforms and not just buffers because uniforms and buffers are treated the same. If you use a uint4 or rather a uvec4 instead the alignment would be fine.

StarArawn avatar Jul 21 '20 19:07 StarArawn

Metal validation catches that, we saw similar errors reported. Make sure to run with it from time to time, until wgpu's own validation is getting there!

kvark avatar Jul 21 '20 19:07 kvark

Ooo thank you for documenting this! So the solution is to use uvec4 for the froxel count? Is there anything I could do on my end to keep the shader the same? I guess the fundamental issue is that I'm uploading the padding bytes after the struct...

cwfitzgerald avatar Jul 21 '20 21:07 cwfitzgerald