
Amplification shader repro.: Invalid payload or 'TGSM pointers must originate from an unambiguous TGSM global variable'

Open DanHCM opened this issue 2 years ago • 3 comments

I've modified the D3D12DynamicLOD project in the D3D12MeshShaders samples solution to issue just one single DispatchMesh(1,1,1) command. That should launch just one thread group of the modified amplification shader below, which should then populate the payload data with 4 instance structures and launch 4 downstream mesh shader thread groups. Each of the expected 4 mesh shader thread groups should read the same shared payload from its originating upstream A.S. and extract the instance structure within the payload that's assigned to that M.S. thread group to work on. But the payload seems to be full of nonsense data.

The relevant additions/changes are -

---- d3d12dynamiclod.cpp ----
//    for (uint32_t i = 0; i < dispatchCount; ++i)
//    {
//        uint32_t offset = dispatchCount * i;
//        uint32_t count = min(m_instanceCount - offset, c_maxGroupDispatchCount);
//
//        m_commandList->SetGraphicsRoot32BitConstant(1, offset, 0);
//        m_commandList->SetGraphicsRoot32BitConstant(1, count, 1);
//
//        m_commandList->DispatchMesh(count, 1, 1);
//    }
    m_commandList->DispatchMesh(1, 1, 1);

---- common.hlsl ----

struct TriInstance
{
    float4 m_ndcPos;
};

struct SharedPayload
{
    uint m_numInstances;
    uint3 m_pad;
    TriInstance m_instances[64];
};

---- MeshletAS.hlsl ----

groupshared SharedPayload g_sharedPayload;

[RootSignature(ROOT_SIG)]
[NumThreads(8, 8, 1)]
void main( in uint2 groupThreadID : SV_GroupThreadID )
{
    g_sharedPayload.m_numInstances = 0;
    
    GroupMemoryBarrierWithGroupSync();
    
    if ( all(groupThreadID < (2u).xx) )
    {
        TriInstance newInst;
        newInst.m_ndcPos = float4( -0.5.xx + (float2)groupThreadID.xy, 0.5, 1.0 );
        uint myNewInstanceIdx;
        InterlockedAdd( g_sharedPayload.m_numInstances, 1, myNewInstanceIdx );
        g_sharedPayload.m_instances[myNewInstanceIdx] = newInst;
    }
    
    GroupMemoryBarrierWithGroupSync();
    DispatchMesh( g_sharedPayload.m_numInstances, 1, 1, g_sharedPayload );
}

---- MeshletMS.hlsl ----

[RootSignature(ROOT_SIG)]
[NumThreads(64, 1, 1)]
[OutputTopology("triangle")]
void main(
    in uint groupID : SV_GroupID,
    in payload SharedPayload payloadIn,

    out vertices PosOnlyVtx verts[32],
    out indices uint3 tris[32])
{
    SetMeshOutputCounts(3/*totalVertCount*/, 1/*totalPrimCount*/);

    float4 instanceNDCRootPos = payloadIn.m_instances[groupID].m_ndcPos;

    verts[0].m_pos = instanceNDCRootPos;
    verts[1].m_pos = instanceNDCRootPos + float4(0.0, 0.2, 0.0, 0.0);
    verts[2].m_pos = instanceNDCRootPos + float4(0.1, 0.0, 0.0, 0.0);

    tris[0] = uint3(0,1,2);
}

---- MeshletPS.hlsl ----

[RootSignature(ROOT_SIG)]
float4 main(in PosOnlyVtx pin) : SV_TARGET
{
    return float4( 0.5, 0.1, 1.0, 1.0 );
}

Ignoring the pointlessness of the example, if I also add in a RWStructuredBuffer<float> DebugVals into which I have the first/[0,0] thread of the A.S. group write all 4 final payload instance elements (after the final GroupMemoryBarrierWithGroupSync()), i.e. -

if ( all(groupThreadID == (0u).xx) )
{
    DebugVals[0] = g_sharedPayload.m_instances[0].m_ndcPos.x;
    DebugVals[1] = g_sharedPayload.m_instances[0].m_ndcPos.y;
    ... etc

I can see in PIX that they’re all nonsense values. However, if I instead write plain old immediate values -

    ...
    DebugVals[4] = 123.0;
    DebugVals[5] = 456.0;
    DebugVals[6] = 789.0;
    ....

Then they're all present and correct.
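
For comparison with what PIX shows, here is a minimal CPU-side sketch in plain C++ (not part of the sample) of the payload the A.S. group should produce. The `Float4` type and the `buildExpectedPayload` helper are names made up for this illustration, and the plain loop stands in for the 8x8 thread group. On the GPU the slot each thread gets from InterlockedAdd is not ordered, so only the set of four values is fixed, not which index each one lands in.

```cpp
#include <cassert>
#include <cstdint>

// CPU-side mirrors of the HLSL structs in common.hlsl (illustrative only).
struct Float4 { float x, y, z, w; };

struct TriInstance
{
    Float4 m_ndcPos;
};

struct SharedPayload
{
    uint32_t    m_numInstances;
    uint32_t    m_pad[3];
    TriInstance m_instances[64];
};

// 16 bytes of header plus 64 * 16 bytes of instances.
static_assert(sizeof(SharedPayload) == 1040, "unexpected payload size");
// The D3D12 mesh shader spec caps the A.S. payload at 16 KB.
static_assert(sizeof(SharedPayload) <= 16 * 1024, "payload exceeds 16 KB");

// Hypothetical helper: replays the [NumThreads(8, 8, 1)] group serially.
SharedPayload buildExpectedPayload()
{
    SharedPayload payload{}; // m_numInstances starts at 0, as in the shader

    for (uint32_t y = 0; y < 8; ++y)
    {
        for (uint32_t x = 0; x < 8; ++x)
        {
            // Mirrors: if ( all(groupThreadID < (2u).xx) )
            if (x < 2 && y < 2)
            {
                // Stands in for InterlockedAdd; the GPU ordering may differ.
                const uint32_t idx = payload.m_numInstances++;
                assert(idx < 64);

                // Mirrors: float4( -0.5.xx + (float2)groupThreadID.xy, 0.5, 1.0 )
                payload.m_instances[idx].m_ndcPos =
                    { -0.5f + float(x), -0.5f + float(y), 0.5f, 1.0f };
            }
        }
    }
    return payload;
}
```

Under these assumptions m_numInstances ends up as 4, and the four m_ndcPos.xy values are the combinations of -0.5 and 0.5, each with z = 0.5 and w = 1.0. Anything else in DebugVals points at the payload writes rather than at the debug buffer itself.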

So is there a problem with the filling in of the elements of the groupshared payload? I.e. -

g_sharedPayload.m_instances[myNewInstanceIdx] = newInst;

Well, changing just that line to something like this -

g_sharedPayload.m_instances[myNewInstanceIdx].m_ndcPos = 1234.0.xxxx;

now gives a slightly cryptic compiler error -

> dxc.exe /nologo /Emain /Fo bin\x64\Debug\MeshletAS.cso /Od /Zi /Tas_6_5 -Qembed_debug /Fd bin\x64\Debug\MeshletAS.pdb MeshletAS.hlsl

error: validation errors
MeshletAS.hlsl:35:56: error: TGSM pointers must originate from an unambiguous TGSM global variable.
note: at '%22 = getelementptr inbounds [4 x float], [4 x float] addrspace(3)* %21, i32 0, i32 0' in block '#1' of function 'main'.
Validation failed.

which happens with both -

C:\Program Files (x86)\Windows Kits\10\bin\10.0.20348.0\x64\dxc.exe.
Version: dxcompiler.dll: 1.6 - 1.5.0.2748 (2cad836b2); dxil.dll: 1.6(101.5.2005.60)

and with the 2021-12-08 github dxc –

Version: dxcompiler.dll: 1.6 - 1.6.2112.12 (770ac0cc1); dxil.dll: 1.6(101.6.2112.2)

So, along with this issue, I do wonder whether I’ve somehow stumbled across a few possibly related compiler issues in this AS/MS area. This code is now at the point where it’s getting difficult to work around these issues by restructuring, in any simpler way, how the AS threads populate the payload array elements.

Cheers

Dan

DanHCM avatar May 05 '22 08:05 DanHCM

I have a PR #4452 which fixes related issue #4421. That may fix this issue as well, but I haven't had a chance to construct/try this repro.

Artifacts for the PR build should show up here once the AppVeyor build is done.

If you have a chance to try with dxcompiler.dll from these artifacts and find that this issue no longer repros, please let us know!

Thanks! -Tex

tex3d avatar May 12 '22 02:05 tex3d

Thanks @tex3d I've just grabbed the latest built artifact referenced by appveyor in the PR you mention and can confirm that -

dxc.exe /nologo /Emain /Fo bin\x64\Debug\MeshletAS.cso /Od /Zi /Tas_6_5 -Qembed_debug /Fd bin\x64\Debug\MeshletAS.pdb MeshletAS.hlsl

appears to succeed (no warnings/errors spat out any more). However, I can provoke a new error, very similar to the one in the original repro, with just a very simple change to the above repro code.

In 'main' of MeshletAS.hlsl (with all the changes shown above), replacing the line -

g_sharedPayload.m_instances[myNewInstanceIdx] = newInst;

with -

g_sharedPayload.m_instances[myNewInstanceIdx].m_ndcPos = 1234.0.xxxx;

dxc now produces -

error: validation errors
MeshletAS.hlsl:35:56: error: TGSM pointers must originate from an unambiguous TGSM global variable.
note: at '%18 = getelementptr inbounds [4 x float], [4 x float] addrspace(3)* %17, i32 0, i32 0' in block '#1' of function 'main'.
Validation failed.

DanHCM avatar May 16 '22 16:05 DanHCM

I don't know how this kind of stuff is usually handled but since it's been a while without any kind of acknowledgement of this as an outstanding issue, I worry there's a slight possibility it'll be forgotten. Any chance of at least an acknowledgement of this as an ongoing issue that'll be fixed eventually?

DanHCM avatar Jul 12 '22 14:07 DanHCM

This is currently on our list of things we'll try and get to.

damyanp avatar Apr 22 '24 16:04 damyanp

As reported, the 2207 release fails with the shader described in this comment: https://godbolt.org/z/4ffG4Ks1G

However the latest release compiles correctly: https://godbolt.org/z/W3c8s7dzY

Closing as resolved

pow2clk avatar Apr 22 '24 20:04 pow2clk