NanoVG broken when filling large number of vertices
In my app I need to draw complex polygons with lots of nvgLineTo() calls, and they may require more than 200 thousand vertices (gl->nverts) in the vertex buffer in nvgRenderFlush(). However, the result renders as expected with the original NanoVG, while the same code renders broken output when switching the backend to bgfx. This only happens when using nvgFill() with lots of nvgLineTo() calls. Are there limits on the transient vertex buffer in bgfx?
PS. Nothing is wrong when using nvgStroke() with the same complex polygon. Only nvgFill() breaks.
Are there limits on the transient vertex buffer in bgfx?
Yes. It's configurable with BGFX_CONFIG_TRANSIENT_VERTEX_BUFFER_SIZE: https://github.com/bkaradzic/bgfx/blob/master/src/config.h#L292
Thanks for the reply. The default value of BGFX_CONFIG_TRANSIENT_VERTEX_BUFFER_SIZE is big enough not to cause a vertex buffer overflow. I tried to inspect and play with the glnvg__fill() function, and it seems the broken rendering is generated by the second and third submit() calls. Any idea?
Can you reproduce this issue in a small example?
Just generated sample code at https://gist.github.com/olliwang/5454465c18bb601d2619f40aeddf96a0
And here's the difference, tested on an iPhone 6s+.
PS. Both look the same when using nvgStroke() instead of nvgFill().
I think the bug is caused by m_textureDescriptor.pixelFormat in renderer_mtl.mm. The pixel format is correctly initialized as MTLPixelFormatDepth32Float_Stencil8 on iOS, but is changed to something else at some point. It seems nvgFill() only works as expected when the format remains MTLPixelFormatDepth32Float_Stencil8 on iOS. (NanoVG relies on the stencil buffer.)
PS. The pixel format is changed in the TextureMtl::create() method. Is it normal to change the original pixel format after creating a framebuffer?
There is no way to change a surface type after it's created; you must destroy it and create a different one. See if this create/destroy is happening, and what causes it.
@bkaradzic It seems the pixel format changes after calling the nvgluCreateFramebuffer() function. I wrote a simple test, and the pixel format doesn't change if I comment out nvgluCreateFramebuffer().
Ah, got it...
Try replacing this code: https://github.com/bkaradzic/bgfx/blob/master/examples/common/nanovg/nanovg_bgfx.cpp#L1194
With:
bgfx::TextureHandle textures[] =
{
bgfx::createTexture2D(_width, _height, false, 1, bgfx::TextureFormat::RGBA8, BGFX_TEXTURE_RT),
bgfx::createTexture2D(_width, _height, false, 1, bgfx::TextureFormat::D24S8, BGFX_TEXTURE_RT | BGFX_TEXTURE_RT_WRITE_ONLY),
};
bgfx::FrameBufferHandle fbh = bgfx::createFrameBuffer(BX_COUNTOF(textures), textures, true);
@bkaradzic Thanks, but it doesn't work.
I added the following code to the RendererContextMtl::submit() method:
if (m_textureDescriptor.pixelFormat != MTLPixelFormatDepth32Float_Stencil8)
printf("Wrong Pixel Format\n");
else
printf("Correct Pixel Format\n");
And "Wrong Pixel Format" still prints as long as I call nvgluCreateFramebuffer().
Change MTLPixelFormatDepth32Float here https://github.com/bkaradzic/bgfx/blob/master/src/renderer_mtl.mm#L334 to MTLPixelFormatDepth32Float_Stencil8.
@bkaradzic Unfortunately that doesn't work either. I found that the pixel format is changed because of this line: https://github.com/bkaradzic/bgfx/blob/master/src/renderer_mtl.mm#L2497
Found the solution. This line should also change to MTLPixelFormatDepth32Float_Stencil8: https://github.com/bkaradzic/bgfx/blob/master/src/renderer_mtl.mm#L584 Now the pixel format is consistent in RendererContextMtl::submit() on iOS.
Too bad this still doesn't fix the broken nvgFill() issue. :(
It's been quite a while, but is there any update on this?
I went full bananacakes on this, and I solved my rendering issues by not allowing this function to be inlined:
diff --git a/examples/common/nanovg/nanovg_bgfx.cpp b/examples/common/nanovg/nanovg_bgfx.cpp
index 3446e4058..e87280326 100644
--- a/examples/common/nanovg/nanovg_bgfx.cpp
+++ b/examples/common/nanovg/nanovg_bgfx.cpp
@@ -569,6 +569,7 @@ namespace
bgfx::setViewRect(gl->viewId, 0, 0, width * devicePixelRatio, height * devicePixelRatio);
}
+ __attribute__ ((noinline))
static void fan(uint32_t _start, uint32_t _count)
{
uint32_t numTris = _count-2;
@@ -580,6 +581,7 @@ namespace
data[ii*3+0] = uint16_t(_start);
data[ii*3+1] = uint16_t(_start + ii + 1);
data[ii*3+2] = uint16_t(_start + ii + 2);
+ //asm volatile ("":::);
}
bgfx::setIndexBuffer(&tib);
As you can see, the inline asm block was something else I was trying.
Why this, and what happens? This was either a compiler bug, or otherwise undefined behavior triggered by bgfx code; I'm not sure which. Basically, after a lot of searching and stepping through the debugger, it was clear to me that this loop generating the triangle-fan indices was not running for the required number of iterations. Yet the code seemed correct, so I got paranoid and tried to confuse the compiler enough to disable some of the more aggressive optimizations. Disallowing inlining worked, and I believe the asm block also did the trick.
It's been a while since I looked at this, but I found this diff in my bgfx checkout (from when I was at commit 8b6a6bdf0eae130a62f0af003282bf042ed97e7b), so I thought I should post it.
@mcourteaux It doesn't make sense that noinline would fix this code...
More likely, bgfx::allocTransientIndexBuffer doesn't return the amount needed, and the example code doesn't handle that particular case at all.
bgfx::allocTransientIndexBuffer(&tib, numTris*3);
BX_ASSERT(tib.size == numTris*3*(tib.isIndex16 ? 2 : 4), ""); // <--- add this to your code and remove noinline...
I know it doesn't make sense. I spent a really long time analyzing this problem in renderdoc. I configured bgfx with a giant index buffer to make sure that was not a problem. I added print statements, I added inline asserts, and drilled it down to this function not filling in all the indices in the index buffer.
The code looked correct to me, yet when stepping through it with a debugger, it behaved incorrectly.
TL;DR: The indices overflow the uint16_t type when narrow-casting, when the _start argument is too high. Read below the final separator for the on-topic conclusions/questions.
Okay, let's get rigorous, and follow your advice like you said in your tweet. :smile:
Here is the repro using the nanovg demo (demo nr. 20), on the latest git master branch of bx, bgfx, and bimg:
diff --git a/examples/20-nanovg/nanovg.cpp b/examples/20-nanovg/nanovg.cpp
index f9db9d9a6..4b66ee7eb 100644
--- a/examples/20-nanovg/nanovg.cpp
+++ b/examples/20-nanovg/nanovg.cpp
@@ -1316,7 +1316,9 @@ void renderDemo(struct NVGcontext* vg, float mx, float my, float width, float height
{
float x,y,popx,popy;
- drawEyes(vg, width-800, height-240, 150, 100, mx, my, t);
+ for (int i = 0; i < 200; ++i) {
+ drawEyes(vg, (i % 20) * 120, (i / 20) * 150, 80, 80, mx, my, t);
+ }
drawParagraph(vg, width - 550, 35, 150, 100, mx, my);
drawGraph(vg, 0, height/2, width, height/2, t);
@@ -1408,6 +1410,7 @@ public:
init.resolution.width = m_width;
init.resolution.height = m_height;
init.resolution.reset = m_reset;
+ init.limits.transientIbSize = 40000000; // Optional!
bgfx::init(init);
// Enable debug text.
diff --git a/examples/common/nanovg/nanovg_bgfx.cpp b/examples/common/nanovg/nanovg_bgfx.cpp
index adcaededc..3647d7d2f 100644
--- a/examples/common/nanovg/nanovg_bgfx.cpp
+++ b/examples/common/nanovg/nanovg_bgfx.cpp
@@ -558,6 +558,7 @@ namespace
uint32_t numTris = _count-2;
bgfx::TransientIndexBuffer tib;
bgfx::allocTransientIndexBuffer(&tib, numTris*3);
+ BX_ASSERT(tib.size == numTris*3*(tib.isIndex16 ? 2 : 4), ""); // <--- add this to your code and remove noinline...
uint16_t* data = (uint16_t*)tib.data;
for (uint32_t ii = 0; ii < numTris; ++ii)
{
I did a fresh make projgen and make linux, and started the demo with ./examplesRelease 20:
Same goes for ./examplesDebug 20. Note that the assert you suggested doesn't trigger in either build. I also tried NOT increasing the size of the buffer upon init(), i.e., the line marked with // Optional! can be commented out, and the demo still runs without asserting, with the exact same result. The buffer size is not the problem, which is something I remember clearly from my earlier investigations in the debugger. On the other hand, shrinking transientIbSize to 40000, for example, is enough to make the asserts trigger.
As we somewhat expected, and as it goes with undefined behaviour, neither my noinline nor my asm volatile block actually solves the issue.
I'm on gcc 11.4.0. Correct me if I'm missing anything, but I believe this is a simple repro that demonstrates the issue and shows that it is unrelated to the size of the transient buffer.
As for my vague memory of what was happening when I was in the debugger, side-by-side with RenderDoc: bgfx uses the index buffer, and reuses the same index buffer. Because of the magical bug, when the triangle-fan indices were written to the buffer, the loop would sometimes stop too early, so the index buffer would contain partly the correct indices and partly old indices from the last draw call. However, the number of triangles is calculated correctly, and the submitted draw call uses the correct number of indices for the triangle fan. So, to rephrase: things start glitching because the triangle fan is made of triangles specified by a mix of correct and old (garbage) indices. The garbage indices were still just sitting there in the buffer, and the loop stopped too early and did not overwrite them all.
To make things juicier, I managed to capture this in RenderDoc again, and this heisenbug behaves very oddly in RenderDoc as well. The first time you step past that draw command, it glitches. Step back in RenderDoc, and the glitch disappears and cannot be reconstructed. Here is a video:
https://github.com/bkaradzic/bgfx/assets/845012/3c4f31ae-1c10-4cad-ac7a-bfe4fdd363cd
I did, however, find that the indices overflow. I added this assert:
diff --git a/examples/common/nanovg/nanovg_bgfx.cpp b/examples/common/nanovg/nanovg_bgfx.cpp
index adcaededc..29469b0f5 100644
--- a/examples/common/nanovg/nanovg_bgfx.cpp
+++ b/examples/common/nanovg/nanovg_bgfx.cpp
@@ -558,12 +558,14 @@ namespace
uint32_t numTris = _count-2;
bgfx::TransientIndexBuffer tib;
bgfx::allocTransientIndexBuffer(&tib, numTris*3);
+ BX_ASSERT(tib.size == numTris*3*(tib.isIndex16 ? 2 : 4), ""); // <--- add this to your code and remove noinline...
uint16_t* data = (uint16_t*)tib.data;
for (uint32_t ii = 0; ii < numTris; ++ii)
{
data[ii*3+0] = uint16_t(_start);
data[ii*3+1] = uint16_t(_start + ii + 1);
data[ii*3+2] = uint16_t(_start + ii + 2);
+ BX_ASSERT(uint16_t(_start + ii + 2) == (_start + ii + 2), "overflow");
}
bgfx::setIndexBuffer(&tib);
This overflow is, I think, what might trigger the compiler to exploit this UB, and it might explain what I was seeing in the past regarding the loop not doing all the iterations. The compiler might have stopped the loop early whenever it thinks it will overflow (although I can't find any statement that such a narrowing cast would be defined as UB; this suggests that only signed-integer overflow is UB, and all the types involved here seem to be unsigned).
Drawing only 50 eyes is enough for this overflow assert not to trigger, and there are no glitches.
So here is my concluding question: am I correct in saying that this is a nanovg-demo-side bug? In other words: I believe that the glnvg__fill() function should submit the draw command to bgfx before paths[i].fillOffset would overflow the uint16_t index type.
I'm not familiar enough with the nanovg bgfx code to determine where such a flush/commit should be implemented.
I have a patch! Will post a PR soon. Feel free to hold off spending brain-cycles on this until the PR is there. :smile: :brain: