MetalNanoVG

could path tessellation be done using the GPU?

Open wtholliday opened this issue 5 years ago • 2 comments

In my profiles, bezier tessellation is consuming considerable CPU time, while the GPU is mostly idle.

I've tried optimizing bezier tessellation on the front-end with little success.

I was wondering about the feasibility of supporting path tessellation (nvg__flattenPaths) on the GPU. This would involve a change to the front-end, but could be easily made compatible with other back-ends.

Bezier tessellation on the GPU could work as follows. The input is a series of Bezier segments to be flattened, and a compute kernel processes one segment per thread. If the segment is flat (either because a flag on the segment says so, or because it passes a flatness test when the flag isn't set), the segment is written out unchanged. If it isn't flat, it is replaced by the two segments produced by Bezier subdivision. Atomic operations would be used to append to the output buffer. Note that this wouldn't preserve segment order, which might be an issue; if it is, a prefix sum could be used to generate output indices instead.

Here is some pseudocode:

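#include <metal_stdlib>
using namespace metal;

// BezierSegment, flatnessTest, leftChild and rightChild are assumed to be defined elsewhere.
kernel void subdivideBezier(device const BezierSegment* input [[ buffer(0) ]],
                            device BezierSegment* output      [[ buffer(1) ]],
                            device atomic_uint& outCount      [[ buffer(2) ]],
                            constant uint& inCount            [[ buffer(3) ]],
                            uint gid [[ thread_position_in_grid ]]) {

  // Guard against threads beyond the end of the input (padded grids).
  if (gid >= inCount) return;

  BezierSegment out[2];
  uint n = 1;

  BezierSegment s = input[gid];

  if (s.flat) {
    out[0] = s;
  } else if (flatnessTest(s)) {
    s.flat = true;
    out[0] = s;
  } else {
    out[0] = leftChild(s);
    out[1] = rightChild(s);
    n = 2;
  }

  // Reserve n consecutive output slots; the return value is the first slot.
  uint i = atomic_fetch_add_explicit(&outCount, n, memory_order_relaxed);
  output[i] = out[0];
  if (n == 2) output[i + 1] = out[1];
}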
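
The order-preserving alternative (a prefix sum instead of an atomic append) can be sketched on the CPU in C++. This is only a reference sketch under assumptions: the cubic `BezierSegment` layout, the `Vec2` type, the chord-distance `flatnessTest` with its `tol` parameter, and the `split` helper are all hypothetical stand-ins for whatever the front-end would actually use. Each of the three loops corresponds to a step that could run in parallel on the GPU.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; };

// Hypothetical layout: a cubic Bezier plus the "flat" flag from the pseudocode.
struct BezierSegment {
    Vec2 p0, p1, p2, p3;
    bool flat;
};

static Vec2 mid(Vec2 a, Vec2 b) { return { (a.x + b.x) * 0.5f, (a.y + b.y) * 0.5f }; }

// Distance of p from the line through a and b (or from a if a == b).
static float distToLine(Vec2 p, Vec2 a, Vec2 b) {
    float dx = b.x - a.x, dy = b.y - a.y;
    float len = std::sqrt(dx * dx + dy * dy);
    if (len == 0.0f) return std::hypot(p.x - a.x, p.y - a.y);
    return std::fabs((p.x - a.x) * dy - (p.y - a.y) * dx) / len;
}

// Assumed flatness test: both control points lie within tol of the chord line.
static bool flatnessTest(const BezierSegment& s, float tol) {
    return distToLine(s.p1, s.p0, s.p3) <= tol &&
           distToLine(s.p2, s.p0, s.p3) <= tol;
}

// de Casteljau split at t = 0.5 into a left and a right child.
static void split(const BezierSegment& s, BezierSegment& l, BezierSegment& r) {
    Vec2 a = mid(s.p0, s.p1), b = mid(s.p1, s.p2), c = mid(s.p2, s.p3);
    Vec2 ab = mid(a, b), bc = mid(b, c), m = mid(ab, bc);
    l = { s.p0, a, ab, m, false };
    r = { m, bc, c, s.p3, false };
}

// One subdivision pass that keeps the output in input order.
std::vector<BezierSegment> subdividePass(const std::vector<BezierSegment>& in, float tol) {
    const std::size_t n = in.size();
    std::vector<unsigned> count(n), offset(n);

    // Step 1: each segment decides how many segments it will emit (1 or 2).
    for (std::size_t i = 0; i < n; ++i)
        count[i] = (in[i].flat || flatnessTest(in[i], tol)) ? 1u : 2u;

    // Step 2: an exclusive prefix sum turns the counts into output indices.
    unsigned total = 0;
    for (std::size_t i = 0; i < n; ++i) { offset[i] = total; total += count[i]; }

    // Step 3: each segment writes into the slots reserved for it.
    std::vector<BezierSegment> out(total);
    for (std::size_t i = 0; i < n; ++i) {
        if (count[i] == 1) {
            out[offset[i]] = in[i];
            out[offset[i]].flat = true;
        } else {
            split(in[i], out[offset[i]], out[offset[i] + 1]);
        }
    }
    return out;
}
```

Compared with the atomic append, this costs an extra scan per pass, but the children of a segment always land next to each other and in path order.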

The subdivideBezier kernel would be run in multiple passes over the curves, up to the maximum subdivision level (hardcoded at 10 in the front-end). Buffers would be sized for the worst case, which is 2^10 = 1024 segments per input curve.
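
To make the pass structure and buffer sizing concrete, here is a rough host-side sketch in C++. The names `maxSegments` and `flattenInPasses` are made up for illustration, and `pass` stands in for one kernel dispatch: it is assumed to read `count` segments from the input buffer, write the results to the output buffer, and return how many segments it wrote.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Worst-case segment count after `levels` rounds of binary subdivision.
constexpr std::size_t maxSegments(std::size_t curves, unsigned levels) {
    return curves << levels;
}

// Hypothetical driver: runs up to maxLevel passes over two ping-pong buffers.
// Pass is any callable of the form
//   std::size_t pass(const Segment* in, std::size_t count, Segment* out)
template <typename Segment, typename Pass>
std::vector<Segment> flattenInPasses(const std::vector<Segment>& curves,
                                     unsigned maxLevel, Pass pass) {
    // Both buffers are allocated once, at worst-case size.
    const std::size_t capacity = maxSegments(curves.size(), maxLevel);
    std::vector<Segment> a(capacity), b(capacity);
    std::copy(curves.begin(), curves.end(), a.begin());

    std::size_t count = curves.size();
    for (unsigned level = 0; level < maxLevel; ++level) {
        const std::size_t outCount = pass(a.data(), count, b.data());
        std::swap(a, b);                      // output becomes next input
        const bool done = (outCount == count); // no segment was split
        count = outCount;
        if (done) break;
    }
    a.resize(count);
    return a;
}
```

The early exit assumes the host can read the output count after every pass. On a GPU that readback forces a synchronization, so a real implementation might simply run all passes, or only check the count every few passes.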

The back end would have to handle generating segments for strokes, line-cap, line-end, etc.

The other GPU path rendering approaches I'm aware of involve sophisticated fragment shaders, so this may be easier to implement.

It may also be worth looking at https://github.com/linebender/piet-metal. It currently doesn't seem to handle curves on the GPU (Beziers are subdivided on the CPU), but takes a distance-field approach to path rendering.

See also http://hhoppe.com/ravg.pdf and https://www.microsoft.com/en-us/research/wp-content/uploads/2005/01/p1000-loop.pdf.

wtholliday, May 16 '19 13:05

MetalNanoVG follows the style of the corresponding NanoVG project. To do what you suggest here, we would need to touch the original NanoVG code, and MetalNanoVG would no longer share the same nanovg.c. How about implementing such an improvement in NanoVG/OpenGL first?

olliwang, May 16 '19 17:05

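Unfortunately, the OpenGL versions NanoVG targets (GL2/GL3 and GLES2/GLES3) don't have compute kernels (compute shaders only arrived in OpenGL 4.3 and OpenGL ES 3.1), so doing it the way I described isn't possible there. If they switched to a Vulkan backend, then it would be possible.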

I think the changes to nanovg.c might be minimal. I'd be more concerned about the amount of time required to implement this.

wtholliday, May 16 '19 18:05