
OpenGL: Buffer cleanup

Open Julusian opened this issue 4 years ago • 6 comments

Description

Once a buffer for a frame is allocated it gets used, then stored in a pool for later use. This mostly works fine, but isn't good when media is of unpredictable dimensions, as you will collect buffers for rarely used dimensions until you run out of memory or restart caspar.

As a short-term solution, the GL INFO and GL GC commands have been restored to allow users to trigger a cleanup of all buffers. Doing this reclaims some memory at the cost of having to reallocate buffers of all sizes, so it isn't always appropriate to run.

Solution suggestion

We need to be better at freeing buffers that are no longer of any use.
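
One way to picture "freeing buffers that are no longer of any use" is a pool that remembers when each buffer was returned and drops the ones that have sat idle too long. The sketch below is only an illustration of that idea, not CasparCG's implementation: the `BufferPool` class, the `max_idle_seconds` threshold, and the use of `bytearray` as a stand-in for a GL allocation are all assumptions.

```python
import time
from collections import defaultdict


class BufferPool:
    """Hypothetical pool keyed by (width, height, stride); not CasparCG code."""

    def __init__(self, max_idle_seconds=60.0, clock=time.monotonic):
        self.max_idle_seconds = max_idle_seconds
        self.clock = clock
        # key -> list of (buffer, time it was returned to the pool)
        self.free = defaultdict(list)

    def acquire(self, width, height, stride):
        key = (width, height, stride)
        if self.free[key]:
            buf, _ = self.free[key].pop()  # reuse the most recently returned buffer
            return buf
        return bytearray(width * height * stride)  # stand-in for a GL allocation

    def release(self, buf, width, height, stride):
        self.free[(width, height, stride)].append((buf, self.clock()))

    def collect(self):
        """Drop buffers idle for longer than max_idle_seconds; return bytes freed."""
        now = self.clock()
        freed = 0
        for key in list(self.free):
            keep = []
            for buf, returned_at in self.free[key]:
                if now - returned_at > self.max_idle_seconds:
                    freed += len(buf)
                else:
                    keep.append((buf, returned_at))
            if keep:
                self.free[key] = keep
            else:
                del self.free[key]  # no buffers left for this dimension
        return freed
```

Unlike a full GL GC, a periodic `collect()` of this kind would leave recently used sizes in the pool, so only the rarely used dimensions pay the reallocation cost.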

Julusian avatar Nov 06 '19 22:11 Julusian

How big of an issue is this in practice?

ronag avatar Nov 06 '19 23:11 ronag

I don't expect it to be a problem very often, more often being just an annoyance. But it all depends on use case and environment. Thinking about it, let's leave it purely as a command for now, but leave this issue to revisit later.

Julusian avatar Nov 07 '19 00:11 Julusian

This is our memory issue raised under #1214

On the Ubuntu 18.04 system in question, which has been running for a while, GL INFO shows that 3700 pooled device buffers are allocated for 77 different resolutions.

GL GC frees up multiple gigabytes of system memory and 70% of the memory on the GPU.

saltomodules avatar Dec 04 '19 15:12 saltomodules

@ronag @Julusian Is this something for version v2.3.0 LTS?

dotarmin avatar Apr 07 '20 18:04 dotarmin

@dotarmin I think this needs more planning than we have time for right now, and it can be worked around with the AMCP commands that are available.

Julusian avatar Apr 14 '20 10:04 Julusian

Hi, I think we are suffering from this as well, but GL GC does not help: CasparCG keeps allocating new textures and consuming memory until it crashes.

Basically what we are doing is sending a set of PLAY commands with LOOP and I see a lot of textures with weird resolutions (I assume they are for U/V planes but I might be wrong) growing and not returning to the pool. Not sure how much help this gives, but basically there are two files:

Stream #0:0(eng): Video: h264 (Main) (avc1 / 0x31637661), yuv420p(tv, bt709), 3072x1056, 27150 kb/s, 60 fps, 60 tbr, 60k tbn, 120 tbc (default)

and

Stream #0:1: Video: vp6a, yuva420p, 1920x660, 8192 kb/s, 60 fps, 60 tbr, 1k tbn
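
The U/V plane guess can be checked with a little arithmetic. In 4:2:0 planar formats the chroma planes are half the luma width and half the luma height, and yuva420p adds a full-resolution alpha plane. The sketch below assumes each plane ends up in its own single-channel texture; `plane_dimensions` is a hypothetical helper written for this illustration, not part of CasparCG or ffmpeg.

```python
def plane_dimensions(pix_fmt, width, height):
    """Return (plane_name, width, height) for two planar 8-bit formats."""
    # Chroma planes in 4:2:0 formats are subsampled by 2 in both directions,
    # rounding up for odd sizes.
    chroma = (-(-width // 2), -(-height // 2))
    planes = [("Y", width, height), ("U", *chroma), ("V", *chroma)]
    if pix_fmt == "yuva420p":
        planes.append(("A", width, height))  # alpha is stored at full resolution
    elif pix_fmt != "yuv420p":
        raise ValueError(f"unsupported pixel format: {pix_fmt}")
    return planes


for fmt, w, h in [("yuv420p", 3072, 1056), ("yuva420p", 1920, 660)]:
    for name, pw, ph in plane_dimensions(fmt, w, h):
        print(f"{fmt} {w}x{h} plane {name}: {pw}x{ph} = {pw * ph} bytes")
```

This gives 1536x528 chroma planes for the first file and 960x330 for the second, which are exactly the "weird" resolutions that show up in the GL INFO pools.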

so when I've run this a couple of times, here is the output of GL INFO:

<?xml version="1.0" encoding="utf-8"?>
<gl>
   <details>
      <pooled_device_buffers>
         <device_buffer_pool>
            <stride>1</stride>
            <mipmapping>false</mipmapping>
            <width>3072</width>
            <height>1056</height>
            <size>3244032</size>
            <count>3</count>
         </device_buffer_pool>
         <device_buffer_pool>
            <stride>1</stride>
            <mipmapping>false</mipmapping>
            <width>1536</width>
            <height>528</height>
            <size>811008</size>
            <count>6</count>
         </device_buffer_pool>
         <device_buffer_pool>
            <stride>1</stride>
            <mipmapping>false</mipmapping>
            <width>1920</width>
            <height>660</height>
            <size>1267200</size>
            <count>62</count>
         </device_buffer_pool>
         <device_buffer_pool>
            <stride>1</stride>
            <mipmapping>false</mipmapping>
            <width>960</width>
            <height>330</height>
            <size>316800</size>
            <count>62</count>
         </device_buffer_pool>
         <device_buffer_pool>
            <stride>4</stride>
            <mipmapping>false</mipmapping>
            <width>3840</width>
            <height>2160</height>
            <size>33177600</size>
            <count>2</count>
         </device_buffer_pool>
      </pooled_device_buffers>
      <pooled_host_buffers>
         <host_buffer_pool>
            <usage>read_only</usage>
            <size>33177600</size>
            <count>2</count>
         </host_buffer_pool>
         <host_buffer_pool>
            <usage>write_only</usage>
            <size>3244032</size>
            <count>5</count>
         </host_buffer_pool>
         <host_buffer_pool>
            <usage>write_only</usage>
            <size>811008</size>
            <count>9</count>
         </host_buffer_pool>
         <host_buffer_pool>
            <usage>write_only</usage>
            <size>1267200</size>
            <count>64</count>
         </host_buffer_pool>
         <host_buffer_pool>
            <usage>write_only</usage>
            <size>316800</size>
            <count>64</count>
         </host_buffer_pool>
      </pooled_host_buffers>
   </details>
   <summary>
      <pooled_device_buffers>
         <total_count>135</total_count>
         <total_size>179161344</total_size>
      </pooled_device_buffers>
      <pooled_host_buffers>
         <total_read_count>2</total_read_count>
         <total_write_count>142</total_write_count>
         <total_read_size>66355200</total_read_size>
         <total_write_size>124895232</total_write_size>
      </pooled_host_buffers>
      <all_host_buffers>
         <total_read_count>6</total_read_count>
         <total_write_count>159</total_write_count>
         <total_read_size>199065600</total_read_size>
         <total_write_size>145829376</total_write_size>
      </all_host_buffers>
   </summary>
</gl>
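
For anyone comparing dumps like this one over time, the per-pool totals can be pulled out with a short script. This is a minimal sketch that assumes the element names shown in the output above; `summarize_device_pools` and the trimmed `SAMPLE` (two of the five device pools) are made up for this illustration.

```python
import xml.etree.ElementTree as ET

# Two of the device pools from the dump above, trimmed for brevity.
SAMPLE = """
<gl><details><pooled_device_buffers>
  <device_buffer_pool>
    <stride>1</stride><mipmapping>false</mipmapping>
    <width>3072</width><height>1056</height>
    <size>3244032</size><count>3</count>
  </device_buffer_pool>
  <device_buffer_pool>
    <stride>1</stride><mipmapping>false</mipmapping>
    <width>1920</width><height>660</height>
    <size>1267200</size><count>62</count>
  </device_buffer_pool>
</pooled_device_buffers></details></gl>
"""


def summarize_device_pools(xml_text):
    """Return one dict per device pool, largest total footprint first."""
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    rows = []
    for pool in root.iter("device_buffer_pool"):
        width = int(pool.findtext("width"))
        height = int(pool.findtext("height"))
        stride = int(pool.findtext("stride"))
        size = int(pool.findtext("size"))
        count = int(pool.findtext("count"))
        rows.append({
            "dimensions": (width, height, stride),
            "count": count,
            "total_bytes": size * count,
            # Sanity check: reported size should equal width * height * stride.
            "size_matches": size == width * height * stride,
        })
    rows.sort(key=lambda r: r["total_bytes"], reverse=True)
    return rows


for row in summarize_device_pools(SAMPLE):
    print(row)
```

Applied to the full dump, size equals width × height × stride for all five device pools, and the per-pool totals add up to the reported total_size of 179161344 bytes. The 1920x660 and 960x330 pools together account for 98208000 of those bytes, a little over half.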

I don't know what was meant by unpredictable dimensions, but what seems really weird is that even though both clips are 60 fps (and ffprobe -show_packets shows a packet duration of 16.667 ms for both), one file has 3 or 4 textures while the other has 60+.

I also added a member to the texture queue to see how many textures have been allocated, and that keeps growing (and is larger than the value shown by GL INFO, which implies that a lot of textures are in flight?). What is the rule for how many textures should actually be allocated? Is that driven by how fast the consumer handles them, or, in the case of files, do they simply grow as fast as packets come in from ffmpeg?

After even more debugging I added ffmpeg consumers to both of my channels and, interestingly, both files report an fps above 60 (something like 60.01 or 60.26, changing randomly). How does CasparCG drive the timers? Is it possible that, since the pts values are not nice, an error builds up, and since DeckLink may run at exactly 60 fps, the textures pile up in some intermediate queues? Adjusting the source files to 50 fps did not help, and the ffmpeg output is still 50.xx. I can see how the flv file, which has a timebase of millisecond resolution, might have screwed everything up, but now I am only running a simple mp4 at 50 fps. Should I open a new bug about this?
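
To put a rough number on that hypothesis: if frames were produced slightly faster than they are consumed and never dropped, the surplus would accumulate linearly. The sketch below is only back-of-the-envelope arithmetic under that assumption, not a description of how CasparCG actually paces frames; `backlog_frames` is a hypothetical helper, and the per-frame size is taken from the pool sizes in the GL INFO output above.

```python
def backlog_frames(producer_fps, consumer_fps, seconds):
    """Frames left queued after `seconds` if nothing is ever dropped."""
    surplus_per_second = max(producer_fps - consumer_fps, 0.0)
    return surplus_per_second * seconds


# Bytes per 1920x660 yuva420p frame, from the pool sizes reported by GL INFO:
# two full-resolution planes (Y, A) and two half-resolution planes (U, V).
FRAME_BYTES = 2 * 1267200 + 2 * 316800

for fps in (60.01, 60.26):
    frames = backlog_frames(fps, 60.0, 3600)
    print(f"{fps} fps vs 60 fps: ~{frames:.0f} frames/hour, "
          f"~{frames * FRAME_BYTES / 1e9:.2f} GB/hour")
```

Under these assumptions, even a 0.26 fps surplus would queue several hundred frames per hour, which is the kind of growth that would eventually exhaust memory.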

rubu avatar Sep 30 '20 13:09 rubu