
GL_OUT_OF_MEMORY [vo/gpu-next] Failed creating LUT texture

24fpsDaVinci opened this issue 2 years ago • 4 comments

  • mpv version: latest git master build with gpu-next
  • Linux distribution and version: Fedora 36
  • Source of the mpv binary: git master with gpu-next
  • If known, which version of mpv introduced the problem:
  • Window manager and version: Wayland
  • GPU driver and version: TigerLake-LP GT2 [Iris Xe Graphics], kernel driver in use: i915

Reproduction steps

Use a 512x512x512 3D LUT (--icc-3dlut-size=xx).

Expected behavior

The error happens with a 512x512x512 cube LUT; 256 works fine. I had been using 512x512x512 for the last couple of months before upgrading to git master and gpu-next. There is plenty of unused RAM.

Actual behavior

Black screen; mpv hangs.

Log file

[vo/gpu-next] gl_tex_create: texture: OpenGL error: GL_OUT_OF_MEMORY
[vo/gpu-next] Failed creating LUT texture

24fpsDaVinci, Jul 11 '22 07:07

The major difference between gpu and gpu-next is that the former generates 16-bit integer 3DLUTs while the latter generates 32-bit float 3DLUTs. For a 512x512x512 LUT, this doubles the VRAM requirement from 1GB to 2GB. This is not an intentional change, but mostly a result of internal abstractions making it more convenient. While testing such extreme sizes, I also noticed that gpu-next generation is almost an order of magnitude slower as a consequence, and is also not cached. I'll try to fix both of those issues.

It's also worth pointing out that gpu-next can generate two 3DLUTs, one for the file and one for the display. Keep that in mind: your VRAM usage may unexpectedly double a second time if you're playing something with an embedded ICC profile while also using a display ICC profile.
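The arithmetic behind these figures: a 512x512x512 LUT has 134,217,728 texels, each with four channels, at 2 bytes per channel for 16-bit integers (rgba16) and 4 bytes for 32-bit floats (rgba32f). The sketch below is only a back-of-the-envelope calculator; the format names are used as labels, and driver padding and temporary upload buffers are ignored, so real VRAM usage can be higher. It also covers the two-LUT case (file profile plus display profile).

```python
# Bytes per channel for the two texture formats discussed in this thread.
BYTES_PER_CHANNEL = {"rgba16": 2, "rgba32f": 4}
CHANNELS = 4  # RGBA texels

GIB = 1024 ** 3


def lut_bytes(size: int, fmt: str) -> int:
    """Raw storage for a size x size x size RGBA 3DLUT (no padding or overhead)."""
    return size ** 3 * CHANNELS * BYTES_PER_CHANNEL[fmt]


def total_bytes(luts) -> int:
    """Sum over all active 3DLUTs, e.g. one for the file and one for the display."""
    return sum(lut_bytes(size, fmt) for size, fmt in luts)


if __name__ == "__main__":
    print(lut_bytes(512, "rgba16") / GIB)   # 1.0
    print(lut_bytes(512, "rgba32f") / GIB)  # 2.0
    # Embedded file ICC profile plus display ICC profile: two LUTs.
    print(total_bytes([(512, "rgba32f"), (512, "rgba32f")]) / GIB)  # 4.0
    print(lut_bytes(256, "rgba32f") / GIB)  # 0.25
```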

haasn, Jul 11 '22 10:07

Okay, some more pertinent notes:

  1. I submitted this MR changing the 3DLUTs from rgba32f to rgba16, thus halving the VRAM usage as expected.
  2. This does not, however, substantially affect 3DLUT generation times.
  3. The only reason vo=gpu generation times were so fast was that vo=gpu was incorrectly generating large 3DLUTs, giving an effective precision closer to 49x49x49. So your 512x512x512 3DLUT never did anything to begin with, at least compared to a smaller and more reasonable LUT size. I have submitted this PR to fix this issue, which, as a natural consequence, makes 512x512x512 generation take 30 seconds with plain vo=gpu too.
  4. libplacebo actually has internal logic for picking the best 3DLUT size according to the type of ICC profile (e.g. picking smaller sizes for easily characterized ones, and larger ones for LUT-based display profiles). However, this heuristic is not active in mpv because of the way the --icc-3dlut-size setting is designed - it always overrides the auto-detected 3DLUT size. I'm considering changing the way this is handled, anyway.
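The override problem in point 4 can be illustrated with a hypothetical resolver in which an unset option means "auto", so only an explicit user value bypasses the per-profile heuristic. The function name, the profile categories, and the default of 33 for smooth profiles are assumptions for illustration only, not libplacebo's or mpv's actual logic or values; following the profile's own grid mirrors the "typically 49x49x49 or 65x65x65" advice given later in this thread.

```python
from typing import Optional

# Hypothetical default for smooth (matrix/TRC) profiles; not libplacebo's value.
SMOOTH_PROFILE_SIZE = 33


def resolve_lut_size(user_size: Optional[int],
                     profile_grid_points: Optional[int]) -> int:
    """Choose a 3DLUT edge length (illustrative logic only).

    user_size: size the user set explicitly, or None for "auto".
    profile_grid_points: CLUT grid size of a LUT-based profile, or None for a
        matrix/TRC profile that is easy to characterize.
    """
    if user_size is not None:
        # An explicit user size always wins, even if it is wasteful.
        return user_size
    if profile_grid_points is None:
        # Easily characterized profile: a small LUT is assumed to be enough.
        return SMOOTH_PROFILE_SIZE
    # LUT-based profile: follow the profile's own internal resolution.
    return profile_grid_points


if __name__ == "__main__":
    print(resolve_lut_size(512, 65))     # 512: explicit size wins
    print(resolve_lut_size(None, 65))    # 65: matches the profile grid
    print(resolve_lut_size(None, None))  # 33: hypothetical smooth-profile default
```

The key design point is the None case: if the option always carries a concrete value, the heuristic can never run, which is the situation point 4 describes.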

So here's my conclusion:

  1. It was failing for you because gpu-next uses double the precision, unnecessarily: an anti-feature which I agree should be fixed (wasting VRAM for no reason is never justified).
  2. Your LUT size is way too large, to the point of being completely useless, and even with these bugs fixed, you will not want to be waiting 30 seconds for the ~1GB 3DLUT to be generated on playback. (Admittedly, this is currently only an issue with vo=gpu-next, because ICC 3DLUT caching is not implemented there yet.)
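Caching is what makes a slow generation step tolerable, because the cost is paid once per profile and size rather than on every playback. A minimal sketch of such a cache follows, assuming the LUT contents depend only on the ICC profile bytes, the LUT size, and the texture format. The directory layout and the `generate` callback are hypothetical and do not reflect mpv's or libplacebo's actual cache implementation.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Callable


def cache_key(icc_profile: bytes, size: int, fmt: str) -> str:
    """Hash every input assumed to affect the LUT contents."""
    h = hashlib.sha256()
    h.update(icc_profile)
    h.update(f"|{size}|{fmt}".encode())
    return h.hexdigest()


def load_or_generate(cache_dir: Path, icc_profile: bytes, size: int, fmt: str,
                     generate: Callable[[], bytes]) -> bytes:
    """Return cached LUT data if present; otherwise generate it once and store it."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / cache_key(icc_profile, size, fmt)
    if path.exists():
        return path.read_bytes()
    data = generate()  # the slow step (reported as ~30 s for 512^3 in this thread)
    tmp = path.with_suffix(".tmp")
    tmp.write_bytes(data)
    # Rename into place so a crash never leaves a half-written file under `path`.
    tmp.replace(path)
    return data


if __name__ == "__main__":
    calls = 0

    def fake_generate() -> bytes:
        global calls
        calls += 1
        return b"\x00" * 64  # stand-in for real LUT data

    with tempfile.TemporaryDirectory() as d:
        for _ in range(3):
            load_or_generate(Path(d), b"fake icc profile", 512, "rgba16", fake_generate)
    print(calls)  # 1: generated once, then served from the cache
```

Including the size and format in the key means changing either option triggers a fresh generation instead of reusing a stale LUT.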

haasn, Jul 11 '22 11:07

you will not want to be waiting 30 seconds for the ~1GB 3DLUT to be generated on playback

Does gpu without -next use rgba16? Anyway, do these 30 seconds apply to gpu-next without caching, and to gpu? Can Little CMS really not be optimised further? Also, this could be generated with CUDA, which would be faster. Doom9 mentioned that CUDA could be used to generate it. Of course, this would have to be implemented in Little CMS, IMHO.

Oh, and also, what is the point of generating a 512^3 cube if you present on, e.g., an 8-bit display? Does it really make things better? Shouldn't you calculate the optimal 3DLUT size for the bit depth of the display?

ZaquL, Jul 13 '22 04:07

Does gpu without -next use rgba16?

Yes

Anyway, do these 30 seconds apply to gpu-next without caching, and to gpu?

Yes, no

Can Little CMS really not be optimised further?

Don't ask me

Also, this could be generated with CUDA, which would be faster.

Patches welcome

Oh, and also, what is the point of generating a 512^3 cube

None

Shouldn't you calculate the optimal 3DLUT size for the bit depth of the display?

No, you should calculate the optimal 3DLUT size for the internal resolution of the ICC profile (typically 49x49x49 or 65x65x65).
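Why display bit depth does not dictate the LUT size: the lookup interpolates between grid points, so any input, 8-bit or otherwise, is served by blending neighbouring LUT entries. The sketch below shows plain trilinear interpolation on a tiny 5x5x5 LUT. It assumes a purely linear 3x3 colour mix, which trilinear interpolation reproduces exactly; real ICC transforms include nonlinear curves, which is why real profiles still need tens of grid points. GPU texel-centre addressing and whatever filtering mpv/libplacebo actually use are not modelled, and all names here are illustrative.

```python
from typing import Callable, List, Tuple

RGB = Tuple[float, float, float]
Lut = List[List[List[RGB]]]

# A purely linear 3x3 colour mix, standing in for a smooth colour transform.
MIX = ((0.80, 0.15, 0.05),
       (0.10, 0.85, 0.05),
       (0.02, 0.08, 0.90))


def mix(rgb: RGB) -> RGB:
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in MIX)


def build_lut(n: int, transform: Callable[[RGB], RGB]) -> Lut:
    """Sample `transform` on an n x n x n grid spanning [0, 1] per channel (n >= 2)."""
    step = 1.0 / (n - 1)
    return [[[transform((r * step, g * step, b * step))
              for b in range(n)] for g in range(n)] for r in range(n)]


def lerp(a: RGB, b: RGB, t: float) -> RGB:
    return tuple(x + (y - x) * t for x, y in zip(a, b))


def sample_trilinear(lut: Lut, rgb: RGB) -> RGB:
    """Look up `rgb` by blending the 8 grid entries surrounding it."""
    n = len(lut)
    idx, frac = [], []
    for c in rgb:
        pos = min(max(c, 0.0), 1.0) * (n - 1)
        i = min(int(pos), n - 2)  # lower neighbour, keeping i + 1 in range
        idx.append(i)
        frac.append(pos - i)
    r, g, b = idx
    fr, fg, fb = frac
    # Blend along blue, then green, then red.
    c00 = lerp(lut[r][g][b], lut[r][g][b + 1], fb)
    c01 = lerp(lut[r][g + 1][b], lut[r][g + 1][b + 1], fb)
    c10 = lerp(lut[r + 1][g][b], lut[r + 1][g][b + 1], fb)
    c11 = lerp(lut[r + 1][g + 1][b], lut[r + 1][g + 1][b + 1], fb)
    return lerp(lerp(c00, c01, fg), lerp(c10, c11, fg), fr)


if __name__ == "__main__":
    lut = build_lut(5, mix)  # only 125 entries
    worst = 0.0
    for v in range(256):  # 256 sample colours built from 8-bit code values
        rgb = (v / 255, (255 - v) / 255, (v * 7 % 256) / 255)
        got, want = sample_trilinear(lut, rgb), mix(rgb)
        worst = max(worst, max(abs(x - y) for x, y in zip(got, want)))
    print(worst < 1e-9)  # True
```

For a nonlinear transform, the interpolation error depends on how curved the transform is between grid points, not on the display's bit depth, which is why matching the profile's own grid is the sensible target.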

haasn, Jul 13 '22 11:07

The actual bugs here seem to be resolved, so closing.

Dudemanguy, Jan 09 '23 01:01