OpenImageIO [MEMORY] Reduce file descriptors memory footprint in the ImageCache

We would like to reduce the memory usage of file descriptors in the ImageCache, as we observed a large amount of duplicated data. This is mainly due to image descriptors (ImageSpec) being stored for each MIP level (LevelInfo) of each subimage (SubImageInfo).

Clearing up some memory should be possible in that area (LevelInfo / SubImageInfo), but it needs a bit of a refactor, which I'm happy to start during DevDays with help from @lgritz.

More to come in the next comments about the approach to follow.

Tasks

[x] Change API: https://github.com/AcademySoftwareFoundation/OpenImageIO/pull/4442
[x] Change backend: https://github.com/AcademySoftwareFoundation/OpenImageIO/pull/4664
[ ] Change ImageBuf: https://github.com/AcademySoftwareFoundation/OpenImageIO/pull/4482

Sep 20 '24 22:09 bfraboni

Yes, I agree that we want to store the ImageSpec per subimage, and the individual MIP levels really only need to know their resolutions and tile sizes; I think those are the only things that meaningfully differ from one MIP level to the next. They don't all need their own ImageSpecs at all.
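To make the layout concrete, here is a rough sketch of "one full spec per subimage, small per-level records". All type and function names (FullSpec, LevelDims, SubimageSketch, make_subimage) are hypothetical stand-ins for illustration, not the actual ImageCache internals, and the halving rule for the MIP chain is an assumption.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for OIIO's ImageSpec: it only matters here that it
// can carry arbitrary, potentially large metadata.
struct FullSpec {
    int width = 0, height = 0;
    int tile_width = 0, tile_height = 0;
    std::map<std::string, std::string> metadata;  // e.g. ICC profile, thumbnail
};

// Hypothetical per-MIP-level record: only the fields that vary per level.
struct LevelDims {
    int width = 0, height = 0;
    int tile_width = 0, tile_height = 0;
};

// Hypothetical per-subimage record.
struct SubimageSketch {
    FullSpec spec;                  // stored once, describes MIP level 0
    std::vector<LevelDims> levels;  // one small record per MIP level
};

// Build a MIP chain by halving each dimension until 1x1 is reached
// (an assumed rule, for illustration only).
inline SubimageSketch make_subimage(const FullSpec& spec) {
    SubimageSketch s;
    s.spec = spec;
    int w = spec.width, h = spec.height;
    while (true) {
        LevelDims d;
        d.width = w;
        d.height = h;
        d.tile_width = spec.tile_width;
        d.tile_height = spec.tile_height;
        s.levels.push_back(d);
        if (w == 1 && h == 1)
            break;
        w = w > 1 ? w / 2 : 1;
        h = h > 1 ? h / 2 : 1;
    }
    return s;
}
```

With this shape, the metadata map exists once per subimage, while each MIP level costs only four integers.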

Sep 20 '24 22:09 lgritz

Additionally, even at the subimage level, we currently store both a "spec" and a "nativespec".

The original concept was that nativespec reflects what was in the file, whereas spec describes what's in memory in the cache. The main way they can differ is (a) in the cache, only a few pixel data types are allowed (uint8, uint16, half, float) and all channels must be the same data type, whereas in the file they can be other types and can differ among the channels; (b) in the cache, the image always appears "tiled", sometimes the real tiling of the file, sometimes looking like one big tile for the whole image, sometimes looking tiled even though it's not ("autotile"). But the real metadata will not change.

(In my defense, when this was designed, we saw a lot less metadata in files, certainly not big things like ICC profiles, thumbnails, or other big things. So having a second ImageSpec didn't seem like a big deal, but now it is.)

OK, so what I'm thinking is that we could store only ONE ImageSpec per subimage, and then only the couple of fields that explain how it differs between the file and the buffer.

I'm not yet of a strong opinion whether the one spec should be the file, with separate fields for the data format and tile size of the buffer? Or if the spec should be the buffer, with additional fields saying what the data formats and tiling was like in the file? (Probably the first option, but I'm not 100% sure.)

The question is what to do about the get_imagespec() and imagespec() methods, both of which take a native flag to say which it's selecting.

IC::get_imagespec makes a copy, so it could copy the one imagespec and then modify those few extra fields if they're asking for the other kind.
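A minimal sketch of that copy-then-patch idea, assuming the single stored spec is the file one (the first option above). Spec, CacheOverrides, SubimageRecord and get_spec_copy are hypothetical names for illustration and are not part of the OIIO API.

```cpp
#include <map>
#include <string>

// Hypothetical stand-in for OIIO's ImageSpec (not the real class).
struct Spec {
    int width = 0, height = 0;
    int tile_width = 0, tile_height = 0;  // 0 means "untiled" in this sketch
    std::string format;                   // pixel data type name
    std::map<std::string, std::string> metadata;
};

// Hypothetical: the few fields in which the cached view differs from the file.
struct CacheOverrides {
    std::string format;
    int tile_width = 0, tile_height = 0;
};

// Hypothetical per-subimage record: one full spec plus the small overrides.
struct SubimageRecord {
    Spec filespec;         // what is in the file
    CacheOverrides cache;  // how the in-memory representation differs
};

// Return a copy of the spec. When the caller asks for the non-native view,
// patch only the data format and tile size; the metadata is copied unchanged.
inline Spec get_spec_copy(const SubimageRecord& rec, bool native) {
    Spec out = rec.filespec;
    if (!native) {
        out.format = rec.cache.format;
        out.tile_width = rec.cache.tile_width;
        out.tile_height = rec.cache.tile_height;
    }
    return out;
}
```

Because the caller receives a copy, patching a few fields is safe here. The same trick does not work for a call that returns a pointer to the stored spec.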

IC::imagespec() is a lot trickier because it returns a pointer. This is where I got stuck before. One thing we could do is decide that henceforth, the nativespec parameter is simply ignored, and the spec pointer that is returned is always the file one (? or the memory one?) and add a separate new API call to retrieve the in-memory data type or tile size.

It may also be instructive to see how in ImageSpec itself, there is a copy_dimensions() method, which is a very lightweight way of copying only a few important fields from one spec to another. Maybe there is something analogous that we need in ImageCache.

Sep 20 '24 22:09 lgritz

@jmertic PR #4442 is not enough to close this issue, we also need to rework the ImageCache backend and remove the actual source of high memory usage in LevelInfo. That will be the focus of a second PR that'll start soon.

Sep 26 '24 16:09 bfraboni

No worries - we are just trying to link things up for tracking. Feel free to tag other PRs

Sep 26 '24 17:09 jmertic

Re-opening. The PR should not have been tagged as closing this issue. It was just one necessary step along the way.

Sep 28 '24 22:09 lgritz

Hey there, picking up where this was left off back in October.

Problem 1

The internal ImageCache in imagecache_pvt.h stores up to two ImageSpec structures per MIP level descriptor (LevelInfo), which is very redundant, so we need to get rid of them to save memory. This is especially visible when using ptex files, which translate to one subimage per ptex face, resulting in millions of ImageSpecs being stored.

For a quick history of the nativespec vs. spec distinction, see @lgritz's comment above.

Frontend work (done)

We changed the imagespec API to always reflect what is in the file at mip level 0, i.e. the "nativespec", and we introduced a new function get_cache_dimensions to override the dimensions of an ImageSpec with the ones of a specific mip level. The PR has been merged in OIIO 3.0.

Backend work (done)

With the new API we got rid of the concept of "native" spec in the ImageCache internals. The changes were implemented as a part of this PR. The main logic change is that an ImageCacheFile is responsible for:

a pool of ImageSpec to describe subimages at mip level 0 (and to store the subimage metadata only once),
a pool of ImageDims to describe miplevel dimensions that differ from their associated subimage.

Both pools allow reusing the descriptors across subimages and mip levels to limit the memory usage.
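As an illustration of how such a pool can share descriptors, here is a rough sketch of a deduplicating pool of dimension records. Dims and DimsPool are hypothetical names, and the linear lookup is an assumption chosen for brevity; this is not the code from the PR.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-level dimensions record.
struct Dims {
    int width, height, tile_width, tile_height;
};

inline bool operator==(const Dims& a, const Dims& b) {
    return a.width == b.width && a.height == b.height
        && a.tile_width == b.tile_width && a.tile_height == b.tile_height;
}

// Hypothetical pool: identical records are stored once and referenced by
// index. Lookup is a linear scan for simplicity; a hash map would scale
// better for large pools.
class DimsPool {
public:
    // Return the index of an equal existing entry, or append a new one.
    std::size_t intern(const Dims& d) {
        for (std::size_t i = 0; i < items_.size(); ++i)
            if (items_[i] == d)
                return i;
        items_.push_back(d);
        return items_.size() - 1;
    }

    const Dims& get(std::size_t index) const { return items_[index]; }
    std::size_t size() const { return items_.size(); }

private:
    std::vector<Dims> items_;
};
```

In this sketch each level keeps only an index into the pool, so subimages that share the same dimensions also share a single stored record.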

Problem 2 (TODO)

The same native spec approach is also used in ImageBuf, duplicating the image descriptors. The WIP PR aims to deduplicate these descriptors.

Jan 17 '25 00:01 bfraboni