Pillow Why is tobytes() using large amounts of memory?

Hello, I am using Pillow's Image.tobytes() function.

I used some memory profilers to find out why there is so much allocation.

One says it is image.load(), the other says it is encoder.encode().

Which one is correct?

Jan 30 '24 06:01 zdyj3170101136

If you would like us to help you figure out why your code is using a lot of memory, could you post a short self-contained example script?

one said that it is image.load(), the other said it is encoder.encode()

Are you saying that you're using two different memory profilers? Or you are running two different scripts? Or that you ran the same code with the same memory profiler twice and got different results?

Jan 30 '24 06:01 radarhere

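If you would like us to help you figure out why your code is using a lot of memory, could you post a short self-contained example script?

one said that it is image.load(), the other said it is encoder.encode()

Are you saying that you're using two different memory profilers? Or you are running two different scripts? Or that you ran the same code with the same memory profiler twice and got different results?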

My code:

#from ddtrace.profiling import Profiler

#prof = Profiler(
#        url="http://127.0.0.1:8081",
#        enable_code_provenance=True, # provide library version
#)
#prof.start() # Should be as early as possible, eg before other imports, to ensure everything is profiled
import os
import PIL
from PIL import Image
import time
dir = "/home/centos/pillow-bench/Images_0/0"
onlyfiles = []

filedir = dir
files = [f for f in os.listdir(filedir) if os.path.isfile(os.path.join(filedir, f))]
filepaths = map( lambda x: os.path.join(filedir, x), files )
onlyfiles += list(filepaths)

images = []
s = 0
for i in range(len(onlyfiles)):
    fname = onlyfiles[i]
    image = PIL.Image.open(fname)
    # Do something
    images.append(image)
    image.tobytes("xbm", "rgb")
    if i / 10000 == 1:
        print('iteration ', i)
        break
#time.sleep(30)
#input("Press Enter to continue...")

memray says most of the memory cost is in self.load():

[Screenshot: memray output, 2024-01-30 2:49:07 PM]

Datadog says most of the memory cost is not in self.load():

[Screenshot: Datadog output, 2024-01-30 2:50:14 PM]

Jan 30 '24 06:01 zdyj3170101136

You're running your code over a number of images. Could you pick just one image that you feel is using too much memory, and upload it here?

Jan 30 '24 07:01 radarhere

You're running your code over a number of images. Could you pick just one image that you feel is using too much memory, and upload it here?

The images are from Kaggle: https://www.kaggle.com/c/avito-duplicate-ads-detection/data?select=Images_0.zip

Jan 30 '24 07:01 zdyj3170101136

Here is one sample: 1000

Jan 30 '24 07:01 zdyj3170101136

the datadog says the memory cost is not most in self.load().

If I'm following, then you have looked at this diagram and think that tobytes() is doing something expensive before calling load().

Here are the first few lines of im.tobytes(). It does not do anything expensive before calling load(): https://github.com/python-pillow/Pillow/blob/15dc4312d2f6a4ec02f5bcb44b13cb5a779f3f88/src/PIL/Image.py#L735-L742

Jan 30 '24 10:01 radarhere

@zdyj3170101136 did that answer your question?

Feb 01 '24 09:02 radarhere

@zdyj3170101136 did that answer your question?

Datadog only samples Python memory allocations, so its output is not accurate.

memray's output is correct.

Feb 02 '24 02:02 zdyj3170101136

Oh, I thought that was your question.

Pillow stores each pixel in an RGB image internally as 4 bytes. Your image is 140px wide by 91px high, so 140 x 91 x 4 bytes comes to 50,960 bytes, almost 50kb.
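The arithmetic behind that estimate can be checked with a short sketch. The helper name rgb_storage_bytes is made up for illustration; the 4 bytes per pixel figure and the 140 by 91 dimensions are taken from the comment itself.

```python
# Rough size of Pillow's internal pixel storage for an RGB image,
# assuming 4 bytes per pixel as stated in the thread.
BYTES_PER_PIXEL = 4


def rgb_storage_bytes(width: int, height: int) -> int:
    """Hypothetical helper: estimated pixel storage in bytes."""
    return width * height * BYTES_PER_PIXEL


size = rgb_storage_bytes(140, 91)
print(f"{size} bytes = {size / 1024:.1f} KiB")  # 50960 bytes = 49.8 KiB
```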

If I run

import os, psutil
from PIL import Image
pid = os.getpid()
process = psutil.Process(pid)

im = Image.open('input.jpg')

start = process.memory_info().rss
im.tobytes("xbm", "rgb")
end = process.memory_info().rss
print(f'{(end - start) / 1024}kb')

with the image you uploaded on my machine, I usually get 224.0kb.

Is your question why that number is not lower?

Feb 02 '24 07:02 radarhere

If I try the code from my previous comment with the same image in PNG format, I usually get 96kb or 112kb. The internal image is stored in the same way in Pillow, so the difference would be in the decoding, either in Pillow itself or when Pillow hands data to libjpeg.

Also, you might try calling close() on your images once you are finished with them - https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.close

This operation will destroy the image core and release its memory.
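A minimal sketch of that suggestion, assuming each image is only needed long enough to call tobytes(). To stay self-contained it builds small PNGs in memory instead of reading the Kaggle files; make_sample_png and raw_sizes are hypothetical names, and the default "raw" encoder is used here rather than the "xbm" arguments from the original script.

```python
import io

from PIL import Image


def make_sample_png(width=140, height=91):
    """Hypothetical stand-in for one file on disk: an in-memory RGB PNG."""
    buf = io.BytesIO()
    Image.new("RGB", (width, height), (255, 0, 0)).save(buf, format="PNG")
    buf.seek(0)
    return buf


def raw_sizes(sources):
    """Return len(tobytes()) for each source, closing every image afterwards."""
    sizes = []
    for src in sources:
        im = Image.open(src)
        try:
            sizes.append(len(im.tobytes()))
        finally:
            # Explicit close() releases the pixel storage right away,
            # instead of keeping it alive in a list of open images.
            im.close()
    return sizes


sizes = raw_sizes([make_sample_png() for _ in range(3)])
print(sizes)
```

Only the small integers are kept after each iteration, so memory should not grow with the number of images the way it does when every opened image is appended to a list.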

Feb 08 '24 09:02 radarhere

@wiredfool while you're here and thinking about memory, did you have any thoughts on this issue?

Apr 02 '24 11:04 radarhere

From the original code:

for i in range(len(onlyfiles)):
    fname = onlyfiles[i]
    image = PIL.Image.open(fname)
    # Do something
    images.append(image)
    image.tobytes("xbm", "rgb")
    if i / 10000 == 1:
        print('iteration ', i)
        break

Here, all of the long-term memory is going to come from the memory pools in Storage.c, which will be retained because the image is added to the images list and not destroyed. So memory is going to increase by O(pixelstorage) each iteration. The ephemeral memory used by tobytes will come and go on each loop.

If we're looking at an individual image, I'd expect that we'll see roughly 2n+constant memory: one n from the pixel storage, one n from the tobytes output, and then some other assorted memory. Valgrind would be the place to really find the allocations, but you'd need a debugging build to do it.
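As a rough illustration of that reasoning, the sketch below estimates the retained and transient memory for the loop. It assumes, hypothetically, that every image matches the 140 by 91 RGB sample, that pixel storage takes 4 bytes per pixel, and that the tobytes output is about the same size n. Real files vary and the constant overhead is ignored, so the result is an order-of-magnitude figure only.

```python
def estimate_loop_memory(num_images, width=140, height=91, bytes_per_pixel=4):
    """Hypothetical estimate; the defaults are assumptions based on the sample image."""
    n = width * height * bytes_per_pixel  # pixel storage per image
    retained = num_images * n             # kept alive by the images list
    transient = n                         # one tobytes() result at a time (assumed ~n)
    return retained, transient


retained, transient = estimate_loop_memory(10_000)
print(f"retained: {retained / 2**20:.0f} MiB")
print(f"transient per iteration: {transient / 1024:.1f} KiB")
```

Under these assumptions the retained part dominates by far, which matches the point that the list of open images, not tobytes itself, drives the long-term growth.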

Apr 02 '24 12:04 wiredfool

Thanks.

Apr 02 '24 12:04 radarhere

Closing, as this is an issue asking a question, but it's not clear what the question is, and the user appears to have lost interest.

Apr 02 '24 12:04 radarhere
