Why is tobytes() using large amounts of memory?

Open zdyj3170101136 opened this issue 1 year ago • 10 comments

Hello, I am using Pillow's Image.tobytes() function.

I used some memory profilers to find out why there is so much allocation.

One says the allocation is in image.load(), the other says it is in encoder.encode().

Which one is correct?

zdyj3170101136 avatar Jan 30 '24 06:01 zdyj3170101136

If you would like us to help you figure out why your code is using a lot of memory, could you post a short self-contained example script?

one said that it is image.load(), the other said it is encoder.encode()

Are you saying that you're using two different memory profilers? Or you are running two different scripts? Or that you ran the same code with the same memory profiler twice and got different results?

radarhere avatar Jan 30 '24 06:01 radarhere

My code:

# from ddtrace.profiling import Profiler
#
# prof = Profiler(
#     url="http://127.0.0.1:8081",
#     enable_code_provenance=True,  # provide library version
# )
# prof.start()  # Should be as early as possible, e.g. before other imports, to ensure everything is profiled
import os
import time

import PIL
from PIL import Image

filedir = "/home/centos/pillow-bench/Images_0/0"
# Collect the full path of every regular file in the directory.
onlyfiles = [
    os.path.join(filedir, f)
    for f in os.listdir(filedir)
    if os.path.isfile(os.path.join(filedir, f))
]

images = []
s = 0
for i in range(len(onlyfiles)):
    fname = onlyfiles[i]
    image = PIL.Image.open(fname)
    # Do something
    images.append(image)
    image.tobytes("xbm", "rgb")
    if i / 10000 == 1:
        print('iteration ', i)
        break
# time.sleep(30)
# input("Press Enter to continue...")

memray says that most of the memory cost is in self.load().

Screenshot 2024-01-30 2:49:07 PM

Datadog says that most of the memory cost is not in self.load().

Screenshot 2024-01-30 2:50:14 PM

zdyj3170101136 avatar Jan 30 '24 06:01 zdyj3170101136

You're running your code over a number of images. Could you pick just one image that you feel is using too much memory, and upload it here?

radarhere avatar Jan 30 '24 07:01 radarhere

The images are from Kaggle: https://www.kaggle.com/c/avito-duplicate-ads-detection/data?select=Images_0.zip

zdyj3170101136 avatar Jan 30 '24 07:01 zdyj3170101136

Here is one sample: 1000

zdyj3170101136 avatar Jan 30 '24 07:01 zdyj3170101136

the datadog says the memory cost is not most in self.load().

Screenshot 2024-01-30 2:50:14 PM

If I'm following, then you have looked at this diagram and think that tobytes() is doing something expensive before calling load().

Here are the first few lines of im.tobytes(). It is not doing anything expensive before calling load(): https://github.com/python-pillow/Pillow/blob/15dc4312d2f6a4ec02f5bcb44b13cb5a779f3f88/src/PIL/Image.py#L735-L742

radarhere avatar Jan 30 '24 10:01 radarhere

@zdyj3170101136 did that answer your question?

radarhere avatar Feb 01 '24 09:02 radarhere

Datadog only samples Python memory allocations, so its output is not accurate.

memray's output is correct.

zdyj3170101136 avatar Feb 02 '24 02:02 zdyj3170101136

Oh, I thought that was your question.

Pillow stores each pixel in an RGB image internally as 4 bytes. Your image is 140px wide by 91px high, so 140 x 91 x 4 bytes = 50,960 bytes, which is almost 50kb.
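As a rough sketch, that arithmetic can be written out as below. The helper name internal_storage_bytes is made up for illustration and is not part of Pillow; the 4 bytes per pixel figure is simply the one stated above, and the estimate ignores any allocator or bookkeeping overhead.

```python
# Assumption: RGB pixels occupy 4 bytes each in Pillow's internal storage,
# as stated in the comment above. This helper is hypothetical, not a Pillow API.
BYTES_PER_PIXEL_INTERNAL = 4


def internal_storage_bytes(width, height, bytes_per_pixel=BYTES_PER_PIXEL_INTERNAL):
    """Estimate the size of the pixel storage for an image of the given size."""
    return width * height * bytes_per_pixel


size = internal_storage_bytes(140, 91)
print(f"{size} bytes = {size / 1024:.1f} KiB")  # 50960 bytes = 49.8 KiB
```

This only covers the decoded pixel data, so a measured process-level figure can reasonably come out higher.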

If I run

import os, psutil
from PIL import Image
pid = os.getpid()
process = psutil.Process(pid)

im = Image.open('input.jpg')

start = process.memory_info().rss
im.tobytes("xbm", "rgb")
end = process.memory_info().rss
print(f'{(end - start) / 1024}kb')

with the image you uploaded on my machine, I usually get 224.0kb.

Is your question why that number is not lower?

radarhere avatar Feb 02 '24 07:02 radarhere

If I try the code from my previous comment with the same image in PNG format, I usually get 96kb or 112kb. The internal image is stored in the same way in Pillow, so the difference would be in the decoding, either in Pillow itself or when Pillow hands data to libjpeg.

Also, you might try calling close() on your images once you are finished with them: https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.close

This operation will destroy the image core and release its memory.
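A minimal sketch of what that could look like in a loop, assuming the images do not need to be kept around afterwards. The function name total_tobytes_length and the generated sample files are made up for illustration, and the default raw encoder is used here in place of the "xbm" call from the original script.

```python
import os
import tempfile

from PIL import Image


def total_tobytes_length(paths):
    """Sum len(tobytes()) over the files, closing each image after use."""
    total = 0
    for path in paths:
        image = Image.open(path)
        try:
            # tobytes() loads the pixel data, then encodes it (raw by default).
            total += len(image.tobytes())
        finally:
            # Explicit close() instead of appending the image to a list,
            # so the pixel storage is not retained across iterations.
            image.close()
    return total


# Hypothetical stand-in data: three small PNG files in a temporary directory.
with tempfile.TemporaryDirectory() as tmpdir:
    paths = []
    for n in range(3):
        path = os.path.join(tmpdir, f"sample_{n}.png")
        Image.new("RGB", (140, 91)).save(path)
        paths.append(path)
    total = total_tobytes_length(paths)

print(total)
```

Whether this lowers the numbers a profiler reports depends on what the profiler measures, so treat it as something to try rather than a guaranteed fix.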

radarhere avatar Feb 08 '24 09:02 radarhere

@wiredfool while you're here and thinking about memory, did you have any thoughts on this issue?

radarhere avatar Apr 02 '24 11:04 radarhere

From the original code:

for i in range(len(onlyfiles)):
    fname = onlyfiles[i]
    image = PIL.Image.open(fname)
    # Do something
    images.append(image)
    image.tobytes("xbm", "rgb")
    if i / 10000 == 1:
        print('iteration ', i)
        break

Here, all of the long-term memory is going to be from the memory pools in Storage.c, which will be retained because the image is added to the images list and not destroyed. So it's going to be increasing by O(pixelstorage) each iteration. The ephemeral memory used by tobytes will come and go on each loop.

If we're looking at an individual image, I'd expect that we'll see roughly 2n+constant memory, one n from the pixelstorage and one from the tobytes, and then some other assorted memory. Valgrind would be where to really find the allocations, but you'd need a debugging build to do it.
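To make the two n terms concrete, here is a rough sketch for an image with the same dimensions as the uploaded sample. It assumes 4 bytes per pixel for the internal RGB storage (the figure given earlier in this thread) and uses the default raw encoder rather than "xbm"; the blank image is a stand-in, and the estimate leaves out decoder buffers, allocator overhead and the constant term.

```python
from PIL import Image

# Stand-in image with the same dimensions as the uploaded sample.
image = Image.new("RGB", (140, 91))
width, height = image.size

# First copy: the pixel storage. Assumption: 4 bytes per RGB pixel internally.
storage_estimate = width * height * 4

# Second copy: the bytes object returned by tobytes().
# The default raw encoder packs RGB as 3 bytes per pixel.
output = image.tobytes()

print("pixel storage estimate:", storage_estimate)
print("tobytes output:", len(output))
print("rough total:", storage_estimate + len(output))
```

The two copies are not exactly the same size here because the raw output drops the padding byte, which is one reason the estimate is only "roughly" 2n.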

wiredfool avatar Apr 02 '24 12:04 wiredfool

Thanks.

radarhere avatar Apr 02 '24 12:04 radarhere

Closing, as this is an issue asking a question, but it's not clear what the question is, and the user appears to have lost interest.

radarhere avatar Apr 02 '24 12:04 radarhere