Why is tobytes() using large amounts of memory?
Hello, I am using Pillow's Image.tobytes() function, and I used some memory profilers to find out why it allocates so much memory.
One says the allocation happens in Image.load(), the other says it happens in encoder.encode().
Which one is correct?
If you would like us to help you figure out why your code is using a lot of memory, could you post a short self-contained example script?
one said that it is image.load(), the other said it is encoder.encode()
Are you saying that you're using two different memory profilers? Or you are running two different scripts? Or that you ran the same code with the same memory profiler twice and got different results?
My code:
    #from ddtrace.profiling import Profiler
    #prof = Profiler(
    #    url="http://127.0.0.1:8081",
    #    enable_code_provenance=True,  # provide library version
    #)
    #prof.start()  # Should be as early as possible, eg before other imports, to ensure everything is profiled
    import os
    import PIL
    from PIL import Image
    import time

    dir = "/home/centos/pillow-bench/Images_0/0"
    onlyfiles = []
    filedir = dir
    files = [f for f in os.listdir(filedir) if os.path.isfile(os.path.join(filedir, f))]
    filepaths = map(lambda x: os.path.join(filedir, x), files)
    onlyfiles += list(filepaths)

    images = []
    s = 0
    for i in range(len(onlyfiles)):
        fname = onlyfiles[i]
        image = PIL.Image.open(fname)
        # Do something
        images.append(image)  # keeps every opened image alive
        image.tobytes("xbm", "rgb")
        if i / 10000 == 1:  # stop after 10000 images
            print('iteration ', i)
            break
    #time.sleep(30)
    #input("Press Enter to continue...")
memray says most of the memory is allocated in self.load().
Datadog says most of the memory is not allocated in self.load().
You're running your code over a number of images. Could you pick just one image that you feel is using too much memory, and upload it here?
The images are from Kaggle: https://www.kaggle.com/c/avito-duplicate-ads-detection/data?select=Images_0.zip
Here is one sample.
the datadog says the memory cost is not most in self.load().
If I'm following, then you have looked at this diagram and think that tobytes() is doing something expensive before calling load().
Here are the first few lines of im.tobytes(). It is not doing anything expensive before calling load().
https://github.com/python-pillow/Pillow/blob/15dc4312d2f6a4ec02f5bcb44b13cb5a779f3f88/src/PIL/Image.py#L735-L742
@zdyj3170101136 did that answer your question?
Datadog only samples Python memory allocations, and its output is not accurate.
memray's output is correct.
Oh, I thought that was your question.
Pillow stores each pixel in an RGB image internally as 4 bytes. Your image is 140px wide by 91px high, so 140 x 91 pixels multiplied by 4 bytes is almost 50kb.
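To spell out that arithmetic, here is a small sketch. The function name internal_storage_bytes is made up for illustration and is not part of Pillow; it only multiplies the figures quoted above.

```python
def internal_storage_bytes(width, height, bytes_per_pixel=4):
    """Rough size of the pixel storage, assuming 4 bytes per RGB pixel."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    size = internal_storage_bytes(140, 91)
    # 140 * 91 * 4 bytes, also shown in units of 1024 bytes
    print(size, round(size / 1024, 1))
```

This only covers the pixel storage itself, not the decoder's working memory or the bytes object returned by tobytes().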
If I run
import os, psutil
from PIL import Image
pid = os.getpid()
process = psutil.Process(pid)
im = Image.open('input.jpg')
start = process.memory_info().rss
im.tobytes("xbm", "rgb")
end = process.memory_info().rss
print(f'{(end - start) / 1024}kb')
with the image you uploaded on my machine, I usually get 224.0kb.
Is your question why that number is not lower?
If I try the code from my previous comment with the same image in PNG format, I usually get 96kb or 112kb. The internal image is stored in the same way in Pillow, so the difference would be in the decoding, either in Pillow itself or when Pillow hands data to libjpeg.
Also, you might try calling close() on your images once you are finished with them - https://pillow.readthedocs.io/en/stable/reference/Image.html#PIL.Image.Image.close
This operation will destroy the image core and release its memory.
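As a rough sketch of what that could look like in a loop: the helper name total_encoded_bytes and the generated sample files are assumptions for illustration, and the default raw encoder is used here rather than the "xbm" encoder from the original script.

```python
import os
import tempfile

from PIL import Image


def total_encoded_bytes(paths):
    """Encode each image with tobytes() and close it before moving on."""
    total = 0
    for path in paths:
        im = Image.open(path)
        try:
            total += len(im.tobytes())
        finally:
            # close() destroys the image core, so its pixel storage is not
            # kept alive by a growing list of Image objects.
            im.close()
    return total


if __name__ == "__main__":
    # Hypothetical stand-in data: three small generated JPEGs.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for n in range(3):
            path = os.path.join(tmp, f"sample_{n}.jpg")
            Image.new("RGB", (140, 91), (n * 40, 0, 0)).save(path)
            paths.append(path)
        print(total_encoded_bytes(paths))
```

The difference from the original script is that no Image object is appended to a list, and each one is closed explicitly as soon as its bytes have been read.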
@wiredfool while you're here and thinking about memory, did you have any thoughts on this issue?
From the original code:
    for i in range(len(onlyfiles)):
        fname = onlyfiles[i]
        image = PIL.Image.open(fname)
        # Do something
        images.append(image)
        image.tobytes("xbm", "rgb")
        if i / 10000 == 1:
            print('iteration ', i)
            break
Here, all of the long term memory is going to be from the memory pools in Storage.c, which will be retained because the image is added to the images list and not destroyed. So it's going to be increasing by O(pixelstorage) each iteration. The ephemeral memory used by tobytes will come and go on each loop.
If we're looking at an individual image, I'd expect that we'll see roughly 2n+constant memory, one from the pixelstorage, and one from the tobytes, and then some other assorted memory. Valgrind would be where to really find the allocations though, but you'd need a debugging build to do it.
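A back-of-the-envelope model of that description, purely as a sketch: estimated_peaks is a hypothetical helper, not a measurement. It assumes 4 bytes per pixel of storage and a tobytes() output about as large as the storage (the "2n" above), and it ignores the constant and other assorted memory.

```python
def estimated_peaks(sizes, keep_images, bytes_per_pixel=4):
    """Estimate peak bytes per loop iteration for a list of (width, height)."""
    retained = 0  # pixel storage held by images kept from earlier iterations
    peaks = []
    for width, height in sizes:
        storage = width * height * bytes_per_pixel
        # Assumption: the transient tobytes() output is roughly one more "n".
        transient = storage
        peaks.append(retained + storage + transient)
        if keep_images:
            # Appending to a list keeps this image's storage alive.
            retained += storage
    return peaks


if __name__ == "__main__":
    sizes = [(140, 91)] * 3
    print(estimated_peaks(sizes, keep_images=True))   # grows every iteration
    print(estimated_peaks(sizes, keep_images=False))  # stays flat
```

With keep_images=True the estimate grows by one image's storage per iteration, which matches the O(pixelstorage) growth described for the original loop. With keep_images=False it stays at roughly 2n per image.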
Thanks.
Closing, as this is an issue asking a question, but it's not clear what the question is, and the user appears to have lost interest.