chainercv
Faster image transforms
Currently, all images are assumed to be converted to numpy.ndarray right after they are loaded from disk.
However, this leads to unnecessary copying of image data.
For example, when only a crop of an image is needed, it is not necessary to load the entire image into a numpy array.
Deferring the conversion to a numpy array until after the crop makes this kind of optimization possible.
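To make the idea concrete, here is a minimal sketch (not ChainerCV code) that compares the two orders of operations on a synthetic in-memory image. The 1000x800 image size and the function names are illustrative assumptions. Both orders return the same pixels, but cropping first copies only the 224x224 region into numpy, while slicing afterwards yields a view onto the full-size array.

```python
import numpy as np
from PIL import Image


def crop_after_numpy(pil_img, size=224):
    # Current approach: copy the whole image into an array, then slice.
    arr = np.asarray(pil_img.convert('RGB')).transpose(2, 0, 1)
    return arr[:, :size, :size]


def crop_before_numpy(pil_img, size=224):
    # Proposed approach: crop with PIL, then copy only the cropped region.
    cropped = pil_img.convert('RGB').crop((0, 0, size, size))
    return np.asarray(cropped).transpose(2, 0, 1)


# Synthetic test image (hypothetical size): height 800, width 1000.
rng = np.random.RandomState(0)
pixels = rng.randint(0, 256, size=(800, 1000, 3)).astype(np.uint8)
pil_img = Image.fromarray(pixels)

a = crop_after_numpy(pil_img)
b = crop_before_numpy(pil_img)
print(a.shape, b.shape, np.array_equal(a, b))
```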
To verify that this can improve performance, I wrote a simple example.
It feeds a dataset to a MultiprocessIterator and measures the throughput.
The dataset crops each image to a fixed size.
NoPILDataset uses the current method: it converts the whole image to a numpy array and then slices it.
PILDataset calls PIL's crop method before converting to a numpy array.
I tested the performance with the train split of ImageNet. The specs of my machine are as follows.
- Intel(R) Xeon(R) CPU E5-1650 v3 @ 3.50GHz (6 cores)
- 128GB
- Ubuntu 14.04
- Chainer 3.0.0a1
After 500 iterations, the results are as follows. The proposed change was 1.56 times faster.
# The current method
$ python benchmark.py 0
count=500 recent_speed=4.16689417015 overall_speed=3.71505159313
# The proposed method
$ python benchmark.py 1
count=500 recent_speed=7.18092074851 overall_speed=5.09444490369
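The speedup can be read off the two log lines above with a small calculation. This is only a sanity-check sketch, and the `speedup` helper is hypothetical. For these particular lines the ratio is roughly 1.72 for recent_speed and 1.37 for overall_speed, so the exact factor depends on which measurement and which run is compared.

```python
def speedup(baseline, proposed):
    # Both values are speeds (iterations per second), so higher is better.
    return proposed / baseline


# Values copied from the two log lines above.
current = {'recent_speed': 4.16689417015, 'overall_speed': 3.71505159313}
proposed = {'recent_speed': 7.18092074851, 'overall_speed': 5.09444490369}

for key in ('recent_speed', 'overall_speed'):
    print('{}: {:.2f}x'.format(key, speedup(current[key], proposed[key])))
```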
Note that this improvement is very important when training with a large batch size on a dataset such as ImageNet. I ran into this performance issue while training a model on ImageNet.
# filename: benchmark.py
# Usage:
# python benchmark.py 0 # the current method
# python benchmark.py 1 # the proposed method
import numpy as np
import os
from PIL import Image
import time

import chainer


class NoPILDataset(chainer.dataset.DatasetMixin):
    # Current method: convert the whole image to numpy, then slice.

    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def get_example(self, i):
        path = self.paths[i]
        f = Image.open(path)
        img = f.convert('RGB')
        img = np.asarray(img).transpose(2, 0, 1)
        img = img[:, :224, :224]
        return img


class PILDataset(chainer.dataset.DatasetMixin):
    # Proposed method: crop with PIL, then convert only the crop to numpy.

    def __init__(self, paths):
        self.paths = paths

    def __len__(self):
        return len(self.paths)

    def get_example(self, i):
        path = self.paths[i]
        f = Image.open(path)
        img = f.convert('RGB').crop((0, 0, 224, 224))
        img = np.asarray(img).transpose(2, 0, 1)
        return img


if __name__ == '__main__':
    import sys

    # Path to the training dataset of ImageNet.
    # (It can be the root directory of any image dataset.)
    dirname = '/data/imagenet/train'
    paths = []
    for cur_dir, _, names in os.walk(dirname):
        for name in names:
            paths.append(os.path.join(cur_dir, name))

    if int(sys.argv[1]) == 1:
        print('use PIL directly')
        dataset = PILDataset(paths)
    else:
        print('do not use PIL directly')
        dataset = NoPILDataset(paths)

    it = chainer.iterators.MultiprocessIterator(
        dataset, 192, shared_mem=3 * 224 * 224 * 4,
        n_processes=12, shuffle=False)

    start = time.time()
    times = []
    count = 0
    while True:
        if count == 500:
            break
        recent_start = time.time()
        try:
            it.next()
        except StopIteration:
            break
        end = time.time()
        times.append(end)
        count += 1
        print(
            'count={} recent_speed={} overall_speed={}'.format(
                count, 1. / (end - recent_start), count / (end - start)))

I made a simpler benchmark script that measures the time it takes to load and crop an image.
from PIL import Image
import numpy as np
from chainercv.utils import write_image
import time
import cv2


def crop_pil(path):
    # Crop with PIL before converting to a numpy array.
    img = Image.open(path).convert('RGB')
    img = img.crop((0, 0, 224, 224))
    img = np.asarray(img).transpose(2, 0, 1)
    return img


def crop_numpy(path):
    # Convert the whole image to a numpy array, then slice.
    img = Image.open(path).convert('RGB')
    img = np.asarray(img).transpose(2, 0, 1)
    img = img[:, :224, :224]
    return img


def crop_cv2(path):
    # cv2.imread returns HWC in BGR order; [::-1] flips the channels to RGB.
    img = cv2.imread(path, cv2.IMREAD_COLOR).transpose(2, 0, 1)
    img = img[::-1, :224, :224]
    return img


if __name__ == '__main__':
    # Write a random 4000x4000 test image to disk.
    img = np.random.uniform(0, 255, size=(3, 4000, 4000))
    path = 'a.jpg'
    write_image(img, path)

    times = []
    for i in range(30):
        start = time.time()
        crop_cv2(path)
        times.append(time.time() - start)
    print('crop_cv2 mean={}'.format(np.mean(times)))

    times = []
    for i in range(30):
        start = time.time()
        crop_pil(path)
        times.append(time.time() - start)
    print('crop_pil mean={}'.format(np.mean(times)))

    times = []
    for i in range(30):
        start = time.time()
        crop_numpy(path)
        times.append(time.time() - start)
    print('crop_numpy mean={}'.format(np.mean(times)))

Results (mean time per call, in seconds):
crop_cv2 mean=0.272049093246
crop_pil mean=0.301412550608
crop_numpy mean=0.324815416336
If using cv2.imread is fastest, we can simply use cv2.imread in read_image. This doesn't require any API changes. If we want to use PIL, we have to change the APIs.
Yes. That is my conclusion too.
I am guessing that most of the time is spent decoding the jpg image, and it seems that cv2 has a better decoder.
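One way to check that guess is to time the decode step on its own and compare it with the full decode-and-crop path. The sketch below does this for PIL on a synthetic JPEG; the 1000x1000 image size, the repeat count, and the helper names are assumptions, and the numbers will vary by machine.

```python
import os
import tempfile
import timeit

import numpy as np
from PIL import Image


def decode_only(path):
    # Image.open is lazy; convert() forces the full JPEG decode.
    return Image.open(path).convert('RGB')


def decode_and_crop(path, size=224):
    img = decode_only(path).crop((0, 0, size, size))
    return np.asarray(img).transpose(2, 0, 1)


def mean_time(fn, path, repeat=5):
    # Mean wall-clock time of fn(path) in seconds.
    times = []
    for _ in range(repeat):
        start = timeit.default_timer()
        fn(path)
        times.append(timeit.default_timer() - start)
    return float(np.mean(times))


def make_jpeg(size=1000):
    # Write a random test image to a temporary JPEG file.
    rng = np.random.RandomState(0)
    pixels = rng.randint(0, 256, size=(size, size, 3)).astype(np.uint8)
    fd, path = tempfile.mkstemp(suffix='.jpg')
    os.close(fd)
    Image.fromarray(pixels).save(path)
    return path


path = make_jpeg()
try:
    t_decode = mean_time(decode_only, path)
    t_total = mean_time(decode_and_crop, path)
finally:
    os.remove(path)

print('decode only   mean={}'.format(t_decode))
print('decode + crop mean={}'.format(t_total))
```

If the two means are close, decoding dominates and the crop order matters less than the choice of decoder.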
> I am guessing that most of the time is spent decoding the jpg image, and it seems that cv2 has a better decoder.
I guess this depends on the configuration of OpenCV. I will try your benchmark in my environment.
In my environment, cv2 was fastest, too.
crop_cv2 mean=0.22242753505706786
crop_pil mean=0.34299739996592205
crop_numpy mean=0.38132399717966714