Image size=(672, 512) cannot be processed by most models
Hi, I'm using your Colab tutorial to compress images of various sizes, and I found that images of size (672, 512) cannot be processed. Could you fix this bug?
There are different ways to deal with this:
- Pad the image so that its height and width are multiples of 64.
- A memory-efficient "smart padding" method proposed by Huawei (IIRC).
See below for example code.
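Here is a shape-only sketch in plain Python of why (672, 512) fails and how much padding fixes it. It assumes CompressAI's `conv`/`deconv` helpers use 5x5 kernels with stride 2 and padding 2 (plus `output_padding=1` for the transposed conv), so each strided conv maps a size n to ceil(n/2) and each deconv maps n to 2n. In bmshj2018-hyperprior, `g_a` downsamples four times and `h_a` twice more, so the scales predicted by `h_s` only match the latent `y` in size when each dimension is a multiple of 2^6 = 64. The helper names are hypothetical; the padding tuples follow `F.pad`'s (left, right, top, bottom) order.

```python
def downsampled_size(n, num_downsamples):
    """Size after `num_downsamples` stride-2 convs, each mapping n -> ceil(n / 2)."""
    for _ in range(num_downsamples):
        n = (n + 1) // 2
    return n


def hyperprior_sizes(n):
    """Return (latent y size, size of the scales predicted by h_s) for one dimension."""
    y = downsampled_size(n, 4)  # g_a: four stride-2 convs
    z = downsampled_size(y, 2)  # h_a: two stride-2 convs (its first conv has stride 1)
    return y, z * 4  # h_s: two stride-2 deconvs, each mapping n -> 2n


def pad_to_multiple(h, w, div=64):
    """Return (pad, unpad) tuples in F.pad order (left, right, top, bottom)."""
    out_h = (h + div - 1) // div * div  # round up to the next multiple of div
    out_w = (w + div - 1) // div * div
    left, top = (out_w - w) // 2, (out_h - h) // 2
    pad = (left, out_w - w - left, top, out_h - h - top)
    unpad = tuple(-p for p in pad)  # negative padding crops in F.pad
    return pad, unpad


for n in (512, 672, 704):
    print(n, hyperprior_sizes(n))
# 512 (32, 32)
# 672 (42, 44)
# 704 (44, 44)
print(pad_to_multiple(512, 672))
# ((16, 16, 0, 0), (-16, -16, 0, 0))
```

For a dimension of 672, `y` is 42 wide but `h_s` predicts 44 columns of scales, which is presumably where the model fails; padding 672 up to 704 (16 pixels on each side) removes the mismatch.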
But I want to compress the image. Wouldn't padding affect the compression ratio?
I believe padding before a transform is a necessary evil that even some non-learned codecs accept (e.g., JPEG pads partial blocks at the right and bottom edges up to full 8x8 blocks), though I'm guessing they've become more efficient at it over the years. Perhaps adaptive entropy coding mitigates the cost of the additional data. Also, the DFT/DCT/DWT of a zero-padded signal may not be so bad, due to some properties of these transforms.
Worst-case test
Testing the worst case by padding an extra 64 pixels onto the right and bottom edges (kodim01 is 768x512, already a multiple of 64, so none of this padding is needed):

```python
import torch
import torch.nn.functional as F
from compressai.zoo import bmshj2018_hyperprior
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
net = bmshj2018_hyperprior(quality=2, pretrained=True).eval().to(device)

img = Image.open("/data/datasets/kodak/test/kodim01.png").convert("RGB")
x = transforms.ToTensor()(img).unsqueeze(0).to(device)


def inference(net, x, pad):
    # pad is in F.pad order (left, right, top, bottom);
    # negative padding crops, so negating it undoes the padding.
    unpad = tuple(-p for p in pad)
    with torch.no_grad():
        out_enc = net.compress(F.pad(x, pad))
        [y_strings, z_strings] = out_enc["strings"]
        # Total bitstream size in bytes (latent y + hyper-latent z).
        size = sum(len(s) for s in y_strings) + sum(len(s) for s in z_strings)
        out_dec = net.decompress(**out_enc)
        x_hat = F.pad(out_dec["x_hat"], unpad).clamp_(0, 1)
        # PSNR for pixel values in [0, 1]: -10 * log10(MSE).
        psnr = -10 * ((x_hat - x) ** 2).mean().log10()
    return size, psnr.item()


size, psnr = inference(net, x, pad=(0, 0, 0, 0))
print(f"No pad: {size} bytes {psnr:.2f} dB")
size, psnr = inference(net, x, pad=(0, 64, 0, 64))
print(f"Extra pad: {size} bytes {psnr:.2f} dB")
```
| Model | Padding | Size (bytes) | PSNR (dB) |
|---|---|---|---|
| bmshj2018-hyperprior | No pad | 16500 | 26.88 |
| bmshj2018-hyperprior | Extra pad | 16920 (+2.5%) | 26.89 |
| mbt2018 | No pad | 14680 | 27.14 |
| mbt2018 | Extra pad | 14824 (+1%) | 27.16 |
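Yes, padding does have a small cost, particularly for entropy models that are "weaker" at exploiting spatial redundancy: the overhead is 2.5% for bmshj2018-hyperprior (hyperprior only) versus about 1% for mbt2018 (which adds an autoregressive context model).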
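For perspective, it helps to compare the growth in bytes with the growth in pixels. This is a small arithmetic check using only the numbers above plus the fact that kodim01 is 768x512; the helper name `relative_increase` is just for illustration.

```python
def relative_increase(before, after):
    """Relative increase from `before` to `after`, as a percentage."""
    return 100.0 * (after - before) / before


# kodim01 is 768x512; pad=(0, 64, 0, 64) adds 64 columns and 64 rows.
pixels = relative_increase(768 * 512, (768 + 64) * (512 + 64))

# Bitstream sizes in bytes (no pad, extra pad) from the worst-case test above.
results = {
    "bmshj2018-hyperprior": (16500, 16920),
    "mbt2018": (14680, 14824),
}

print(f"pixels: +{pixels:.1f}%")  # pixels: +21.9%
for name, (no_pad, extra_pad) in results.items():
    print(f"{name}: +{relative_increase(no_pad, extra_pad):.1f}% bytes")
# bmshj2018-hyperprior: +2.5% bytes
# mbt2018: +1.0% bytes
```

So roughly 22% more pixels cost only about 1-2.5% more bytes, which is consistent with the guess that the entropy model spends few bits on the constant zero-padded region.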
Smart padding
Another method, proposed by Huawei (IIRC), is to pad to a multiple of 2 (i.e., even dimensions) just before each 2x downsampling operation, and to crop the same amount right after the matching 2x upsampling in the decoder. But you'll need to interleave these operations with the model layers. Perhaps define a SmartPadding layer:
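```python
import torch.nn as nn
import torch.nn.functional as F
from compressai.ops import compute_padding


class SmartPadding(nn.Module):
    """Pads to a multiple of `min_div` ("pad"), or undoes a linked pad ("unpad")."""

    def __init__(self, min_div=2, method="pad", link=None, **padding_kwargs):
        super().__init__()
        self.min_div = min_div
        self.padding_kwargs = padding_kwargs  # forwarded to F.pad, e.g. mode="constant", value=0
        self.method = method
        self._unpad = None
        # Keep the linked "pad" layer in a plain list so that it is not
        # registered (and therefore duplicated) as a submodule of this layer.
        self._linked_modules = []
        if link is not None:
            self._linked_modules.append(link)

    def forward(self, x):
        *_, h, w = x.shape
        if self.method == "pad":
            # Remember the inverse padding so the linked "unpad" layer can undo it.
            padding, self._unpad = compute_padding(h, w, min_div=self.min_div)
        elif self.method == "unpad":
            padding = self._linked_modules[0]._unpad
        else:
            raise ValueError(f"Unknown method: {self.method!r}")
        print(self.method, h, w, padding)  # debug trace, as in the log below
        return F.pad(x, padding, **self.padding_kwargs)
```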
And then define a function that injects these layers into an existing bmshj2018-hyperprior model (the comments show the resulting layer order):
```python
def inject_model(self, **kwargs):
    # self.g_a = nn.Sequential(
    #     SmartPadding(**kwargs),
    #     conv(3, N),
    #     GDN(N),
    #     SmartPadding(**kwargs),
    #     conv(N, N),
    #     GDN(N),
    #     SmartPadding(**kwargs),
    #     conv(N, N),
    #     GDN(N),
    #     SmartPadding(**kwargs),
    #     conv(N, M),
    # )
    self.g_a.insert(0, SmartPadding(**kwargs))
    self.g_a.insert(3, SmartPadding(**kwargs))
    self.g_a.insert(6, SmartPadding(**kwargs))
    self.g_a.insert(9, SmartPadding(**kwargs))

    # self.g_s = nn.Sequential(
    #     deconv(M, N),
    #     SmartPadding(method="unpad", link=self.g_a[9], **kwargs),
    #     GDN(N, inverse=True),
    #     deconv(N, N),
    #     SmartPadding(method="unpad", link=self.g_a[6], **kwargs),
    #     GDN(N, inverse=True),
    #     deconv(N, N),
    #     SmartPadding(method="unpad", link=self.g_a[3], **kwargs),
    #     GDN(N, inverse=True),
    #     deconv(N, 3),
    #     SmartPadding(method="unpad", link=self.g_a[0], **kwargs),
    # )
    self.g_s.insert(1, SmartPadding(method="unpad", link=self.g_a[9], **kwargs))
    self.g_s.insert(4, SmartPadding(method="unpad", link=self.g_a[6], **kwargs))
    self.g_s.insert(7, SmartPadding(method="unpad", link=self.g_a[3], **kwargs))
    self.g_s.insert(10, SmartPadding(method="unpad", link=self.g_a[0], **kwargs))

    # self.h_a = nn.Sequential(
    #     conv(M, N, stride=1, kernel_size=3),
    #     nn.ReLU(inplace=True),
    #     SmartPadding(**kwargs),
    #     conv(N, N),
    #     nn.ReLU(inplace=True),
    #     SmartPadding(**kwargs),
    #     conv(N, N),
    # )
    self.h_a.insert(2, SmartPadding(**kwargs))
    self.h_a.insert(5, SmartPadding(**kwargs))

    # self.h_s = nn.Sequential(
    #     deconv(N, N),
    #     SmartPadding(method="unpad", link=self.h_a[5], **kwargs),
    #     nn.ReLU(inplace=True),
    #     deconv(N, N),
    #     SmartPadding(method="unpad", link=self.h_a[2], **kwargs),
    #     nn.ReLU(inplace=True),
    #     conv(N, M, stride=1, kernel_size=3),
    #     nn.ReLU(inplace=True),
    # )
    self.h_s.insert(1, SmartPadding(method="unpad", link=self.h_a[5], **kwargs))
    self.h_s.insert(4, SmartPadding(method="unpad", link=self.h_a[2], **kwargs))
```
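The links make each "unpad" layer undo the padding of its mirror-image "pad" layer, so the pads are undone in last-in, first-out order (`g_s[1]` undoes `g_a[9]`, and so on). Here is a shape-only sketch of that pairing in plain Python, with no PyTorch involved. `simulate_smart_padding` is a hypothetical helper; it assumes each padded (even) size is halved exactly by the strided conv and doubled exactly by the deconv, and that odd sizes get one extra row/column on the bottom/right, matching the `(0, 1, 0, 1)` entries in the log further down. With six downsampling stages (four in `g_a`, two in `h_a`), it reproduces the height/width/padding sequence of that log.

```python
def simulate_smart_padding(h, w, num_downsamples=6):
    """Shape-only simulation of SmartPadding pad/unpad pairs around 2x resampling.

    Returns a log of (method, height, width, padding) tuples, with padding in
    F.pad order (left, right, top, bottom), plus the final (h, w).
    """
    log = []
    stack = []  # (pad_h, pad_w) recorded by each "pad" layer, innermost last
    for _ in range(num_downsamples):
        pad_h, pad_w = h % 2, w % 2  # pad odd sizes by one row/column
        log.append(("pad", h, w, (0, pad_w, 0, pad_h)))
        stack.append((pad_h, pad_w))
        h, w = (h + pad_h) // 2, (w + pad_w) // 2  # stride-2 conv halves exactly
    for _ in range(num_downsamples):
        h, w = 2 * h, 2 * w  # stride-2 deconv doubles
        pad_h, pad_w = stack.pop()  # padding of the linked (mirror-image) pad layer
        log.append(("unpad", h, w, (0, -pad_w, 0, -pad_h)))
        h, w = h - pad_h, w - pad_w
    return log, (h, w)


log, final = simulate_smart_padding(513, 769)  # kodim01 after pad=(0, 1, 0, 1)
for method, h, w, padding in log:
    print(method, h, w, padding)
print("restored:", final)  # restored: (513, 769)
```

Running it on the issue's size instead, `simulate_smart_padding(512, 672)`, shows that only one stage needs padding: the last stage of `h_a`, where the width is 21.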
...And finally, let's inject and compare the various methods:
```python
# Assumes SmartPadding and inject_model from above are defined in this session.
import torch
import torch.nn.functional as F
from compressai.zoo import bmshj2018_hyperprior
from PIL import Image
from torchvision import transforms

device = "cuda" if torch.cuda.is_available() else "cpu"
net = bmshj2018_hyperprior(quality=2, pretrained=True).eval().to(device)

img = Image.open("/data/datasets/kodak/test/kodim01.png").convert("RGB")
x = transforms.ToTensor()(img).unsqueeze(0).to(device)


def inference(net, x, pad):
    # pad is in F.pad order (left, right, top, bottom); negating it crops the padding off.
    unpad = tuple(-p for p in pad)
    with torch.no_grad():
        out_enc = net.compress(F.pad(x, pad))
        [y_strings, z_strings] = out_enc["strings"]
        size = sum(len(s) for s in y_strings) + sum(len(s) for s in z_strings)
        out_dec = net.decompress(**out_enc)
        x_hat = F.pad(out_dec["x_hat"], unpad).clamp_(0, 1)
        psnr = -10 * ((x_hat - x) ** 2).mean().log10()
    return size, psnr.item()


size, psnr = inference(net, x, pad=(0, 0, 0, 0))
print(f"No pad: {size} bytes {psnr:.2f} dB")
size, psnr = inference(net, x, pad=(0, 64, 0, 64))
print(f"Extra pad: {size} bytes {psnr:.2f} dB")

# Insert SmartPadding layers that zero-pad (mode="constant", value=0).
inject_model(net, mode="constant", value=0)
# Pad the input by one pixel (to 513x769) so that every stage sees odd
# dimensions, which forces the SmartPadding layers to actually pad.
size, psnr = inference(net, x, pad=(0, 1, 0, 1))
print(f"Smart pad: {size} bytes {psnr:.2f} dB")
```
Output:
```
No pad: 16500 bytes 26.88 dB
Extra pad: 16920 bytes 26.89 dB
Smart pad: 17100 bytes 26.78 dB
```
Smart padding in action:
```
method  height  width  padding
pad     513     769    (0, 1, 0, 1)
pad     257     385    (0, 1, 0, 1)
pad     129     193    (0, 1, 0, 1)
pad     65      97     (0, 1, 0, 1)
pad     33      49     (0, 1, 0, 1)
pad     17      25     (0, 1, 0, 1)
unpad   18      26     (0, -1, 0, -1)
unpad   34      50     (0, -1, 0, -1)
unpad   66      98     (0, -1, 0, -1)
unpad   130     194    (0, -1, 0, -1)
unpad   258     386    (0, -1, 0, -1)
unpad   514     770    (0, -1, 0, -1)
```
...Hmm... it's not as good as I had hoped: smart padding costs 3.6% more bytes than no padding (vs. 2.5% for the extra 64-pixel pad) and loses 0.10 dB of PSNR. But this was without any special training. After all, we're padding intermediate tensors with zeros and hoping it doesn't alter their behavior too much. Perhaps better, padding-adaptive models could be trained.
Hello, I would like to ask a question about installing CompressAI. The installation reported "Successfully installed compressai-1.2.4.dev0", but I got the following error when importing it (the final message, 找不到指定的程序, means "The specified procedure could not be found"):

```
import compressai
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "F:\compressai\compressai\__init__.py", line 30, in <module>
    from compressai import (
  File "F:\compressai\compressai\latent_codecs\__init__.py", line 38, in <module>
    from .rasterscan import RasterScanLatentCodec
  File "F:\compressai\compressai\latent_codecs\rasterscan.py", line 38, in <module>
    from compressai.ans import BufferedRansEncoder, RansDecoder
ImportError: DLL load failed while importing ans: 找不到指定的程序。
```