Efficient way to integrate lossyless into a PyTorch Dataset subclass
Hey @YannDubs,
I recently discovered your paper and find the idea very interesting, so I would like to integrate `lossyless` into a project I am currently working on. However, as far as I understand it, there are two requirements in my project that your compressor on PyTorch Hub does not cover:
- I assume that the training data do not fit into memory, so I cannot decompress the entire dataset at once.
- Because I cannot load the entire dataset into memory and shuffle it there, I need access to individual samples (for random permutations) without touching the rest of the data, or as little of it as possible.
Basically, I would like to integrate `lossyless` into a subclass of PyTorch's `Dataset` that implements the `__getitem__(index)` interface. Before I start experimenting on my own and potentially overlook something you have already thought about, I wanted to ask whether you have already considered approaches for integrating your idea into a PyTorch `Dataset`.
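For concreteness, this is roughly the skeleton I have in mind (just a sketch: the one-file-per-sample layout and the `decompress_fn` hook are my own assumptions, not something your code provides):

```python
import glob
import os

from torch.utils.data import Dataset


class CompressedImageDataset(Dataset):
    """Hypothetical Dataset reading one compressed sample per file.

    Assumes the dataset was compressed beforehand and each sample's
    byte string was written to its own `<index>.bin` file;
    `decompress_fn` stands in for whatever single-sample decompression
    the lossyless compressor exposes.
    """

    def __init__(self, root, decompress_fn):
        self.files = sorted(glob.glob(os.path.join(root, "*.bin")))
        self.decompress_fn = decompress_fn

    def __len__(self):
        return len(self.files)

    def __getitem__(self, index):
        # Only the requested sample is read from disk and decompressed,
        # so the full dataset never has to fit into memory.
        with open(self.files[index], "rb") as f:
            byte_string = f.read()
        return self.decompress_fn(byte_string)
```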
Looking forward to a discussion!
Hey Lennart,
The `compression` function was simply meant to show how to use the model. As you can see in the code, it's extremely simple to compress a single batch at a time (see https://github.com/YannDubs/lossyless/blob/6b604dd8b8d00b18cac8eef1f34006854a35f63c/hub/compressor.py#L187 ), and decompressing a single batch is also very simple (see https://github.com/YannDubs/lossyless/blob/6b604dd8b8d00b18cac8eef1f34006854a35f63c/hub/compressor.py#L238 ).
The only changes I see that should be made are:
1/ using a batch size of 1 when compressing (that will make it slower, but it is a simple way to ensure that you can perform permutations). I.e., here: https://github.com/YannDubs/lossyless/blob/6b604dd8b8d00b18cac8eef1f34006854a35f63c/hub/compressor.py#L155
2/ saving (and loading) each compressed image separately rather than all at once. I.e., the following lines should go inside the for loop (see the sketch after this list): https://github.com/YannDubs/lossyless/blob/6b604dd8b8d00b18cac8eef1f34006854a35f63c/hub/compressor.py#L191
Neither of those points is very complex if you want to give it a try.
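Roughly something like this (only a sketch: I'm assuming a `compressor.compress(x)` that returns one byte string per image, mirroring the batch logic in hub/compressor.py; the exact call may differ):

```python
import os

import torch
from torch.utils.data import DataLoader


def compress_to_files(compressor, dataset, out_dir, batch_size=16):
    """Hypothetical variant of the hub script that saves each compressed
    image to its own file instead of one big array.

    Assumes `compressor.compress(x)` returns one byte string per image
    in the batch (the real method name/signature may differ).
    """
    os.makedirs(out_dir, exist_ok=True)
    loader = DataLoader(dataset, batch_size=batch_size)  # batching stays cheap
    i = 0
    with torch.no_grad():
        for x, _ in loader:
            byte_strings = compressor.compress(x)
            # Moved into the loop: one file per image, so a Dataset can
            # later fetch sample i without touching the rest.
            for s in byte_strings:
                with open(os.path.join(out_dir, f"{i}.bin"), "wb") as f:
                    f.write(s)
                i += 1
```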
I'm quite busy right now, but I might be able to do it in the next 2-3 weeks if you haven't done it yourself by then :)
Hi Yann,
Thank you for your reply and for pointing out the relevant LOCs. I'll see how I can best integrate `lossyless` into my code.
I already tried something similar with a few models from `compressai`, and the execution time of my data loader was unfortunately rather subpar, given that the decoder NN can only process a single image, or at most one mini-batch of images, in parallel. Would you say it is generally possible to achieve the same decompression speed with neural decoders as with "classic" codecs like JPEG? Even when I decode an entire dataset at once, the execution time of the neural codecs I've tried so far still lags behind classic codecs.
Yes, unfortunately removing batch compression would make the compressor very slow. But it's actually not needed: you can compress in batches and still save the images separately.
Concerning the decompressor: right now compressai doesn't allow batch decompression, which is why decompressing the entire dataset at once is so slow, i.e., it doesn't actually take advantage of batches. I don't think this is an issue with lossyless though, but simply with the compressai implementation. In theory, lossyless could even be quicker at decompression than standard codecs because it doesn't require reconstructing the image.
The simplest way to make the decoder quicker for now is to at least parallelise the decompression of the individual images, e.g. along the lines of the sketch below.
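Something like this (a sketch only: the hub entry-point name comes from the README, while `decompress()` taking a list of byte strings is an assumption about the compressor's API):

```python
import multiprocessing as mp
from pathlib import Path

import torch

_COMPRESSOR = None  # cached once per worker process


def _init_worker():
    # Load the model a single time per worker instead of once per image.
    global _COMPRESSOR
    _COMPRESSOR, _ = torch.hub.load("YannDubs/lossyless:main", "clip_compressor_b01")


def decompress_one(path):
    # Decompress a single image from its own file; assumes a
    # decompress() that maps byte strings to representations.
    byte_string = Path(path).read_bytes()
    return _COMPRESSOR.decompress([byte_string])


if __name__ == "__main__":
    paths = sorted(str(p) for p in Path("compressed").glob("*.bin"))
    with mp.Pool(processes=4, initializer=_init_worker) as pool:
        embeddings = pool.map(decompress_one, paths)
```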
I see. Maybe converting a lossyless model into a TensorRT engine could also help improve decompression speed. I haven't worked with TensorRT before, though, so I'm not sure whether there is a straightforward way to integrate a TensorRT engine into another training pipeline.
I'll be quite busy throughout the next week, but afterwards I'll see if there is anything I can do to achieve decent, or at least better, decompression speed without having to mess with the entire underlying software stack.
Hi Yann,
Over the last few days, I had a chance to work on this topic again and experimented with a few ways to achieve good encoding/decoding performance. I couldn't test my prototypes on server-grade hardware yet, but I'd like to share some of the insights from running tests on my laptop today. I ran my tests on a small ImageNet subsample with 900 images and about 2 MB of disk size.
Encoding the dataset with lossyless (beta = 0.1) takes about 40 seconds on my MX330 laptop GPU. Using WebP and Python multiprocessing, my CPU processes the same dataset in less than a second. Do these numbers sound reasonable to you? I can post a snippet of my lossyless code if you think that lossyless encoding should not be that far off WebP; the WebP baseline is essentially the sketch below.
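For reference, the WebP side of the comparison looks roughly like this (a sketch; the directory names and the quality setting are placeholders for my actual setup):

```python
import time
from multiprocessing import Pool
from pathlib import Path

from PIL import Image

OUT_DIR = Path("webp")


def encode_webp(path):
    # Re-encode a single image as WebP on the CPU.
    img = Image.open(path).convert("RGB")
    img.save(OUT_DIR / (Path(path).stem + ".webp"), "WEBP", quality=80)


if __name__ == "__main__":
    OUT_DIR.mkdir(exist_ok=True)
    paths = sorted(str(p) for p in Path("imagenet_subset").glob("*.JPEG"))
    start = time.perf_counter()
    with Pool() as pool:  # one encoder process per CPU core
        pool.map(encode_webp, paths)
    print(f"WebP encoding: {time.perf_counter() - start:.2f}s for {len(paths)} images")
```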
Decoding the dataset through a PyTorch data loader with multiple workers takes about 25s on my laptop, whereas WebP decoding takes less than 0.1s. One problem with the PyTorch data loader (afaik) is that, within each worker process, the Python GIL prevents multi-threaded execution, which makes running the model rather slow. Therefore, I also implemented a prototype that uses Python's multiprocessing library instead of a PyTorch data loader. In this scenario, decoding takes about 4-9 seconds. However, it came as a surprise to me that basically the entire time is spent on loading the model (4-8s), while the decoding itself is pretty quick (though still slower than WebP). Loading other models from CompressAI takes less than 0.1s, though. Does it make sense that loading a lossyless model takes so much longer than other learned compression models? I am measuring the loading times as shown below.
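This is the comparison I am running (a sketch; `bmshj2018_factorized` is just one example model from the CompressAI zoo, and the hub entry point is the beta = 0.1 compressor from your README):

```python
import time

import torch
from compressai.zoo import bmshj2018_factorized

# Loading a pretrained CompressAI model: well under a second for me.
start = time.perf_counter()
net = bmshj2018_factorized(quality=1, pretrained=True)
print(f"compressai load: {time.perf_counter() - start:.2f}s")

# Loading the lossyless compressor from PyTorch Hub: 4-8s on my laptop.
start = time.perf_counter()
compressor, transform = torch.hub.load("YannDubs/lossyless:main", "clip_compressor_b01")
print(f"lossyless load: {time.perf_counter() - start:.2f}s")
```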