BasicVSR++: reproduce NTIRE decompression results on track 3
Hi, thanks for your great work. I am using BasicVSR++ to reproduce the NTIRE decompression results on track 3 with the trained model you provided and the official test set. The settings are basicvsr_plusplus_c128n25_600k_ntire_decompress_track3.py and basicvsr_plusplus_c128n25_ntire_decompress_track3_20210304_6daf4a40.pth.
After testing, the Eval-PSNR is 30.0519 and the Eval-lq_PSNR is 28.3367, so I only gain a 1.71 dB improvement on track 3. When testing, I set num_input_frames to the length of each sequence so that the full video sequence is used as input.
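Concretely, I adjusted the test dataset roughly like this (a sketch of my override; the key name follows the MMEditing BasicVSR-style configs, which is my assumption here):

```python
# Sketch of the test-dataset override (key name assumed from the
# MMEditing BasicVSR-style configs). num_input_frames=None should use
# the whole sequence, equivalent to setting the sequence length.
data = dict(
    test=dict(
        num_input_frames=None,
    ))
```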
Can you give some advice?
We also used an ensemble to further boost the performance. I am going to implement the ensemble in the following days (or weeks).
Out of curiosity, can someone explain to me what "ensemble" means in this context?
Here, ensemble means flipping and rotating the images spatially. After rotating and flipping, you have 8 copies of the original sequence. We then run inference 8 times and take the average of the outputs.
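A minimal PyTorch sketch of this, where `model` is a placeholder for the actual BasicVSR++ inference call (note that the 90°/270° rotations swap height and width, so the model must accept both orientations):

```python
# Sketch of the 8-fold self-ensemble (4 rotations x 2 flips).
# `model` is assumed to map a (t, c, h, w) sequence to an output
# with the same spatial orientation.
import torch

def flip_rotation_ensemble(model, seq):
    outputs = []
    for flip in (False, True):
        s = torch.flip(seq, dims=[-1]) if flip else seq
        for k in range(4):  # rotations by 0, 90, 180, 270 degrees
            out = model(torch.rot90(s, k, dims=[-2, -1]))
            # Undo the augmentation so all outputs are aligned:
            # the rotation first (applied last), then the flip.
            out = torch.rot90(out, -k, dims=[-2, -1])
            if flip:
                out = torch.flip(out, dims=[-1])
            outputs.append(out)
    # A simple per-pixel average over the 8 aligned outputs.
    return torch.stack(outputs).mean(dim=0)
```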
Ah, interesting, so 4 copies per 90-degree rotation, unmirrored + mirrored. Is the average a simple per-pixel calculation or something more complex?
Currently working on a heavily compressed low-res clip, and my current steps for a quite good result are: bicubic upscale ×2, run the track 3 model, bicubic downscale to the original size, run the track 3 model again, then upscale with the Vimeo-90K (BI) model. Though, I still need to retest the last step with the BD model and replace the bicubic downscale with a Gaussian blur + downscale.
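In pseudocode, the chain is roughly this (a sketch only; `run_track3_model` and `run_vimeo90k_bi_model` are hypothetical stand-ins for the actual MMEditing restoration calls, not real APIs):

```python
# Rough sketch of the multi-pass chain described above.
import torch.nn.functional as F

def bicubic_resize(seq, scale):
    # seq: (t, c, h, w), treated as a batch of frames
    return F.interpolate(seq, scale_factor=scale, mode='bicubic',
                         align_corners=False)

def enhance(seq, run_track3_model, run_vimeo90k_bi_model):
    x = bicubic_resize(seq, 2.0)     # 1. bicubic upscale x2
    x = run_track3_model(x)          # 2. first decompression pass
    x = bicubic_resize(x, 0.5)       # 3. bicubic downscale to original
    x = run_track3_model(x)          # 4. second decompression pass
    return run_vimeo90k_bi_model(x)  # 5. final upscale (Vimeo-90K BI)
```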
Anyway, the ensemble route looks interesting. Looking at the compute time for my 1080 Ti as it is... oh, it's going to cry ^^"
That is quite a lot of steps. I think there could be better ways to go, but that would require more exploration.
Yup, but the ensemble way doesn't sound like fewer steps ^^ The videos I deal with come from a game that uses the Cinepak codec. In heavily dynamic scenes it gets really blocky, but with the steps above I got them quite smooth.
Currently experimenting with an AI-based deblocking prepass. It seemed to improve things even more, but it also seemed to have "smoothed" some details out a bit. Something I need to explore further.
EDIT: And now I'm wondering whether it's worth training my own model on the Vimeo-90K dataset, but using Cinepak as the degradation process to produce the LR images oO. Though, my assumption is that this will take ages on my 1080 Ti, if the memory is even enough.
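If I go that route, generating the LR data could be as simple as round-tripping each clip through the codec with FFmpeg. A rough sketch, assuming an FFmpeg build that includes the Cinepak encoder; the ×4 downscale, frame pattern, and paths are illustrative choices:

```python
# Sketch: produce Cinepak-degraded LR frames with FFmpeg.
import subprocess
from pathlib import Path

def degrade_with_cinepak(hq_dir: Path, lq_dir: Path, fps: int = 25):
    lq_dir.mkdir(parents=True, exist_ok=True)
    avi = lq_dir / 'tmp_cinepak.avi'
    # Downscale x4, then encode with Cinepak (it codes 4x4 blocks,
    # so keep width/height divisible by 4).
    subprocess.run([
        'ffmpeg', '-y', '-framerate', str(fps),
        '-i', str(hq_dir / '%08d.png'),
        '-vf', 'scale=iw/4:ih/4:flags=bicubic',
        '-c:v', 'cinepak', str(avi)], check=True)
    # Decode back to frames: these become the LQ side of the pairs.
    subprocess.run(['ffmpeg', '-y', '-i', str(avi),
                    str(lq_dir / '%08d.png')], check=True)
    avi.unlink()
```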
I think Vimeo-90K is not a very good dataset if you want to use recurrent networks, since it contains only 7 frames for each sequence.
Oh okay. Do you have any suggestions for a better fit?
If you can construct the "low quality" videos by yourself, you can consider using the REDS dataset. It contains 100 or 500 high-quality frames per sequence, depending on which version you use.
Yup, I can (and even need to) do it. Thanks :)
Thanks for your great work. Could you compare the performance with and without the ensemble? And do you have any advice for upscaling low-quality videos without retraining?
The model in MMEditing currently does not use the ensemble. The code for the ensemble is still in a PR, so we can do a comparison after it is merged.
About your second question, I am not quite sure what you mean.
I mean that, in real-world practice, the model is less effective on compressed videos than on low-resolution but otherwise clean videos. Trying different combinations of downscaling and upscaling may help, but is there any blind way to improve the effectiveness?
I assume that is due to how this model was pretrained. Training your own version on a specific compression method may give better results for that specific case.
Hi, I'm reproducing your paper; could you release the ensemble test code?
Hello, you can refer to #585. The PR will be merged after review.
Thanks a lot. You've done me a great favor.