
BasicVSR++: reproduce ntire decompression results on track3

Open sxd0071 opened this issue 3 years ago • 17 comments

Hi, thanks for your great work. I am using BasicVSR++ to reproduce the NTIRE decompression results on track 3, with the trained model you provided and the official test set. The settings are: basicvsr_plusplus_c128n25_600k_ntire_decompress_track3.py and basicvsr_plusplus_c128n25_ntire_decompress_track3_20210304_6daf4a40.pth.

After testing, the Eval-PSNR is 30.0519 dB and the Eval-lq_PSNR is 28.3367 dB, so I gain only about 1.71 dB of improvement on track 3. I set num_input_frame to the length of each sequence, so the full video sequence is used as input during testing.

Can you give some advice?

sxd0071 avatar Sep 27 '21 06:09 sxd0071

We also used an ensemble to further boost the performance. I am going to implement the ensemble in the coming days (or weeks).

ckkelvinchan avatar Sep 27 '21 06:09 ckkelvinchan

Out of curiosity, can someone explain to me, what "ensemble" in this context means?

Memnarch avatar Oct 06 '21 08:10 Memnarch

Out of curiosity, can someone explain to me, what "ensemble" in this context means?

Here ensemble means flipping and rotating the images spatially. After rotating and flipping, you should have 8 copies of the original sequence. Then we do inference 8 times and take the average of the outputs.
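For reference, the flip/rotate ensemble described above (often called geometric self-ensemble or x8 test-time augmentation) can be sketched as follows. This is a minimal NumPy sketch, not the actual MMEditing implementation; `model_fn` is a hypothetical stand-in for one forward pass over a (T, H, W, C) frame sequence:

```python
import numpy as np

def ensemble_inference(seq, model_fn):
    """Eight-way geometric self-ensemble: run the model on the 4 rotations
    x 2 horizontal flips of the input, undo each transform on the output,
    then take a simple per-pixel average of the 8 results."""
    outputs = []
    for rot in range(4):                       # 0, 90, 180, 270 degrees
        for flip in (False, True):
            x = np.rot90(seq, rot, axes=(1, 2))
            if flip:
                x = x[:, :, ::-1]              # horizontal mirror
            y = model_fn(x)
            if flip:                           # undo transforms in reverse order
                y = y[:, :, ::-1]
            y = np.rot90(y, -rot, axes=(1, 2))
            outputs.append(y)
    return np.mean(outputs, axis=0)            # simple per-pixel average
```

With an identity model the eight de-transformed outputs coincide, so the average reproduces the input exactly; with a real model the average suppresses orientation-dependent artifacts at the cost of 8x inference time.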

ckkelvinchan avatar Oct 06 '21 12:10 ckkelvinchan

Ah, interesting, so 4 copies per 90-degree rotation, unmirrored + mirrored. Is the average a simple per-pixel calculation or something more complex?

Currently I'm working on a heavily compressed low-res clip, and my current steps for a quite good result are: bicubic upscale x2, run the track 3 model, bicubic downscale to the original size, run the track 3 model again, followed by an upscale using the Vimeo-90K (BI) model. Though, I need to retest the last step with the BD model and replace the bicubic downscale with a Gaussian blur + downscale.

Anyway, the ensemble route looks interesting. Looking at the compute time for my 1080 Ti as is... oh, it's going to cry ^^"
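The multi-pass pipeline above can be sketched roughly as below. This is a hypothetical outline, not real MMEditing API: `run_model` stands in for a model forward pass, and `resize` is a nearest-neighbour placeholder for the bicubic interpolation the actual steps use:

```python
import numpy as np

def resize(frames, scale):
    """Placeholder resize (nearest neighbour) for a (T, H, W, C) array;
    the real pipeline uses bicubic interpolation (e.g. cv2.INTER_CUBIC)."""
    t, h, w, c = frames.shape
    ys = (np.arange(int(h * scale)) / scale).astype(int).clip(0, h - 1)
    xs = (np.arange(int(w * scale)) / scale).astype(int).clip(0, w - 1)
    return frames[:, ys][:, :, xs]

def restore(frames, track3_model, vimeo_bi_model, run_model):
    """Sketch of the multi-pass pipeline; all model handles are hypothetical."""
    up = run_model(track3_model, resize(frames, 2.0))   # 1-2: upscale x2, decompression pass
    down = run_model(track3_model, resize(up, 0.5))     # 3-4: back to original size, second pass
    return run_model(vimeo_bi_model, down)              # 5: final SR with the Vimeo-90K (BI) model
```

The intermediate down-then-up round trip is what distinguishes this from a single restoration pass: the second track 3 pass sees frames already partially cleaned.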

Memnarch avatar Oct 06 '21 13:10 Memnarch

Ah, interesting, so 4 copies per 90-degree rotation, unmirrored + mirrored. Is the average a simple per-pixel calculation or something more complex?

Currently I'm working on a heavily compressed low-res clip, and my current steps for a quite good result are: bicubic upscale x2, run the track 3 model, bicubic downscale to the original size, run the track 3 model again, followed by an upscale using the Vimeo-90K (BI) model. Though, I need to retest the last step with the BD model and replace the bicubic downscale with a Gaussian blur + downscale.

Anyway, the ensemble route looks interesting. Looking at the compute time for my 1080 Ti as is... oh, it's going to cry ^^"

That is quite a lot of steps. I think there could be better ways to go, but that would require more exploration.

ckkelvinchan avatar Oct 07 '21 01:10 ckkelvinchan

That is quite a lot of steps. I think there could be better ways to go, but that would require more exploration.

Yup, but the ensemble way doesn't sound like fewer steps ^^ The videos I deal with come from a game using the Cinepak codec. In heavily dynamic scenes it gets really blocky, but with the steps above I got them quite smooth.

Currently I'm experimenting with an AI-based deblocking prepass. It seemed to improve things even more, but also seemed to have "smoothed" some details out a bit. Something I need to explore more.

EDIT: And now I'm wondering whether it's worth training my own model on the Vimeo-90K dataset but using Cinepak as the degradation process to produce the LR images. Though, my assumption is this is going to take ages on my 1080 Ti, if the memory is even enough.
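If you go that route, one way to sketch the Cinepak degradation step is to round-trip the HQ frames through ffmpeg, which ships a `cinepak` encoder. The directory layout and frame-name pattern below are purely illustrative; this sketch only builds the two command lines, to be run with `subprocess.run(cmd, check=True)`:

```python
def cinepak_commands(hq_dir, lq_dir, fps=25):
    """Build ffmpeg command lines that encode a folder of HQ frames with
    the Cinepak codec, then decode the result back into LQ frames.
    Paths and the %08d.png pattern are illustrative only; note that
    Cinepak expects frame dimensions divisible by 4."""
    avi = f"{lq_dir}/cinepak.avi"
    encode = ["ffmpeg", "-y", "-framerate", str(fps),
              "-i", f"{hq_dir}/%08d.png",
              "-c:v", "cinepak", avi]
    decode = ["ffmpeg", "-y", "-i", avi, f"{lq_dir}/%08d.png"]
    return encode, decode
```

Because the round trip preserves frame count and resolution, the decoded frames line up frame-for-frame with the HQ originals, which is exactly what paired LR/HQ training needs.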

Memnarch avatar Oct 07 '21 06:10 Memnarch

That is quite a lot of steps. I think there could be better ways to go, but that would require more exploration.

Yup, but the ensemble way doesn't sound like fewer steps ^^ The videos I deal with come from a game using the Cinepak codec. In heavily dynamic scenes it gets really blocky, but with the steps above I got them quite smooth.

Currently I'm experimenting with an AI-based deblocking prepass. It seemed to improve things even more, but also seemed to have "smoothed" some details out a bit. Something I need to explore more.

EDIT: And now I'm wondering whether it's worth training my own model on the Vimeo-90K dataset but using Cinepak as the degradation process to produce the LR images. Though, my assumption is this is going to take ages on my 1080 Ti, if the memory is even enough.

I think Vimeo-90K is not a very good dataset if you want to use recurrent networks, since it contains only 7 frames per sequence.

ckkelvinchan avatar Oct 27 '21 12:10 ckkelvinchan

Oh okay. Do you have any suggestions what is a better fit?

Memnarch avatar Oct 28 '21 16:10 Memnarch

If you can construct the "low quality" videos by yourself, you can consider using the REDS dataset. It contains 100 or 500 high-quality frames per sequence, depending on which version you use.

ckkelvinchan avatar Oct 29 '21 09:10 ckkelvinchan

If you can construct the "low quality" videos by yourself, you can consider using the REDS dataset. It contains 100 or 500 high-quality frames per sequence, depending on which version you use.

Yup, I can (and even need to) do it. Thanks :)

Memnarch avatar Oct 29 '21 09:10 Memnarch

Thanks for your great work. Can you compare the performance with and without the ensemble? And do you have any advice for upscaling low-quality videos without retraining?

lotress avatar Nov 25 '21 08:11 lotress

Thanks for your great work. Can you compare the performance with and without the ensemble? And do you have any advice for upscaling low-quality videos without retraining?

The model in MMEditing currently does not use the ensemble. The code for the ensemble is still in a PR; we can do a comparison afterwards.

About your second question, I am not quite sure what you mean.

ckkelvinchan avatar Nov 25 '21 11:11 ckkelvinchan

Thanks for your great work. Can you compare the performance with and without the ensemble? And do you have any advice for upscaling low-quality videos without retraining?

The model in MMEditing currently does not use the ensemble. The code for the ensemble is still in a PR; we can do a comparison afterwards.

About your second question, I am not quite sure what you mean.

I mean the model is less effective in practice on compressed videos than on videos that are low-resolution but otherwise clean. Trying different combinations of downscaling and upscaling may help, but are there any blind ways to improve effectiveness?

lotress avatar Nov 26 '21 02:11 lotress

Thanks for your great work. Can you compare the performance with and without the ensemble? And do you have any advice for upscaling low-quality videos without retraining?

The model in MMEditing currently does not use the ensemble. The code for the ensemble is still in a PR; we can do a comparison afterwards. About your second question, I am not quite sure what you mean.

I mean the model is less effective in practice on compressed videos than on videos that are low-resolution but otherwise clean. Trying different combinations of downscaling and upscaling may help, but are there any blind ways to improve effectiveness?

I assume that is due to how this model was pretrained. Training your own version on specific compression methods may give better results for specific cases.

Memnarch avatar Nov 26 '21 06:11 Memnarch

Hi, I'm reproducing your paper, so could you release the ensemble test code?

ZcsrenlongZ avatar Feb 13 '22 12:02 ZcsrenlongZ

Hi, I'm reproducing your paper, so could you release the ensemble test code?

Hello, you can refer to #585. The PR will be merged after review.

ckkelvinchan avatar Feb 14 '22 01:02 ckkelvinchan

Hi, I'm reproducing your paper, so could you release the ensemble test code?

Hello, you can refer to #585. The PR will be merged after review.

Thanks a lot. You've done me a great favor.

ZcsrenlongZ avatar Feb 14 '22 05:02 ZcsrenlongZ