VSR-Transformer

Some questions about the network that I can't resolve on my own; I would really appreciate your answer

Open swt199211 opened this issue 2 years ago • 0 comments

Hello, thank you for contributing this code and a method for applying transformers to video. I have some questions about your paper and code. I'm just getting started in this area, so I have quite a few doubts.

  1. As I understand it, the optical flow is estimated from the original input images and then used to warp the features, the warping is applied five times, and the optical flow is not given any dedicated supervision. Why use optical flow in the feed-forward part? Isn't self-attention already a good way to fuse features? And since the estimated optical flow contains errors, won't those errors affect performance?

  2. For the patch and window size, you set the patch to 8 × 8 in the project. Is a larger patch better, or a smaller one? If the patch is too large, there will be less local information. Also, would it be better to set the window larger? You set the window to 64 mainly as a trade-off against computation cost, right?

  3. As for the number of transformer layers, is more always better? Is 5 layers the best choice?
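
To make question 2 concrete, here is a rough sketch of how I understand the trade-off. It is only my own arithmetic, not taken from your code: I assume "window 64" means a 64 × 64 pixel spatial window, that patches do not overlap, that 5 frames are attended over jointly, and that attention cost grows with the square of the token count. The function names are made up for illustration.

```python
# Rough token-count arithmetic for patch/window choices.
# Assumptions (mine, not from the repository): square window in pixels,
# non-overlapping square patches, full attention across all frames.

def tokens_per_frame(window: int, patch: int) -> int:
    """Number of non-overlapping patch tokens in one window of one frame."""
    if window % patch != 0:
        raise ValueError("window must be divisible by patch")
    per_side = window // patch
    return per_side * per_side


def attention_pairs(window: int, patch: int, frames: int) -> int:
    """Number of query-key pairs when all tokens of all frames attend to each other."""
    n = frames * tokens_per_frame(window, patch)
    return n * n


if __name__ == "__main__":
    # frames=5 is an assumed value used only for illustration.
    for patch in (4, 8, 16):
        print(patch, tokens_per_frame(64, patch), attention_pairs(64, patch, frames=5))
```

If this is right, halving the patch size multiplies the number of tokens by 4 and the number of attention pairs by 16, which is why I guess the 8 × 8 patch and the 64 window are a compromise.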

I'm asking these questions mainly because my lab's computing resources are limited: we only have two 3090 cards, and training is too slow for me to verify each point myself.

I look forward to your reply. Thank you very much!

swt199211 · Mar 08 '22 13:03