RobustVideoMatting

Can GRU be replaced with Conv layers?

Open • Stephen-K1 opened this issue 2 years ago • 1 comment

  1. In the RVM model, the GRU layer accounts for a large share of the computation. This raises a natural question: would it be better to replace the GRU layer with Conv layers of the same computational cost? A simple 'yes' or 'no' answer would be greatly appreciated.

  2. Recently I have been trying to build a matting model with excellent performance. I have read many recently proposed video matting papers and tested their matting performance. Even though RVM was proposed two years ago, it is the best open-source model (with training code included) in my tests. Could you share some tips for improving the performance of RVM? I believe you have a lot of good ideas worth trying, and I would greatly appreciate any insights you can share here. Thank you very much!

Stephen-K1, Nov 27 '23 07:11

  1. No. The whole point of our research is to replace conv with GRU. The GRU's recurrent architecture lets the model analyze the video sequence with temporal memory. If you replace it with Conv, the model will treat each frame independently, and the output will flicker.

  2. I have not been following matting research lately, but here are some ideas off the top of my head:

  • Use a transformer instead of ConvGRU to model temporal relations.
  • Use a better backbone based on ViT, such as DINOv2.
  • Treat matting as a generative task, for example with a diffusion objective.
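
To make point 1 of the reply concrete, here is a minimal toy sketch in pure Python. It is not RVM's ConvGRU: it assumes a standard GRU update applied independently at each pixel, with all weights fixed to 1.0 and biases to 0.0, and uses a stateless `tanh` filter as a stand-in for a per-frame conv layer. The names `per_frame_filter`, `gru_step`, and `run_gru` are hypothetical. The only thing it is meant to show is that a recurrent cell's output for a given frame depends on the frames that came before it, while a per-frame layer's output does not.

```python
import math
from typing import List


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def per_frame_filter(frame: List[float]) -> List[float]:
    """Stateless stand-in for a conv layer: uses the current frame only."""
    return [math.tanh(p) for p in frame]


def gru_step(frame: List[float], hidden: List[float]) -> List[float]:
    """One standard GRU update per pixel (illustrative weights of 1.0, biases of 0.0)."""
    new_hidden = []
    for x, h in zip(frame, hidden):
        z = sigmoid(x + h)          # update gate
        r = sigmoid(x + h)          # reset gate
        c = math.tanh(x + r * h)    # candidate state
        new_hidden.append((1.0 - z) * h + z * c)
    return new_hidden


def run_gru(frames: List[List[float]]) -> List[List[float]]:
    """Run the toy GRU over a sequence, carrying the hidden state across frames."""
    hidden = [0.0] * len(frames[0])
    outputs = []
    for frame in frames:
        hidden = gru_step(frame, hidden)
        outputs.append(hidden)
    return outputs


# Two clips that end on the same frame but differ in the frame before it.
frame = [0.5, 0.5]
history_a = [[0.0, 0.0], frame]
history_b = [[2.0, 2.0], frame]

conv_a = per_frame_filter(history_a[-1])
conv_b = per_frame_filter(history_b[-1])
gru_a = run_gru(history_a)[-1]
gru_b = run_gru(history_b)[-1]

print("per-frame outputs equal:", conv_a == conv_b)
print("GRU outputs:", gru_a, gru_b)
```

The per-frame filter returns the same result for both clips because it never sees the earlier frame, whereas the two GRU outputs differ because the hidden state carries information forward. In a trained model that carried state is what can be used to keep predictions temporally consistent; this sketch, with its hand-set weights, only demonstrates the dependence on history.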

PeterL1n, Dec 20 '23 21:12