Peter Lin

Results: 40 comments of Peter Lin

The `inference_video.py` script is not optimized for real-time use. You could use hardware decoding and encoding, parallelize data transfer, etc., but those likely require C++, so we are not doing that....
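In case it helps, here is a minimal sketch of what "parallelize data transfer" could look like in plain PyTorch: pinned memory plus a side CUDA stream so the host-to-GPU copy of the next batch overlaps with inference on the current one. `model` and `frame_batches` are placeholders, not names from `inference_video.py`.

```python
import torch

# Double-buffering sketch: copy the next batch on a side stream while the model
# runs on the current one. `model` and `frame_batches` are placeholders.
def matting_loop(model, frame_batches):
    copy_stream = torch.cuda.Stream()
    prev, out = None, None
    with torch.no_grad():
        for batch in frame_batches:                       # CPU tensor, e.g. [B, 3, H, W]
            batch = batch.pin_memory()                    # pinned memory enables async copies
            with torch.cuda.stream(copy_stream):
                cur = batch.to('cuda', non_blocking=True)
            if prev is not None:
                out = model(prev)                         # compute overlaps with the copy above
            torch.cuda.current_stream().wait_stream(copy_stream)
            cur.record_stream(torch.cuda.current_stream())
            prev = cur
        if prev is not None:
            out = model(prev)                             # flush the last batch
    return out
```

Even with this, decoding and encoding stay on the CPU, which is why a truly real-time pipeline still points toward a native (C++) implementation.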

You can give it a try and let me know how it goes.

The input shape is correct if your image is 1920x1080. `pha` and `fgr` are the actual outputs. `pha_sm` comes from the base network (you can refer to the architecture...
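For reference, a quick way to check is to run a dummy 1920x1080 input through the model and print each output's shape. The sketch below assumes a TorchScript checkpoint that takes a source frame plus a pre-captured background; the filename is a placeholder, so adjust it to the one you downloaded.

```python
import torch

# Print every output's shape for a dummy 1920x1080 input.
# The checkpoint filename is a placeholder.
model = torch.jit.load('torchscript_resnet50_fp32.pth').cuda().eval()
src = torch.rand(1, 3, 1080, 1920, device='cuda')   # source frame, NCHW
bgr = torch.rand(1, 3, 1080, 1920, device='cuda')   # pre-captured background

with torch.no_grad():
    outputs = model(src, bgr)

# pha and fgr come out at full resolution; the *_sm tensors are the base
# network's reduced-resolution predictions.
for i, out in enumerate(outputs):
    print(i, tuple(out.shape))
```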

I see. The only place `squeeze()` is used is in `refiner.py`. You can change `ref.squeeze(1)` in `refiner.py` to `ref[:, 0, :, :]`, then re-export the ONNX model. This should get rid of...
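If it helps, a quick check that the indexing change is numerically a no-op, plus a reminder of the re-export step (the export arguments below are placeholders):

```python
import torch

# ref[:, 0, :, :] matches ref.squeeze(1) whenever dim 1 has size 1,
# so the refiner's output is unchanged after the edit.
ref = torch.rand(4, 1, 8, 8)
assert torch.equal(ref.squeeze(1), ref[:, 0, :, :])

# After editing refiner.py, re-export with torch.onnx.export as usual, e.g.:
# torch.onnx.export(model, dummy_inputs, 'model.onnx', opset_version=12)  # placeholders
```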

Unlike LSTM, a GRU by design does not have a separate hidden state and output; they are the same tensor. See this diagram. ![image](https://github.com/PeterL1n/RobustVideoMatting/assets/7651753/9bd6ddc0-b664-4946-87bf-5965cf5329c9) The `(1 - z)` was opposite to the paper...
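A minimal illustration with `torch.nn.GRUCell`: the value it returns is both the output and the next hidden state, and the update equation can be written with `z` and `(1 - z)` swapped, which is purely a naming convention.

```python
import torch
import torch.nn as nn

# A GRU's output is its new hidden state (unlike an LSTM's separate cell state).
cell = nn.GRUCell(input_size=16, hidden_size=32)
x = torch.rand(8, 16)
h = torch.zeros(8, 32)
h_new = cell(x, h)   # h_new is both the output and the next hidden state

# The two gate conventions are equivalent up to relabeling z as (1 - z):
#   h_t = (1 - z) * h_{t-1} + z * h_candidate
#   h_t = z * h_{t-1} + (1 - z) * h_candidate
```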

Ah I figured it out. Here is the problem: https://github.com/MCG-NKU/CVPR_Template/blob/15c2d748f9da8c2aed1913acb15a9f104f3f5184/cvpr.cls#L435 You gotta change
```
\setlength{\oddsidemargin}{-0.304in}
\setlength{\evensidemargin}{-0.304in}
```
to
```
\setlength{\oddsidemargin}{-0.1875in}
\setlength{\evensidemargin}{-0.1875in}
```
The old margin value was designed for `a4paper`,...

1. No. The whole point of our research is to replace conv with GRU. The GRU's recurrent architecture allows the model to analyze the video sequence with temporal memory. If you...
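As a rough illustration of the point in 1., here is a minimal ConvGRU sketch (not the repo's exact implementation): the hidden state is fed back in for every new frame, which is what gives the model temporal memory that a plain conv layer lacks.

```python
import torch
import torch.nn as nn

# Minimal ConvGRU sketch, not the repo's exact implementation.
class ConvGRUCell(nn.Module):
    def __init__(self, channels, kernel_size=3):
        super().__init__()
        padding = kernel_size // 2
        self.ih = nn.Conv2d(channels * 2, channels * 2, kernel_size, padding=padding)
        self.hh = nn.Conv2d(channels * 2, channels, kernel_size, padding=padding)

    def forward(self, x, h):
        # reset and update gates from the current input and previous state
        r, z = self.ih(torch.cat([x, h], dim=1)).sigmoid().chunk(2, dim=1)
        c = self.hh(torch.cat([x, r * h], dim=1)).tanh()   # candidate state
        return (1 - z) * h + z * c                          # new hidden state

cell = ConvGRUCell(16)
h = torch.zeros(1, 16, 32, 32)
for frame_feat in torch.rand(5, 1, 16, 32, 32):   # 5 consecutive frame features
    h = cell(frame_feat, h)                        # temporal memory accumulates in h
```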

I thought masking is supported through `flash_attn_varlen_func` https://github.com/Dao-AILab/flash-attention/blob/d30f2e1cd50185c98ed88c0684b4a603f15bee37/flash_attn/flash_attn_interface.py#L454C21-L454C21
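A minimal sketch of that usage, assuming the keyword names in the linked interface (worth double-checking against your installed flash-attn version): sequences are packed into one tensor and `cu_seqlens` marks the boundaries, so attention never crosses sequences and no explicit padding mask is needed.

```python
import torch
from flash_attn import flash_attn_varlen_func

# Three sequences of different lengths packed into one tensor (requires a CUDA
# GPU and fp16/bf16 inputs). Keyword names follow the linked interface file.
seqlens = [5, 3, 7]
total, nheads, headdim = sum(seqlens), 8, 64
q = torch.randn(total, nheads, headdim, dtype=torch.float16, device='cuda')
k = torch.randn_like(q)
v = torch.randn_like(q)
cu_seqlens = torch.tensor([0, 5, 8, 15], dtype=torch.int32, device='cuda')

out = flash_attn_varlen_func(
    q, k, v,
    cu_seqlens_q=cu_seqlens, cu_seqlens_k=cu_seqlens,
    max_seqlen_q=max(seqlens), max_seqlen_k=max(seqlens),
    causal=False,
)  # attention stays within each sequence, so padding masks are unnecessary
```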