Jonas Wu

17 comments by Jonas Wu

Hi, we run the code on a V100 with 32G memory. We find it generally needs around 24G, while for some videos containing many objects, it will reach...

For the Transformer decoder, the decoder embedding is the pooled language feature, and the learnable queries serve as the positional embeddings. Please refer to [here](https://github.com/wjn922/ReferFormer/blob/93c8ff5b14d35ab91a4894d0783d2964fd9072f7/models/deformable_transformer.py#L191).
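
A minimal sketch of that query setup, assuming standard DETR-style decoder inputs (all tensor names below are placeholders, not the repo's identifiers):

```python
import torch
import torch.nn as nn

# Sketch: the pooled sentence feature is repeated as the *content* embedding
# for every query, while the learnable query embeddings contribute only the
# positional part.
num_queries, d_model, batch = 5, 256, 2

query_pos = nn.Embedding(num_queries, d_model)     # learnable queries = pos embedding
pooled_text = torch.randn(batch, d_model)          # pooled language feature

tgt = pooled_text.unsqueeze(1).expand(-1, num_queries, -1)   # (B, Nq, C) content
pos = query_pos.weight.unsqueeze(0).expand(batch, -1, -1)    # (B, Nq, C) position
# each decoder layer then forms its queries as tgt + pos (DETR convention)
```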

Hi, I suppose you mean the GPU memory. We also use GPUs with 32G memory, so we think there won't be a problem.

Hi, the pretrained model is indeed different from the joint training model. The pretrained models are trained only on the Ref-COCO/+/g datasets at the image level (setting num_frames=1).

We do not use the pretrained model for joint training. We do not adopt balanced sampling of RefCOCO/+/g and RefYTVOS, though their scales are different.

@zhenghao977 We use 32 V100 GPUs for the pretrained models. The total number of epochs is 12, and the lr drops at the 8th and 10th epochs. The learning rate keeps the same...
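
That schedule maps onto a standard PyTorch step scheduler; a minimal sketch, assuming a 10x decay (the model, optimizer, and base lr below are placeholders, not the repo's settings):

```python
import torch

model = torch.nn.Linear(10, 10)                             # placeholder model
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)  # placeholder base lr
# lr drops after the 8th and 10th epochs, 12 epochs total
scheduler = torch.optim.lr_scheduler.MultiStepLR(optimizer, milestones=[8, 10], gamma=0.1)

for epoch in range(12):
    # ... train one epoch ...
    scheduler.step()
```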

Hi, the official website seems to have removed the test meta_expression recently. We have uploaded the previous version of meta_expression [here](https://drive.google.com/file/d/1xjAwiPZColmGCKUYtMXO-Tc5Zzm1a-sJ/view?usp=sharing).

Inference needs around 24G of memory. All the frames of a video are used during inference, while training uses a clip of only 5 frames, so it is...
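
If that peak is a problem, one generic workaround (not the author's code; `run_in_clips` and `clip_len` are hypothetical names) is to run the model over fixed-length clips and concatenate the per-clip outputs:

```python
import torch

def run_in_clips(model, frames, clip_len=36):
    """Run `model` over a (T, ...) frame tensor in fixed-size clips.

    Assumes per-clip outputs can be concatenated along the time axis, which
    does not hold for models that need full-video temporal context.
    """
    outputs = []
    with torch.no_grad():
        for i in range(0, frames.shape[0], clip_len):
            outputs.append(model(frames[i:i + clip_len]))
    return torch.cat(outputs, dim=0)
```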

Sorry for the very late reply. We have uploaded the JHMDB dataset [here](https://drive.google.com/drive/folders/10EcgRQXQs-ZdBfDDuHLR-zcZo7f5hXbe?usp=sharing).

I have the same problem. In some cases, the prediction becomes 1-d instead of 2-d. How can this problem be solved?
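
One generic guard, assuming the downstream code expects an (N, C) array (a sketch, not a confirmed fix for this repo), is to promote 1-d predictions back to 2-d before indexing:

```python
import numpy as np

def ensure_2d(pred):
    """Promote a 1-d prediction to shape (1, C) so downstream indexing is uniform."""
    pred = np.asarray(pred)
    return pred[None, :] if pred.ndim == 1 else pred
```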