Jaemin Cho

Results 66 comments of Jaemin Cho

1. Prepare your data (I extracted features from videos using ResNet).
2. Set your configuration (the directory of your data) in `configs.py`.
3. Run `train.py`.

I only tried the 360dataset in my experiments, but you can train/evaluate on any dataset by replacing `dataset_dir` and `video_list` in `configs.py`.
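A minimal sketch of the relevant part of `configs.py`: only `dataset_dir` and `video_list` are named above, so the paths and any other fields here are purely illustrative.

```python
# Hypothetical excerpt of configs.py -- point these at your own data.
dataset_dir = '/path/to/your/extracted_features'  # directory of extracted features
video_list = '/path/to/your/video_list.txt'       # e.g. one video id per line
```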

Have you edited any of the existing codebase? `ref_text_feat` should have shape (B=batch size, K=number of references (usually 5 for COCO), dim=512 for ViT-B/32).
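The expected shape can be sanity-checked with a quick sketch like the one below; the tensor here is a random placeholder, not real CLIP output.

```python
import torch

# (B, K, dim) = (batch size, references per image, CLIP ViT-B/32 feature dim)
B, K, dim = 2, 5, 512              # COCO typically has 5 reference captions
ref_text_feat = torch.rand(B, K, dim)  # placeholder for encoded reference texts

assert ref_text_feat.shape == (B, K, dim)
```

If your `ref_text_feat` does not match this shape, the mismatch is a likely source of the error.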

I see. Could you please share the full error log trace? I could not reproduce the error.

Thanks! I will merge PR if you make one :)

Oh, I didn't know there's `Get Spawn Coordinates`. This is helpful. Thanks Winson! Take care :)

[param.grad=None](https://github.com/j-min/VL-T5/blob/18699e2b1d5b4559f76c88a48cdec7176d356c34/VL-T5/src/caption.py#L227) replaces model.zero_grad() here.
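The pattern in the linked line can be sketched as below; the toy model is an assumption, only the grad-clearing loop reflects the linked code.

```python
import torch

model = torch.nn.Linear(4, 2)
loss = model(torch.rand(3, 4)).sum()
loss.backward()

# Clear gradients by dropping the tensors entirely, instead of
# model.zero_grad(), which fills them with zeros in place.
for param in model.parameters():
    param.grad = None
```

Recent PyTorch exposes the same behavior via `zero_grad(set_to_none=True)`, which saves the memory and the zeroing kernel of the in-place version.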

Hi, in case you used `args.vis_pointer=True` in your experiment, could you please try `args.vis_pointer=False`, which I recently made the default [here](https://github.com/j-min/VL-T5/commit/a07d779aac8134a587a9e369b6e90f31ef1e6865)? I could reproduce the results in the paper with...

Hi @shrutijpalaskar. Since I had to run all pretraining/finetuning experiments on a 4 x 10GB RTX 2080 Ti server (much smaller than the setups used in recent works from big companies), I couldn't...

1. Feature extraction: you can refer to [these lines](https://github.com/j-min/VL-T5/blob/main/feature_extraction/detectron2_proposal_maxnms.py#L202-L220), which extract VG-trained Faster R-CNN features from images based on [this repo](https://github.com/airsplay/py-bottom-up-attention).
2. Datum/batch creation: you can refer to `__getitem__` and `collate_fn`...
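The datum/batch split in step 2 can be sketched with a toy dataset. This is an illustration only: the field names (`vis_feats`, `text`) and shapes are placeholders, not the repo's actual schema.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyFeatureDataset(Dataset):
    """Minimal dataset: __getitem__ builds one datum at a time."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        # one datum: per-image region features plus its text
        return {'vis_feats': torch.rand(36, 2048), 'text': f'caption {idx}'}

def collate_fn(batch):
    # merge a list of datums into one batch
    return {
        'vis_feats': torch.stack([d['vis_feats'] for d in batch]),  # (B, 36, 2048)
        'text': [d['text'] for d in batch],
    }

loader = DataLoader(ToyFeatureDataset(), batch_size=2, collate_fn=collate_fn)
batch = next(iter(loader))
print(batch['vis_feats'].shape)  # torch.Size([2, 36, 2048])
```

A custom `collate_fn` is needed here because the default collation cannot stack mixed dict fields like tensors and strings the way the training loop expects.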