Jaemin Cho

Results 66 comments of Jaemin Cho

1. Prepare your data (I extracted features from videos using ResNet).
2. Set your configuration (the directory of your data) in `configs.py`.
3. Run `train.py`.

I only tried the 360dataset in my experiments, but you can train/evaluate on any dataset by replacing `dataset_dir` and `video_list` in `configs.py`.
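A minimal sketch of the relevant part of `configs.py`: only `dataset_dir` and `video_list` are named above, so the paths and any other fields here are purely illustrative.

```python
# Hypothetical excerpt of configs.py -- point these at your own data.
dataset_dir = '/path/to/your/extracted_features'  # directory of extracted features
video_list = '/path/to/your/video_list.txt'       # e.g. one video id per line
```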

Have you edited any of the existing codebase? `ref_text_feat` should have shape (B=batch size, K=number of references (usually 5 for COCO), dim=512 for ViT-B/32).
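The expected shape can be sanity-checked with a quick sketch like the one below; the tensor here is a random placeholder, not real CLIP output.

```python
import torch

# (B, K, dim) = (batch size, references per image, CLIP ViT-B/32 feature dim)
B, K, dim = 2, 5, 512              # COCO typically has 5 reference captions
ref_text_feat = torch.rand(B, K, dim)  # placeholder for encoded reference texts

assert ref_text_feat.shape == (B, K, dim)
```

If your `ref_text_feat` does not match this shape, the mismatch is a likely source of the error.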

I see. Could you please share the full error log trace? I could not reproduce the error.

Thanks! I will merge PR if you make one :)

Oh, I didn't know there's `Get Spawn Coordinates`. This is helpful. Thanks Winson! Take care :)

[param.grad=None](https://github.com/j-min/VL-T5/blob/18699e2b1d5b4559f76c88a48cdec7176d356c34/VL-T5/src/caption.py#L227) replaces model.zero_grad() here.
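The pattern in the linked line can be sketched as below; the toy model is an assumption, only the grad-clearing loop reflects the linked code.

```python
import torch

model = torch.nn.Linear(4, 2)
loss = model(torch.rand(3, 4)).sum()
loss.backward()

# Clear gradients by dropping the tensors entirely, instead of
# model.zero_grad(), which fills them with zeros in place.
for param in model.parameters():
    param.grad = None
```

Recent PyTorch exposes the same behavior via `zero_grad(set_to_none=True)`, which saves the memory and the zeroing kernel of the in-place version.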

Hi, in case you used `args.vis_pointer=True` in your experiment, could you please try `args.vis_pointer=False`, which I recently made the default [here](https://github.com/j-min/VL-T5/commit/a07d779aac8134a587a9e369b6e90f31ef1e6865)? I could reproduce the results in the paper with...

Hi @shrutijpalaskar. Since I had to run all pretraining/finetuning experiments on a 4 x 10GB RTX 2080 Ti server (much smaller than the setups used in recent works from big companies), I couldn't...

1. Feature extraction: you can refer to [these lines](https://github.com/j-min/VL-T5/blob/main/feature_extraction/detectron2_proposal_maxnms.py#L202-L220), which extract VG-trained Faster R-CNN features from images based on [this repo](https://github.com/airsplay/py-bottom-up-attention).
2. Datum/batch creation: you can refer to `__getitem__` and `collate_fn`...
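The datum/batch split in step 2 can be sketched with a toy dataset. This is an illustration only: the field names (`vis_feats`, `text`) and shapes are placeholders, not the repo's actual schema.

```python
import torch
from torch.utils.data import Dataset, DataLoader

class ToyFeatureDataset(Dataset):
    """Minimal dataset: __getitem__ builds one datum at a time."""

    def __len__(self):
        return 4

    def __getitem__(self, idx):
        # one datum: per-image region features plus its text
        return {'vis_feats': torch.rand(36, 2048), 'text': f'caption {idx}'}

def collate_fn(batch):
    # merge a list of datums into one batch
    return {
        'vis_feats': torch.stack([d['vis_feats'] for d in batch]),  # (B, 36, 2048)
        'text': [d['text'] for d in batch],
    }

loader = DataLoader(ToyFeatureDataset(), batch_size=2, collate_fn=collate_fn)
batch = next(iter(loader))
print(batch['vis_feats'].shape)  # torch.Size([2, 36, 2048])
```

A custom `collate_fn` is needed here because the default collation cannot stack mixed dict fields like tensors and strings the way the training loop expects.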