Jaemin Cho
As you can see in the [`__getitem__`](https://github.com/j-min/VL-T5/blob/main/VL-T5/src/vqa_data.py#L143-L173) of the Dataset class, `vis_feats` is a 2048-dim feature from Faster R-CNN, and `boxes` are the 4-point coordinates of the bounding boxes. The Faster R-CNN features...
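To make the shapes concrete, here is a minimal sketch of that `__getitem__` structure. The field names `vis_feats` and `boxes` follow the repo; the random arrays and the choice of 36 regions are stand-ins for real Faster R-CNN outputs, not the repo's actual loading code.

```python
import numpy as np

class VQADatasetSketch:
    """Hypothetical sketch of the item structure returned by the Dataset."""

    def __init__(self, n_boxes=36, feat_dim=2048):
        self.n_boxes = n_boxes
        self.feat_dim = feat_dim

    def __getitem__(self, idx):
        # 2048-dim region features, one row per detected box
        vis_feats = np.random.rand(self.n_boxes, self.feat_dim).astype(np.float32)
        # (x1, y1, x2, y2) box coordinates, typically normalized to [0, 1]
        boxes = np.random.rand(self.n_boxes, 4).astype(np.float32)
        return {"vis_feats": vis_feats, "boxes": boxes}

item = VQADatasetSketch()[0]
print(item["vis_feats"].shape)  # (36, 2048)
print(item["boxes"].shape)      # (36, 4)
```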
I'm afraid you didn't load the pretrained checkpoint properly. Please check out [`load_checkpoint`](https://github.com/j-min/VL-T5/blob/main/VL-T5/src/vqa.py#L91) used in vqa.py, which is defined in [`trainer_base.py`](https://github.com/j-min/VL-T5/blob/cafc314de831ec5c9fcf5b05e91d3f162712836d/VL-T5/src/trainer_base.py#L171).
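As a sanity check, the kind of bookkeeping a checkpoint loader has to get right is matching the checkpoint's state-dict keys to the model's (e.g., stripping a DataParallel-style `module.` prefix) and reporting what failed to load. This is a hypothetical illustration of those checks, not the repo's actual `load_checkpoint`:

```python
def load_state_dict_sketch(model_keys, ckpt_state):
    """Align checkpoint keys with model keys; return what loaded and what didn't.
    (Hypothetical helper for illustration only.)"""
    # Strip a DataParallel-style "module." prefix if present
    cleaned = {k[len("module."):] if k.startswith("module.") else k: v
               for k, v in ckpt_state.items()}
    loaded = {k: v for k, v in cleaned.items() if k in model_keys}
    missing = sorted(set(model_keys) - set(cleaned))
    unexpected = sorted(set(cleaned) - set(model_keys))
    return loaded, missing, unexpected

model_keys = {"encoder.weight", "decoder.weight"}
ckpt = {"module.encoder.weight": 1, "module.decoder.weight": 2, "module.extra": 3}
loaded, missing, unexpected = load_state_dict_sketch(model_keys, ckpt)
print(sorted(loaded))       # ['decoder.weight', 'encoder.weight']
print(missing, unexpected)  # [] ['extra']
```

If `missing` is non-empty after loading, the checkpoint was probably not applied to the model you think it was.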
I created [a google colab](https://colab.research.google.com/github/j-min/VL-T5/blob/main/inference_example.ipynb) for custom image processing. Hope this helps.
Yes, the py-bottom-up-attention repo is compatible with the Hugging Face Transformers LXMERT demo. VCR questions (https://visualcommonsense.com/explore/?im=2519) have a different format than VQA, for example, person grounding and multiple choice. So I don't think...
> Looking at the VL-T5 paper, it seems like the decoder generates text in an autoregressive manner, i.e. it predicts the probability of future text tokens (among all the tokens...
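Yes, that is the standard autoregressive setup. A minimal sketch of greedy decoding, with a toy scoring function standing in for the real decoder (the token ids and the `toy_logits` model are made up for illustration):

```python
def greedy_decode(step_logits_fn, bos=0, eos=1, max_len=10):
    """Autoregressive decoding sketch: at each step the model scores every
    vocabulary token given the current prefix, and we append the argmax."""
    tokens = [bos]
    for _ in range(max_len):
        logits = step_logits_fn(tokens)  # scores over the whole vocab
        next_tok = max(range(len(logits)), key=logits.__getitem__)
        tokens.append(next_tok)
        if next_tok == eos:  # stop once the end-of-sequence token is emitted
            break
    return tokens

# Toy "model": prefers token 2 until the prefix has 3 tokens, then emits EOS (id 1)
def toy_logits(prefix):
    return [0.0, 1.0, 0.5] if len(prefix) >= 3 else [0.0, 0.5, 1.0]

print(greedy_decode(toy_logits))  # [0, 2, 2, 1]
```

In practice beam search or sampling replaces the argmax step, but the prefix-conditioned loop is the same.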
`scores` are from the [VQA evaluation](https://visualqa.org/evaluation.html). Many VQA methods train models by directly regressing the soft scores (e.g., [lxmert](https://github.com/airsplay/lxmert/blob/master/src/tasks/vqa_model.py)). But in our text-generation-based method, I just used [score = 1...
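For reference, a simplified version of the VQA soft score is `min(#matching human answers / 3, 1)`. (The official metric additionally averages this over all 9-annotator subsets of the 10 human answers; that averaging is omitted here for brevity.)

```python
from collections import Counter

def vqa_soft_score(answer, human_answers):
    """Simplified VQA soft score: an answer gets full credit once at least
    3 of the human annotators gave it, partial credit below that."""
    count = Counter(human_answers)[answer]
    return min(count / 3.0, 1.0)

humans = ["yes"] * 8 + ["no"] * 2  # 10 human annotations per question
print(vqa_soft_score("yes", humans))            # 1.0
print(round(vqa_soft_score("no", humans), 3))   # 0.667
```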
You don't have to use the `vqa: ` prefix, especially if you finetune on enough data.
Yes, you can check out VCR for such a setting. You also might want to check [Visual7W](https://arxiv.org/pdf/1511.03416.pdf) and how models tackle these datasets.
Could you please check the version of the transformers package? With `transformers==4.2.1` (mentioned in requirements.txt), both tokenizers yield the same results:
```bash
I <extra_id_0> you.
[27, 32099, 25, 5, 1]
I...
```
It's probably because the pretraining objective for text generation (span prediction) always involves short target text. I guess zero-shot captioning might not work well. You would need to tune the...
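To see why the targets are short, here is a minimal sketch of T5-style span prediction on a single span (real span corruption samples multiple spans at random; this toy version takes the span position as an argument):

```python
def span_corrupt(tokens, span_start, span_len):
    """T5-style span prediction sketch: replace one span with a sentinel.
    The target is just the sentinel plus the masked span, which is why
    pretraining never exposes the decoder to long target texts."""
    sentinel = "<extra_id_0>"
    inp = tokens[:span_start] + [sentinel] + tokens[span_start + span_len:]
    tgt = [sentinel] + tokens[span_start:span_start + span_len]
    return inp, tgt

tokens = "the quick brown fox jumps over the lazy dog".split()
inp, tgt = span_corrupt(tokens, 2, 2)
print(" ".join(inp))  # the quick <extra_id_0> jumps over the lazy dog
print(" ".join(tgt))  # <extra_id_0> brown fox
```

Captions are much longer than these few-token targets, which is why generation length likely needs tuning before zero-shot captioning works.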