yaolinli

11 comments by yaolinli

> @yaolinli You can try getting the dataset from this link: https://biglmdiag.blob.core.windows.net/oscar/datasets/coco_caption.zip If you cannot download it using azcopy, try the !wget command in Google Colab. However, COCO misses some...

I think the dataset from the link `https://biglmdiag.blob.core.windows.net/oscar/datasets/coco_caption.zip` may be different from `https://biglmdiag.blob.core.windows.net/vinvl/datasets/coco_caption`, because when I run inference with the released vinvl `/coco_captioning_base_scst/checkpoint-15-66405` on the test set from the...

> To reproduce: run the Python scripts from VinVL_MODEL_ZOO.md under
>
> > Image Captioning on COCO
>
> Script to finetune for base model:
>
> > 1....

We reference the evaluation code from [Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models](https://github.com/TRI-ML/prismatic-vlms).

Hi, the [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/current_tasks.md) repo now supports RefCOCO evaluation.

Hi, since the query tokens and the patch tokens have a one-to-one mapping, i.e., the i-th of the 576 query tokens corresponds exactly to the i-th of the 576 patch tokens, we directly visualize the 576...
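In case it helps, here is a minimal sketch (not the DeCo code itself) of how such a per-token visualization can be done, assuming the 576 tokens come from a 24x24 patch grid (e.g., CLIP ViT-L/14 at 336px); `show_token_map` and the random scores are hypothetical placeholders:

```python
import torch
import matplotlib.pyplot as plt

def show_token_map(token_scores: torch.Tensor, save_path: str = "token_map.png"):
    # One score per query/patch token, reshaped into the 24x24 patch grid (576 = 24 * 24).
    assert token_scores.numel() == 576, "expects one score per query/patch token"
    grid = token_scores.reshape(24, 24).detach().cpu().numpy()
    plt.imshow(grid, cmap="viridis")
    plt.colorbar()
    plt.savefig(save_path)

# Example usage with random scores standing in for real per-token relevance values.
show_token_map(torch.rand(576))
```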

Hi, you can find the raw vision token length in two ways: 1) print the output shape of the visual features produced by the ViT in an MLLM, which looks like...
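For illustration, a minimal sketch of option 1), assuming the CLIP ViT-L/14-336 vision tower used by LLaVA v1.5 (other MLLMs may use a different backbone or input resolution):

```python
import torch
from transformers import CLIPVisionModel

# Load the vision tower and run a dummy image through it to inspect the
# shape of the visual features that an MLLM feeds to its projector.
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")

dummy = torch.zeros(1, 3, 336, 336)  # a preprocessed image tensor
with torch.no_grad():
    out = vision_tower(pixel_values=dummy)

# Shape is [batch, 1 + num_patches, dim]; dropping the CLS token leaves the
# raw vision token length (576 for a 24x24 patch grid).
print(out.last_hidden_state.shape)          # torch.Size([1, 577, 1024])
print(out.last_hidden_state[:, 1:].shape)   # torch.Size([1, 576, 1024])
```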

Thank you for your interest in our DeCo work. Our current implementation of R-GAE primarily builds upon the code of the LLaVA v1.5 model (https://github.com/haotian-liu/LLaVA?tab=readme-ov-file) and the ICCV 2021 paper "Generic Attention-model...

Hi, I have released the R-GAE code.

I plan to clean up and document the R-GAE demo code around February 25. The R-GAE code initializes the matrix as an identity matrix, based on the intuition that each input token's relevance...
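For context, here is a minimal sketch of the GAE-style relevance propagation that this identity initialization belongs to (following Chefer et al., ICCV 2021, not the exact R-GAE release); `propagate_relevance` and the random tensors are illustrative placeholders:

```python
import torch

def propagate_relevance(attn_maps, attn_grads):
    # Relevance starts as an identity matrix: before any attention layer is
    # applied, each input token is considered relevant only to itself.
    num_tokens = attn_maps[0].shape[-1]
    R = torch.eye(num_tokens)
    for A, grad in zip(attn_maps, attn_grads):
        # Gradient-weighted attention, negative contributions clamped, averaged over heads.
        A_bar = (grad * A).clamp(min=0).mean(dim=0)
        # Rollout-style update: R <- R + A_bar @ R
        R = R + A_bar @ R
    return R

# Example with random attention maps/gradients: 2 layers, 4 heads, 10 tokens.
maps = [torch.rand(4, 10, 10) for _ in range(2)]
grads = [torch.rand(4, 10, 10) for _ in range(2)]
print(propagate_relevance(maps, grads).shape)  # torch.Size([10, 10])
```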