yaolinli

11 comments by yaolinli

> @yaolinli You can try getting the dataset from this link: https://biglmdiag.blob.core.windows.net/oscar/datasets/coco_caption.zip If you cannot download it using azcopy, try the !wget command in Google Colab. However, COCO misses some...

I think the dataset from the link `https://biglmdiag.blob.core.windows.net/oscar/datasets/coco_caption.zip` may be different from `https://biglmdiag.blob.core.windows.net/vinvl/datasets/coco_caption`, because when I run inference with the released vinvl `/coco_captioning_base_scst/checkpoint-15-66405` on the test set from the...

> To reproduce: run the Python scripts from VinVL_MODEL_ZOO.md under
>
> > Image Captioning on COCO
>
> Script to finetune for base model:
>
> > 1....

We reference the evaluation code from [Prismatic VLMs: Investigating the Design Space of Visually-Conditioned Language Models](https://github.com/TRI-ML/prismatic-vlms).

Hi, the [lmms-eval](https://github.com/EvolvingLMMs-Lab/lmms-eval/blob/main/docs/current_tasks.md) repo now supports RefCOCO evaluation.

Hi, since the query tokens and the patch tokens have a one-to-one mapping, i.e., the i-th of the 576 query tokens corresponds exactly to the i-th of the 576 patch tokens, we directly visualize the 576...
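In case it helps, here is a minimal sketch (not the DeCo code itself) of how such a per-token visualization can be done, assuming the 576 tokens come from a 24x24 patch grid (e.g., CLIP ViT-L/14 at 336px); `show_token_map` and the random scores are hypothetical placeholders:

```python
import torch
import matplotlib.pyplot as plt

def show_token_map(token_scores: torch.Tensor, save_path: str = "token_map.png"):
    # One score per query/patch token, reshaped into the 24x24 patch grid (576 = 24 * 24).
    assert token_scores.numel() == 576, "expects one score per query/patch token"
    grid = token_scores.reshape(24, 24).detach().cpu().numpy()
    plt.imshow(grid, cmap="viridis")
    plt.colorbar()
    plt.savefig(save_path)

# Example usage with random scores standing in for real per-token relevance values.
show_token_map(torch.rand(576))
```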

Hi, you can find the raw vision token length in two ways: 1) print the output shape of the visual features produced by the ViT in an MLLM, which looks like...
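For illustration, a minimal sketch of option 1), assuming the CLIP ViT-L/14-336 vision tower used by LLaVA v1.5 (other MLLMs may use a different backbone or input resolution):

```python
import torch
from transformers import CLIPVisionModel

# Load the vision tower and run a dummy image through it to inspect the
# shape of the visual features that an MLLM feeds to its projector.
vision_tower = CLIPVisionModel.from_pretrained("openai/clip-vit-large-patch14-336")

dummy = torch.zeros(1, 3, 336, 336)  # a preprocessed image tensor
with torch.no_grad():
    out = vision_tower(pixel_values=dummy)

# Shape is [batch, 1 + num_patches, dim]; dropping the CLS token leaves the
# raw vision token length (576 for a 24x24 patch grid).
print(out.last_hidden_state.shape)          # torch.Size([1, 577, 1024])
print(out.last_hidden_state[:, 1:].shape)   # torch.Size([1, 576, 1024])
```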

Thank you for your interest in our DeCo work. Our current implementation of R-GAE primarily builds upon the code of the LLaVA v1.5 model (https://github.com/haotian-liu/LLaVA?tab=readme-ov-file) and the ICCV 2021 paper "Generic Attention-model...

Hi, I have released the R-GAE code.

I plan to clean up and document the R-GAE demo code around February 25. The R-GAE code initializes the matrix as an identity matrix, based on the intuition that each input token's relevance...
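For context, here is a minimal sketch of the GAE-style relevance propagation that this identity initialization belongs to (following Chefer et al., ICCV 2021, not the exact R-GAE release); `propagate_relevance` and the random tensors are illustrative placeholders:

```python
import torch

def propagate_relevance(attn_maps, attn_grads):
    # Relevance starts as an identity matrix: before any attention layer is
    # applied, each input token is considered relevant only to itself.
    num_tokens = attn_maps[0].shape[-1]
    R = torch.eye(num_tokens)
    for A, grad in zip(attn_maps, attn_grads):
        # Gradient-weighted attention, negative contributions clamped, averaged over heads.
        A_bar = (grad * A).clamp(min=0).mean(dim=0)
        # Rollout-style update: R <- R + A_bar @ R
        R = R + A_bar @ R
    return R

# Example with random attention maps/gradients: 2 layers, 4 heads, 10 tokens.
maps = [torch.rand(4, 10, 10) for _ in range(2)]
grads = [torch.rand(4, 10, 10) for _ in range(2)]
print(propagate_relevance(maps, grads).shape)  # torch.Size([10, 10])
```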