Yuxuan Wang
How to reproduce the behaviour --------- First, thanks for developing such a wonderful tool. Is there any way to import bundles of images automatically? It's hard to...
Hi, what does "episode" stand for in your dataset paper? I can't find any explanation of it. Thank you very much.
Thank you for your exceptional efforts. Even so, after several days of work I am still stuck at the initial stage. I have meticulously reviewed each dataset, yet...
Hi, the extracted features can be found here: https://drive.google.com/drive/folders/14zlHmNFkCgptiGttwWKrsaaz5vVUFs00?usp=sharing _Originally posted by @hudaAlamri in https://github.com/batra-mlp-lab/avsd/issues/2#issuecomment-561653010_
To my knowledge, the videos in the NExT-QA dataset are relatively short, with an average length of 44 seconds, and there is a noted static bias [1] in the ActivityNet-QA...
Could you please provide a script or JSON file of the ID map from M3IT to VideoChat2IT? Matching different files can be quite challenging. For example, `coco llava minigpt4 paragraph_captioning...
For tokenizers in `transformers`, by convention, `tokenizer.vocab_size` [as documented](https://github.com/huggingface/transformers/blob/092f1fdaa4224fdd88c616dc9678e6fcb37bfffd/src/transformers/tokenization_utils.py#L378-L383) is the size of the base vocabulary (without the added tokens). To get the actual vocabulary size, you need to use...
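To illustrate the distinction, a minimal sketch (the model name here is just an illustrative choice):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

print(tokenizer.vocab_size)  # base vocabulary only, e.g. 30522

# Registering extra tokens does NOT change vocab_size...
tokenizer.add_tokens(["<my_new_token>"])
print(tokenizer.vocab_size)  # still 30522

# ...but len(tokenizer) counts base vocab + added tokens.
print(len(tokenizer))        # 30523
```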
Thank you for your excellent work; it is impressively fast. However, when I test it with a short 16 kHz speech sample, the decoded voice sounds unclear. Is this a normal...
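In case it is useful to others hitting the same symptom, one possible cause (an assumption on my part, not confirmed by the authors) is a sample-rate mismatch, e.g. a codec trained at 24 kHz being fed 16 kHz audio directly. A minimal resampling sketch with `librosa` (the file names and the 24 kHz target rate are hypothetical):

```python
import librosa
import soundfile as sf

# Load at the native rate (sr=None avoids librosa's default 22050 Hz resample).
wav, sr = librosa.load("sample_16k.wav", sr=None)

# Resample to the rate the codec was (hypothetically) trained at.
wav_24k = librosa.resample(wav, orig_sr=sr, target_sr=24000)
sf.write("sample_24k.wav", wav_24k, 24000)
```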
The StreamingBench results in your report (https://openbmb.notion.site/MiniCPM-o-2-6-A-GPT-4o-Level-MLLM-for-Vision-Speech-and-Multimodal-Live-Streaming-on-Your-Phone-185ede1b7a558042b5d5e45e6b237da9) appear to be based on a 60-second setup. I would like to request the official results for a long-context setup, as well as...
Hi all, We’ve implemented the training code and added vision input support. You can now convert LVLM/LALM/LLM models to OmniLLM. Check out OpenOmniNexus here: https://github.com/OmniMMI/OpenOmniNexus