Yuxuan Wang

Results: 39 comments by Yuxuan Wang

Hi, I couldn't find `image/caption/minigpt4` in M3IT. How can I obtain these images?

For SAM2, it depends on the length of the FIFO queue. For Grounded-SAM2, you can preserve the text of the old object as the input for GroundingDINO. Unfortunately, this code...

Apologies for the late reply. I believe it is possible as long as the object remains within the FIFO queue.
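To illustrate the FIFO point, here is a minimal sketch of a bounded first-in, first-out memory. It is not SAM2's actual code: `FrameMemory`, `max_frames`, `add_frame`, and `remembers` are hypothetical names chosen for this example, and the per-frame memory is reduced to a set of object labels. It only shows why an object can be recovered while a frame containing it is still in the queue, and is forgotten once that frame is evicted.

```python
from collections import deque


class FrameMemory:
    """Hypothetical bounded FIFO memory of per-frame object labels (not SAM2's API)."""

    def __init__(self, max_frames: int):
        # deque(maxlen=...) drops the oldest entry automatically once it is full.
        self.frames = deque(maxlen=max_frames)

    def add_frame(self, object_labels):
        # Store the labels seen in the newest frame.
        self.frames.append(frozenset(object_labels))

    def remembers(self, label) -> bool:
        # An object is recoverable only if some frame still in the queue contains it.
        return any(label in labels for labels in self.frames)


memory = FrameMemory(max_frames=3)
memory.add_frame({"cat"})   # the object is visible only in this frame
memory.add_frame({"dog"})
memory.add_frame({"dog"})
seen_before = memory.remembers("cat")  # the "cat" frame is still in the queue

memory.add_frame({"dog"})   # queue is full, so the oldest ("cat") frame is evicted
seen_after = memory.remembers("cat")
```

Under these assumptions, a longer queue (a larger `max_frames`) keeps an occluded object recoverable for more frames, at the cost of more memory.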

I haven't implemented multiple references in your case due to my limited bandwidth. However, I believe it could be achieved by modifying the `add_new_points_or_box` function. Feel free to submit a...

This is an important problem, and I'd like to give you an exact answer. However, I don't have much time at the moment. Could you try to tackle it and...

Thank you for your interest. I will provide updates if time permits and will inform you as soon as there is any progress.

Sorry for the late reply. Please forgive me, as I didn't receive any notification. The evaluation data are from VideoChatGPT and Video-LLaVA, which can be obtained directly from this [link](https://github.com/PKU-YuanGroup/Video-LLaVA/blob/main/TRAIN_AND_VALIDATE.md#data-for-validating).

What should I do when the generated token index is not in the vocabulary?

Thank you for your interest. I have provided the installation details for all the baselines in this file: [INSTALLATION.md](https://github.com/patrick-tssn/VideoHallucer/blob/main/INSTALLATION.md). I hope you find it helpful.

Thank you for your interest. This new function is adapted directly from https://github.com/Gy920/segment-anything-2-real-time. I believe the change was made to add functionality for real-time tracking.