EnYu
Thanks a lot. I also have another problem. I get a CUDA out-of-memory error when I run inference on ytvos, although I can train the model on ytvos normally. Is there...
Hi, Merlin is the pretrained weights and Merlin-Chat is the weights after SFT. The entire training process is conducted on 64 NVIDIA A800 GPUs, with approximately 12 hours required for...
Sorry, due to time constraints we have not yet organized the code to support multi-round, multi-frame video demos. At this stage, however, we support single-round dialogues, and you can...
Hi, Thank you for your attention to our work. We will open-source the Merlin-Chat SFT data after ECCV. Stay tuned for further updates.
Thanks for your attention. Yes, you need to download all of the vicuna-v15 files. The CUDA version is 11.8.
Thanks for your attention. There is no original clip-vit-large-patch14-448 on Hugging Face. We used positional embedding interpolation to adapt the original 224px clip-vit to support an input resolution...
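
For readers unfamiliar with positional embedding interpolation, here is a minimal sketch of the general idea, not the authors' actual code. It assumes a ViT-style table with one class-token row followed by a square grid of patch rows, a 14-pixel patch size (so 224px gives a 16x16 grid and 448px gives a 32x32 grid, with 448 taken from the model name above), and bilinear resampling with aligned corners. The function name `interpolate_pos_embed` and the toy 4-wide table are hypothetical. Real implementations usually do this with a tensor library, for example `torch.nn.functional.interpolate`, and the interpolation mode used for Merlin is not stated here.

```python
def interpolate_pos_embed(pos_embed, old_grid, new_grid):
    """Resize a ViT positional embedding table (hypothetical helper).

    pos_embed: list of 1 + old_grid * old_grid vectors; row 0 is the class token.
    Returns a list of 1 + new_grid * new_grid vectors.
    Assumption: bilinear resampling with aligned corners.
    """
    cls_row, patches = pos_embed[0], pos_embed[1:]
    if len(patches) != old_grid * old_grid:
        raise ValueError("table size does not match old_grid")
    dim = len(cls_row)
    # Map new grid coordinates onto the old grid so the corners coincide.
    scale = (old_grid - 1) / (new_grid - 1) if new_grid > 1 else 0.0
    resized = [list(cls_row)]  # the class-token row is copied unchanged
    for i in range(new_grid):
        y = i * scale
        y0 = min(int(y), old_grid - 1)
        y1 = min(y0 + 1, old_grid - 1)
        wy = y - y0
        for j in range(new_grid):
            x = j * scale
            x0 = min(int(x), old_grid - 1)
            x1 = min(x0 + 1, old_grid - 1)
            wx = x - x0
            top_left = patches[y0 * old_grid + x0]
            top_right = patches[y0 * old_grid + x1]
            bottom_left = patches[y1 * old_grid + x0]
            bottom_right = patches[y1 * old_grid + x1]
            # Weighted average of the four nearest old-grid embeddings.
            resized.append([
                (1 - wy) * ((1 - wx) * top_left[k] + wx * top_right[k])
                + wy * ((1 - wx) * bottom_left[k] + wx * bottom_right[k])
                for k in range(dim)
            ])
    return resized


PATCH = 14                 # patch size from the model name
OLD_GRID = 224 // PATCH    # 16 patches per side
NEW_GRID = 448 // PATCH    # 32 patches per side
DIM = 4                    # toy embedding width, for illustration only

# Synthetic table: a zero class-token row plus one row per patch position.
toy_table = [[0.0] * DIM] + [
    [float(r), float(c), float(r + c), 1.0]
    for r in range(OLD_GRID)
    for c in range(OLD_GRID)
]
resized_table = interpolate_pos_embed(toy_table, OLD_GRID, NEW_GRID)
print(len(toy_table), "->", len(resized_table))  # 257 -> 1025
```

The class-token row has no spatial position, so it is carried over as-is and only the patch grid is resampled.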