Do something different.
Hon-Wong
[ECCV 2024] Elysium: Exploring Object-level Perception in Videos via MLLM
[Fully open] [Encoder-free MLLM] Vision as LoRA