朱橹
> Setting the short-term memory length to 34 and the long-term memory length to 512 supports up to 8k frames. The total frame number = (short-term memory length - 2) * (long-term...
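> > https://github.com/LLaVA-VL/LLaVA-NeXT/blob/main/scripts/video/train/exp.yaml;
> >
> > Could you share a concrete file-by-file mapping? orz I tried matching them one by one: some of the images are used in onevision.yaml and some in single_image.yaml. Is there a more convenient way to find the OneVision image-text data used in the 1.1M samples?
> >
> > Also, are some of the data not open-sourced? If so, llava-video currently cannot be reproduced, i.e., going from ov-si to llava-video.... The only option is fine-tuning on our own data.

Has this been resolved? I have the same question.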
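For the question of which data files two configs share, a minimal Python sketch follows. It assumes the LLaVA-NeXT data-config layout, where each YAML file has a top-level `datasets` list whose entries carry a `json_path` key, and that the files have already been parsed (for example with PyYAML's `yaml.safe_load`). The config contents below are hypothetical placeholders. It only reports which JSON files overlap by file name; it cannot tell which subset makes up the 1.1M samples.

```python
from __future__ import annotations

from pathlib import PurePosixPath


def dataset_names(config: dict) -> set[str]:
    """Return the JSON file names listed under `datasets` in a parsed data config.

    Assumed schema (LLaVA-NeXT-style data YAML):
        datasets:
          - json_path: /some/dir/name.json
            sampling_strategy: all
    Only the file name is kept, because absolute paths differ between machines.
    """
    return {PurePosixPath(entry["json_path"]).name for entry in config.get("datasets") or []}


def compare_configs(a: dict, b: dict) -> dict[str, set[str]]:
    """Split dataset file names into those shared by both configs and those unique to each."""
    names_a, names_b = dataset_names(a), dataset_names(b)
    return {"both": names_a & names_b, "only_a": names_a - names_b, "only_b": names_b - names_a}


# Hypothetical stand-ins for yaml.safe_load(...) on onevision.yaml and single_image.yaml.
onevision = {"datasets": [{"json_path": "/data/a.json", "sampling_strategy": "all"},
                          {"json_path": "/data/b.json", "sampling_strategy": "all"}]}
single_image = {"datasets": [{"json_path": "/other/b.json"}, {"json_path": "/other/c.json"}]}

result = compare_configs(onevision, single_image)
print(sorted(result["both"]))    # ['b.json']
print(sorted(result["only_a"]))  # ['a.json']
```

Matching on file names rather than full paths is a deliberate choice here, since the same JSON file is often stored under different directories on different machines.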
> `mm_newline_position` sets the position at which the newline token is inserted.
>
> 2. Grid: inserted after every row (the same as for the image modality)
> 3. One_token: inserted after all frames.
>...
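To make the two quoted options concrete, here is a toy Python sketch of the resulting token order for a video whose frames are each encoded as a `rows x cols` grid of visual tokens. The string labels, the `layout_tokens` helper, and the `<nl>` placeholder are hypothetical stand-ins for the visual-token and newline embeddings; this illustrates the ordering only and is not the LLaVA-NeXT implementation.

```python
from __future__ import annotations

NEWLINE = "<nl>"  # hypothetical placeholder for the learned newline embedding


def layout_tokens(num_frames: int, rows: int, cols: int, mode: str) -> list[str]:
    """Toy token order for a video of num_frames frames, each a rows x cols token grid.

    mode="grid":      one newline after every row of every frame (as for images).
    mode="one_token": a single newline after all frames.
    """
    if mode not in ("grid", "one_token"):
        raise ValueError(f"unsupported mode: {mode!r}")
    seq: list[str] = []
    for f in range(num_frames):
        for r in range(rows):
            # Visual tokens of row r in frame f, labelled by position.
            seq.extend(f"f{f}r{r}c{c}" for c in range(cols))
            if mode == "grid":
                seq.append(NEWLINE)
    if mode == "one_token":
        seq.append(NEWLINE)
    return seq


print(layout_tokens(1, 2, 2, "grid"))
# ['f0r0c0', 'f0r0c1', '<nl>', 'f0r1c0', 'f0r1c1', '<nl>']
print(len(layout_tokens(2, 2, 2, "grid")), len(layout_tokens(2, 2, 2, "one_token")))
# 12 9
```

With 2 frames of 2x2 tokens, `grid` yields 12 positions (4 of them newlines), while `one_token` yields 9 (a single trailing newline). In general `grid` adds `num_frames * rows` newline tokens and `one_token` adds exactly one, so the difference grows with the frame count.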