MagicSource
Hi, I still want to ask 2 questions. 1. The dataset examples all convert tables to markdown; what about formulas and normal markdown articles? 2. The conversion uses span to wrap markdown, will...
@xhyandwyy X-P This issue from 2023 still hasn't been resolved, hmm.
I noticed that when you enlarge the input size in llava-1.5, you interpolate the positional embeddings after calculating position_ids. This would notably drop performance, as the model hasn't seen large sizes during training....
Nice, let me know the differences between them after you've tried them.
@luogen1996 Hello, I am running SFT stage 2 following your code, fine-tuning with ZeRO-3, and got some warnings: ``` - vision_model.head.layernorm.bias: found shape torch.Size([1152]) in the checkpoint and torch.Size([0]) in the model...
Hi, thanks for the reply. This is an interesting conclusion. How does Mantis add the separator, for instance, if a single sample has 3 frames? I'm just wondering about llava's Image1Image2Image3...
@jdf-prog Hi, Mantis-Idefics2 uses NaViT with a maximum input resolution of 980. Does the test period also resize inputs to a maximum of 980, or does it use the original size? Have you also conducted the slicing...
Hi, I still have some questions I'd like to discuss. You mentioned that training used Idefics2 as the pretrained model, which raises a very serious question, possibly the biggest concern...
@jdf-prog Hi, thanks for your deep insights. It looks like building on a strong baseline for continued training can also be very beneficial. I think, as the next step, you might...
Hello, I have to say, VideoChat2 now scores 60 points on MVBench....