Chunjiang Ge (葛春江)
However, Vicuna does not release the training data.
LLaMA-7B has 128 dims per head, while flash-attn only supports head dim 64 on the RTX 3090. So LLaMA-7B with flash-attn may only run on an A100 or H100.
> Are you sure? flash-attn v2 supports dim up to 256. I am able to use it on a 3090.
>
> > FlashAttention-2 currently supports:
> >
> > 1. ...
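For reference, here is one quick way to check this on a given GPU: a minimal sketch that calls FlashAttention-2 directly with LLaMA-7B's head dim (4096 hidden / 32 heads = 128). It assumes flash-attn v2 is installed and will simply raise an error if the head dim is unsupported on that card.

```python
# Minimal check: run FlashAttention-2's forward pass with head_dim = 128,
# matching LLaMA-7B (hidden 4096 / 32 heads). Requires fp16/bf16 CUDA tensors.
import torch
from flash_attn import flash_attn_func

batch, seqlen, n_heads, head_dim = 1, 512, 32, 128
q = torch.randn(batch, seqlen, n_heads, head_dim, dtype=torch.float16, device="cuda")
k = torch.randn_like(q)
v = torch.randn_like(q)

out = flash_attn_func(q, k, v, causal=True)  # raises if head_dim is unsupported
print(out.shape)  # (1, 512, 32, 128)
```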
> Hello, thank you for your interest in the dialogue dataset. Unfortunately, it has not been released publicly yet. If researchers want to use it, they need to go through the Zhiyuan (BAAI) platform; for details, please contact [[email protected]](mailto:[email protected]), and you will receive detailed usage instructions. Thank you!

Hello, I have sent you an email; please reply when you get a chance.
There is a large gap between the validation accuracy reported by VLMEvalKit on this dataset and the accuracy reported in the model's paper.
Hello, I find that for the TextVQA dataset, LLaVA evaluates with reference OCR tokens included in the prompt, like: What kind of beer is this?\nReference OCR token: NINK, NK, BOWING, CC, STON, SUE, ED, Sublimely, ...
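For clarity, a tiny sketch of how such a prompt string is assembled from the question and the OCR tokens; the exact wording in LLaVA's eval script may differ.

```python
# Hedged sketch of the TextVQA prompt format quoted above (not the actual eval code).
question = "What kind of beer is this?"
ocr_tokens = ["NINK", "NK", "BOWING", "CC", "STON", "SUE", "ED", "Sublimely"]
prompt = f"{question}\nReference OCR token: {', '.join(ocr_tokens)}"
print(prompt)
```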
Thanks for your reply! I would like to know what the normal format is for inference with batch size > 1. Should we deploy the model through something like vLLM, or...
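For what it's worth, here is a minimal sketch of plain batched generation with HuggingFace transformers (left padding, pad token set to EOS) as an alternative to a serving stack like vLLM; the model name is just a placeholder.

```python
# Batched greedy decoding with left padding; swap in your own checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token  # causal LMs often lack a pad token
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16).to("cuda")

prompts = ["What is shown in the image?", "Describe the scene."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("cuda")
out = model.generate(**inputs, max_new_tokens=64, do_sample=False)

# Strip the prompt tokens before decoding the new text.
new_tokens = out[:, inputs["input_ids"].shape[1]:]
print(tokenizer.batch_decode(new_tokens, skip_special_tokens=True))
```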
https://github.com/haotian-liu/LLaVA/issues/754#issuecomment-1907970439 This issue builds a fast inference method for LLaVA; would you add this feature for every benchmark in this repo? BTW, I find that SGLang may not support LoRA + base model....
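One possible workaround, sketched below under the assumption that the adapter was trained with peft: merge the LoRA weights into the base model first, then serve the merged checkpoint as an ordinary model (paths are placeholders).

```python
# Fold LoRA deltas into the base weights so the result needs no adapter at serve time.
from transformers import AutoModelForCausalLM
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("path/to/base-model")      # placeholder
model = PeftModel.from_pretrained(base, "path/to/lora-adapter")        # placeholder
merged = model.merge_and_unload()  # merges LoRA weights into the base model
merged.save_pretrained("path/to/merged-model")
```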
You could try registering it in the dassl package, since this repo depends on dassl. What's more, I think recent versions of dassl support the VLCS dataset; you could refer to https://github.com/KaiyangZhou/Dassl.pytorch.
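For illustration, a rough sketch of the dassl registration pattern; `MyVLCS`, the labels, and the paths here are hypothetical, and the real dataset classes in Dassl.pytorch do considerably more work.

```python
# Register a custom dataset so dassl can build it from cfg.DATASET.NAME.
from dassl.data.datasets import DATASET_REGISTRY, DatasetBase, Datum


@DATASET_REGISTRY.register()
class MyVLCS(DatasetBase):
    dataset_dir = "vlcs"  # hypothetical folder under cfg.DATASET.ROOT

    def __init__(self, cfg):
        # In practice you would scan cfg.DATASET.ROOT / self.dataset_dir here
        # and build proper train/val/test splits of Datum objects.
        train = [Datum(impath="path/to/img.jpg", label=0, classname="bird")]
        super().__init__(train_x=train, val=train, test=train)
```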
Could you please elaborate on what kinds of text data and what kinds of tasks?
From the paper, it seems the Qwen template is used for training during the pretraining stage, and there is no LLaVA-style pretraining stage that uses the plain template, right?
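To make the question concrete, here is a hedged illustration of the two styles being contrasted; the token strings are examples, not the project's actual templates.

```python
# Two pretraining prompt styles for an image-caption pair (illustrative only).
caption = "a photo of a dog"

# LLaVA-style "plain" pretraining template: image tokens followed by the caption.
plain = f"<image>\n{caption}"

# Qwen chat-style (ChatML) template: full conversation markup even during pretraining.
qwen_chat = (
    "<|im_start|>user\n<image>\nDescribe the image.<|im_end|>\n"
    f"<|im_start|>assistant\n{caption}<|im_end|>"
)
```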