Zhe Chen
Hello, our InternViT-6B can be used together with Llama-3-8B, although this size ratio may not be ideal. We recommend pairing InternViT-6B with a 20B or 30B language...
Hi, thanks for your interest. I recommend checking whether the Aesthetic-aware Style-Attention (AesSA) uses IN or IW, and then replacing them with TIN or TIW.
Hi @lrjj, the FAST model and code will be open-sourced soon. Link: https://github.com/czczup/FAST
Hi, thanks for this question, and apologies for the delayed response. Regarding the performance degradation observed in multi-task training, several factors could contribute to this result. First, we only used...
We are no longer using this data because we found that more data is not always better; data quality matters more. We discovered that reducing...
Hello, thanks for your interest. Fine-tuning 1.2 is no longer very cost-effective because that model is a bit too large (40B); I will prepare the fine-tuning setup for version 1.5 in the next couple of days.
I will strive to resolve this issue by April 27th.
Hi, batch inference is now supported. Here is an example:

```python
import json
import os
from transformers import AutoTokenizer, AutoModel
from tqdm import tqdm
import torch
import torchvision.transforms as T
...
```
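In case the snippet above is cut off, here is a minimal, model-free sketch of the batching pattern it relies on. The shapes and the `build_batch` helper are assumptions for illustration, based on InternVL's dynamic tiling where each image yields a variable number of 448x448 tiles:

```python
import torch

def build_batch(tile_tensors):
    """Concatenate per-image tile tensors into one batch.

    tile_tensors: list of (num_tiles, 3, 448, 448) tensors, one per image.
    Returns the combined tensor plus the per-image tile counts, which the
    model needs to attribute tiles back to their source images.
    """
    num_patches_list = [t.size(0) for t in tile_tensors]
    pixel_values = torch.cat(tile_tensors, dim=0)
    return pixel_values, num_patches_list

# Two dummy images: one split into 3 tiles, one into 5.
imgs = [torch.randn(3, 3, 448, 448), torch.randn(5, 3, 448, 448)]
pixel_values, num_patches_list = build_batch(imgs)
print(pixel_values.shape)   # torch.Size([8, 3, 448, 448])
print(num_patches_list)     # [3, 5]
```

The combined tensor and the tile counts are then passed to the model together, so a single forward pass can serve several independent questions.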
Hello, thanks for your interest. Are you asking about multi-image question answering or batch inference?
Version 1.5 supports multi-image question answering; see the readme here for the format: https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5 Roughly like this:

```python
# multi-round multi-image conversation
pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
question = "详细描述这两张图片"  # Describe the two pictures in...
```
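As a model-free illustration of the concatenation used in the multi-image format (the tile counts and shapes below are made-up stand-ins for what `load_image` would produce), the per-image tiles can be recovered from the combined tensor with `torch.split` given the per-image tile counts:

```python
import torch

# Dummy tiles standing in for two loaded images (image 1 -> 4 tiles,
# image 2 -> 6 tiles), each tile a 3x448x448 image crop.
pixel_values1 = torch.randn(4, 3, 448, 448)
pixel_values2 = torch.randn(6, 3, 448, 448)
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)

# Recording the per-image tile counts lets the combined tensor be
# split back into per-image groups, which is how multi-image inputs
# stay distinguishable after concatenation.
num_patches_list = [pixel_values1.size(0), pixel_values2.size(0)]
per_image = torch.split(pixel_values, num_patches_list, dim=0)

print(pixel_values.shape)               # torch.Size([10, 3, 448, 448])
print([t.shape[0] for t in per_image])  # [4, 6]
```

This is why the readme's example only needs a single `pixel_values` tensor for both images: the grouping information travels alongside it as a list of counts.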