Zhe Chen
Hello, our InternViT-6B can be used together with Llama-3-8B, although this size ratio may not be ideal. We recommend pairing InternViT-6B with a 20B or 30B language...
Hi, thanks for your interest. I recommend checking whether the Aesthetic-aware Style-Attention (AesSA) uses IN or IW, and then replacing them with TIN or TIW.
Hi @lrjj, the FAST model and code will be open-sourced soon. Link: https://github.com/czczup/FAST
Hi, thanks for this question, and apologies for the delayed response. Regarding the performance degradation observed in multi-task training, several factors could contribute to this result. First, we only used...
We are no longer using this data because we found that more data is not always better; data quality matters more. We discovered that reducing...
Hello, thanks for your interest. Fine-tuning 1.2 is no longer very cost-effective because that model is a bit too large (40B); I will prepare the fine-tuning setup for version 1.5 in the next couple of days.
I will strive to resolve this issue by April 27th.
Hi, batch inference is now supported. Here is an example:

```python
import json
import os
from transformers import AutoTokenizer, AutoModel
from tqdm import tqdm
import torch
import torchvision.transforms as T
...
```
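In case the snippet above is cut off, here is a minimal, model-free sketch of the batching pattern it relies on. The shapes and the `build_batch` helper are assumptions for illustration, based on InternVL's dynamic tiling where each image yields a variable number of 448x448 tiles:

```python
import torch

def build_batch(tile_tensors):
    """Concatenate per-image tile tensors into one batch.

    tile_tensors: list of (num_tiles, 3, 448, 448) tensors, one per image.
    Returns the combined tensor plus the per-image tile counts, which the
    model needs to attribute tiles back to their source images.
    """
    num_patches_list = [t.size(0) for t in tile_tensors]
    pixel_values = torch.cat(tile_tensors, dim=0)
    return pixel_values, num_patches_list

# Two dummy images: one split into 3 tiles, one into 5.
imgs = [torch.randn(3, 3, 448, 448), torch.randn(5, 3, 448, 448)]
pixel_values, num_patches_list = build_batch(imgs)
print(pixel_values.shape)   # torch.Size([8, 3, 448, 448])
print(num_patches_list)     # [3, 5]
```

The combined tensor and the tile counts are then passed to the model together, so a single forward pass can serve several independent questions.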
Hello, thanks for your interest. Are you asking about multi-image question answering or batch inference?
Version 1.5 supports multi-image question answering; see the readme here for the format: https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5 Roughly like this:

```python
# multi-round multi-image conversation
pixel_values1 = load_image('./examples/image1.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values2 = load_image('./examples/image2.jpg', max_num=6).to(torch.bfloat16).cuda()
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)
question = "详细描述这两张图片"  # Describe the two pictures in...
```
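As a model-free illustration of the concatenation used in the multi-image format (the tile counts and shapes below are made-up stand-ins for what `load_image` would produce), the per-image tiles can be recovered from the combined tensor with `torch.split` given the per-image tile counts:

```python
import torch

# Dummy tiles standing in for two loaded images (image 1 -> 4 tiles,
# image 2 -> 6 tiles), each tile a 3x448x448 image crop.
pixel_values1 = torch.randn(4, 3, 448, 448)
pixel_values2 = torch.randn(6, 3, 448, 448)
pixel_values = torch.cat((pixel_values1, pixel_values2), dim=0)

# Recording the per-image tile counts lets the combined tensor be
# split back into per-image groups, which is how multi-image inputs
# stay distinguishable after concatenation.
num_patches_list = [pixel_values1.size(0), pixel_values2.size(0)]
per_image = torch.split(pixel_values, num_patches_list, dim=0)

print(pixel_values.shape)               # torch.Size([10, 3, 448, 448])
print([t.shape[0] for t in per_image])  # [4, 6]
```

This is why the readme's example only needs a single `pixel_values` tensor for both images: the grouping information travels alongside it as a list of counts.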