叶加博

Tongyi, Alibaba Group, China

Results: 25 comments by 叶加博

When running coordinate detection on an image, are the generated bboxes the values after the image is resized to a square?

Yes, the images are resized to squares, for example 448x448. However, the generated coordinates should be values in the range [0,1], that is, ratios that are unrelated to the...
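Because the coordinates are ratios, they can be mapped back onto the original image by scaling with its width and height. The following is a minimal sketch, not the repository's own code: the function name `to_pixel_bbox` is hypothetical, and it assumes the box is ordered `(x1, y1, x2, y2)` and that the resize is a plain stretch with no padding.

```python
def to_pixel_bbox(norm_bbox, orig_width, orig_height):
    """Scale a normalized box to pixel coordinates of the original image.

    Assumption: norm_bbox is (x1, y1, x2, y2) with every value in [0, 1].
    """
    x1, y1, x2, y2 = norm_bbox
    for value in (x1, y1, x2, y2):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"coordinate {value} is outside [0, 1]")
    return (
        round(x1 * orig_width),   # horizontal values scale with the width
        round(y1 * orig_height),  # vertical values scale with the height
        round(x2 * orig_width),
        round(y2 * orig_height),
    )


# Example: a 640x480 image, regardless of the square size used internally.
box = to_pixel_bbox((0.25, 0.5, 0.75, 1.0), orig_width=640, orig_height=480)
print(box)
```

The square size used by the model (448 in the example above) never appears in the conversion, which is the point of predicting ratios.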

The Quick Start Code cannot be executed in mPLUG-Owl2

> > For the same snippet I got the following error: > > ``` > > --------------------------------------------------------------------------- > > RuntimeError Traceback (most recent call last) > > Cell In[8], line...

Are other downstream tasks available? For example Visual Reasoning, which requires the model to predict whether a sentence describes a pair of images

The Owl series supports multiple image inputs. You can develop the downstream pipeline by passing a list of images and placing the same number of image placeholder tokens in your prompt.

How to realize multi-image correlation in a VQA task?

You can pass a list of images and place the same number of image placeholder tokens in your prompt.
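As a rough illustration of that advice, the sketch below builds a prompt with one placeholder per image and checks that the counts match before anything is sent to the model. The helper names are hypothetical, and the default `image_token="<image>"` is an assumption: use whatever placeholder string your model version actually expects.

```python
def build_multi_image_prompt(question, num_images, image_token="<image>"):
    """Prefix the question with one placeholder per image.

    Assumption: image_token is only a stand-in for the model's real placeholder.
    """
    if num_images < 1:
        raise ValueError("num_images must be at least 1")
    placeholders = "\n".join([image_token] * num_images)
    return f"{placeholders}\n{question}"


def placeholders_match(prompt, images, image_token="<image>"):
    """Return True when the prompt has exactly one placeholder per image."""
    return prompt.count(image_token) == len(images)


images = ["left.jpg", "right.jpg"]  # hypothetical file names
prompt = build_multi_image_prompt("Does the sentence describe both images?", len(images))
print(prompt)
print(placeholders_match(prompt, images))
```

Running the count check first makes a mismatch between the image list and the prompt visible before the model call.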

How to realize multi-image correlation in a VQA task?

> I pass a list of images, say 2 images, and modify the prompt. The image_tensor after preprocess has batch size of 2, while the input_ids has batch size of...

Computing output likelihoods with the model

> Hi, I tried a quick implementation to compute the output likelihoods of a given interleaved image-text token sequence: > > ```python > def get_class_log_likelihoods(image_path, classes, model, tokenizer, img_processor, device='cuda',...

Conflicting `torch` and `torchvision` versions

We updated the repository to support instruction tuning based on peft, and peft requires pytorch>=1.13.1. You can use ```conda install pytorch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 pytorch-cuda=11.7 -c pytorch -c nvidia``` to...

How to do the training on multiple images or image pair data?

> I ran into the same problem as him: on your dataset the loss is not NaN, but on my own dataset the loss is NaN.

There is a high possibility that the issue is caused by the prompt being too long, so that the completion part is cut off during preprocessing. As a result,...
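One way to test this hypothesis on a custom dataset is to count how many completion tokens survive truncation for each sample. The sketch below is an assumption-laden check, not the repository's preprocessing: the function names are hypothetical, the lengths are token counts you would get from your own tokenizer, and truncation is assumed to keep the first `max_length` tokens of prompt plus completion.

```python
def supervised_tokens_after_truncation(prompt_len, completion_len, max_length):
    """Count completion tokens left after cutting prompt+completion to max_length.

    Assumption: truncation keeps the head of the sequence and drops the tail.
    """
    kept = min(prompt_len + completion_len, max_length)
    return max(0, kept - prompt_len)


def find_unsupervised_samples(lengths, max_length):
    """Return indices of samples whose completion is cut off entirely."""
    return [
        index
        for index, (prompt_len, completion_len) in enumerate(lengths)
        if supervised_tokens_after_truncation(prompt_len, completion_len, max_length) == 0
    ]


# Hypothetical (prompt_len, completion_len) pairs with a 512-token limit.
lengths = [(100, 20), (600, 30), (500, 30)]
print(find_unsupervised_samples(lengths, max_length=512))
```

Samples flagged this way contribute no target tokens to the loss, so shortening their prompts or raising the length limit is worth trying before looking for other causes.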

Download link of ViT-L-14.tar

I just fixed the link, try again ^^

Can not obtain the lm_head.weight

Are you using the zero-3 strategy to initialize the model? If so, the parameters may be offloaded.
