Yifan Du issues

Results: 27 issues of Yifan Du

Any plan to release the cc3m filtered data and the weight of the linear layer after the 1st training stage?

It would be valuable to train our model based on the linear layer after the 1st training stage. Meanwhile, will the filtered cc3m data be released? Thanks a lot!

Question about the object detection

When encoding the image into the prompt, you mention *captions* and *bounding boxes*. Which object detection model did you use to generate the bounding boxes?

Error when installing flash-attn

When I run `pip install flash-attn`, it raises an error: ```ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects``` However, I have run `pip install...

How to count the number of samples in cc_sbu and laion respectively?

I downloaded the cc_sbu dataset and counted the samples. The total is 12M and more than 6M succeeded, which seems impossible, since cc_sbu+laion is...

InstructBLIP generates short and repeated sentences

Thanks for your awesome work on InstructBLIP. When I try to reproduce the result in Figure 5 of your paper, the output is not ideal. ``` raw_image = Image.open("../docs/_static/Confusing-Pictures.jpg").convert("RGB") question...

Make upsample_bicubic2d_out_frame support BFloat16

### 🚀 The feature, motivation and pitch Training large models with bf16 is necessary, and many vision models have the upsample_bicubic2d_out_frame operation. However, it does not support BFloat16. Making upsample_bicubic2d_out_frame...

triaged

module: bfloat16

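Missing dictionary.json file

Hello, and thank you for your work! After downloading the font folder, I could not find the dictionary.json and pinyin.json files. Could you please upload them?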

A reminder and question about the vicuna checkpoint

As a reminder, I find that the configs of [eachadea/vicuna-7b-1.1](https://huggingface.co/eachadea/vicuna-7b-1.1/tree/main) and [lmsys-vicuna-7b-v1.1](https://huggingface.co/lmsys/vicuna-7b-v1.1) differ: they have different bos_token_id, eos_token_id, and pad_token_id, and only eachadea/vicuna-7b-1.1 works well with InstructBLIP....

Support for gradient_checkpointing

Thanks for your awesome work! There is a small problem: when I fine-tune long_llama with gradient_checkpointing, it raises an error: ![image](https://github.com/CStanKonrad/long_llama/assets/55051961/ec56d425-d0bc-45f6-be34-b62501562795) Could you please update the code in transformers to...

Question about the ablation

Thanks for your awesome work! VisionLLM opens a way towards a generalist vision and language model. However, from the results of the single-task vs. multi-task comparison in the ablation study,...