Yifan Du

Results 27 issues of Yifan Du

It would be valuable to train our model based on the linear layer after the 1st training stage. Meanwhile, will the filtered cc3m data be released? Thanks a lot!

When encoding the image into the prompt, you mention *captions* and *bounding boxes*. Which object detection model did you use to generate the bounding boxes?

When I run `pip install flash-attn`, it raises an error: ```ERROR: Could not build wheels for flash-attn, which is required to install pyproject.toml-based projects``` However, I have run `pip install...

I downloaded the cc_sbu dataset and counted the samples. The total is 12M and the number of successes is more than 6M, which is impossible, since cc_sbu+laion is...

Thanks for your awesome work on InstructBLIP. When I try to reproduce the result in Figure 5 of your paper, the result is not ideal. ``` raw_image = Image.open("../docs/_static/Confusing-Pictures.jpg").convert("RGB") question...

### 🚀 The feature, motivation and pitch Training large models with bf16 is necessary, and many vision models have the upsample_bicubic2d_out_frame operation. However, it does not support BFloat16. Making upsample_bicubic2d_out_frame...

triaged
module: bfloat16

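Hello, and thank you for your work! After downloading the font folder, I found that it does not contain dictionary.json or pinyin.json. Could you please upload them?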

As a reminder, the configs of [eachadea/vicuna-7b-1.1](https://huggingface.co/eachadea/vicuna-7b-1.1/tree/main) and [lmsys/vicuna-7b-v1.1](https://huggingface.co/lmsys/vicuna-7b-v1.1) are different, i.e. they have different bos_token_id, eos_token_id, and pad_token_id, and only eachadea/vicuna-7b-1.1 works well with InstructBLIP....
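A check like the one described in this issue could be sketched as follows. This is only an illustration: the helper name `diff_special_tokens` is hypothetical, and the two dictionaries hold placeholder values, not the real token ids of either checkpoint. In practice each dictionary would presumably be loaded from the checkpoint's own `config.json`.

```python
from typing import Any, Dict, Tuple

# The three fields the issue reports as differing between the checkpoints.
SPECIAL_TOKEN_KEYS = ("bos_token_id", "eos_token_id", "pad_token_id")


def diff_special_tokens(
    config_a: Dict[str, Any],
    config_b: Dict[str, Any],
    keys: Tuple[str, ...] = SPECIAL_TOKEN_KEYS,
) -> Dict[str, Tuple[Any, Any]]:
    """Return {key: (value_in_a, value_in_b)} for every key whose values differ.

    A key missing from a config is reported as None for that side.
    """
    return {
        key: (config_a.get(key), config_b.get(key))
        for key in keys
        if config_a.get(key) != config_b.get(key)
    }


# PLACEHOLDER values for illustration only; these are NOT the actual ids
# from either Vicuna checkpoint.
config_a = {"bos_token_id": 101, "eos_token_id": 102, "pad_token_id": 100}
config_b = {"bos_token_id": 101, "eos_token_id": 202, "pad_token_id": 200}

for key, (a, b) in diff_special_tokens(config_a, config_b).items():
    print(f"{key}: {a} != {b}")
```

With real files, replacing the two placeholder dictionaries with `json.load(open(path))` on each downloaded `config.json` would surface any mismatched special-token ids before the model is loaded.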

Thanks for your awesome work! There is a small problem: when I fine-tune long_llama with gradient_checkpointing, it raises an error: ![image](https://github.com/CStanKonrad/long_llama/assets/55051961/ec56d425-d0bc-45f6-be34-b62501562795) Could you please update the code in transformers to...

Thanks for your awesome work! VisionLLM opens a way towards a generalist vision and language model. However, from the single-task vs. multi-task results in the ablation study,...