Brian Qu
In Vicuna-7b-v1.1's config.json, there is: ``` "bos_token_id": 0, "eos_token_id": 1, "pad_token_id": -1, ``` In its generation_config.json, there is: ``` "bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0, ``` But actually, this...
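For illustration, the disagreement between the two files can be checked directly; the ids below are copied from the snippets in the issue, and the dict-based comparison is only a sketch (the real files would be loaded with `json.load`):

```python
# Special-token ids as reported in the issue (hypothetical in-memory copies).
config = {"bos_token_id": 0, "eos_token_id": 1, "pad_token_id": -1}            # config.json
generation_config = {"bos_token_id": 1, "eos_token_id": 2, "pad_token_id": 0}  # generation_config.json

# Collect the keys whose values disagree between the two files.
mismatched = {k for k in config if config[k] != generation_config[k]}
print(sorted(mismatched))  # every special-token id differs between the two files
```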
Hello, thanks for your great work! In `blip2_vicuna_instruct.py`, the `bos_token` of the LLM has been changed. Originally, it is '<s>' with id 1. But after the following code: ``` self.llm_tokenizer.add_special_tokens({'pad_token':...
Add online loading of the Kinetics-400 dataset to train TANet with a lower disk cost than frame storage
Since there are so many video frames in the Kinetics-400 dataset, extracting all the frames from the ~300k videos incurs a large disk cost. So I propose...
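The idea of online loading can be sketched as follows: instead of pre-extracting every frame to disk, pick a small set of evenly spaced frame indices per video and decode only those on the fly. The function name and sampling scheme below are illustrative assumptions, not TANet's actual pipeline:

```python
def sample_frame_indices(num_frames_total: int, clip_len: int = 8) -> list[int]:
    """Sketch: choose clip_len evenly spaced frame indices to decode on the fly.

    Each index sits at the centre of one of clip_len equal temporal segments,
    so the whole video is covered without storing any frames on disk.
    """
    stride = num_frames_total / clip_len
    return [int(stride * i + stride / 2) for i in range(clip_len)]

# A ~10-second video at 30 fps has about 300 frames; only 8 are decoded.
print(sample_frame_indices(300))
```

A video reader (e.g. decord or PyAV) would then seek to just these indices, trading a little decode time for a large reduction in disk usage.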
Hello, I have read your paper and think it is really good work. But I have a question: how can I determine the value of k? Is there...
**Describe the bug** Hi, I use ZeRO-3 for MLLM training. After one epoch of training, I want to evaluate the model (using `model.generate()`). However, the model's parameters are sharded across multiple GPUs,...
Hello, thanks for your great work. But I have some questions about the visual prompts, especially the modifications to timm. Firstly, I find that you have commented out the code below. So, is...
Hi, thanks for your great work! I am fine-tuning InternLM-XComposer2 (unfreezing the projection and the whole LLM, freezing the ViT). In order to avoid OOM, I use ZeRO-3 and offload the optimizer...
**Describe the bug** While reading the source code that builds the `dataloader` in `PipelineEngine`, I found `shuffle=False` in the sampler. Code: ``` sampler = torch.utils.data.distributed.DistributedSampler(dataset, num_replicas=self.dp_world_size, rank=self.mpu.get_data_parallel_rank(), shuffle=False) ```...
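To see why `shuffle=False` matters, here is a minimal pure-Python sketch of the sharding logic a `DistributedSampler` performs (the real implementation uses a torch generator seeded with `seed + epoch`; this toy version uses `random.Random(epoch)` and round-robin striding purely for illustration):

```python
import random

def shard(dataset_len: int, num_replicas: int, rank: int,
          shuffle: bool, epoch: int = 0) -> list[int]:
    """Toy sketch of DistributedSampler's index logic.

    With shuffle=False, every rank sees the same fixed stride of indices
    every epoch; with shuffle=True, all ranks apply the same epoch-seeded
    permutation first, so the shards still partition the dataset but the
    order changes each epoch.
    """
    indices = list(range(dataset_len))
    if shuffle:
        rng = random.Random(epoch)  # identical seed on every rank
        rng.shuffle(indices)
    return indices[rank::num_replicas]  # round-robin shard for this rank

print(shard(8, 2, 0, shuffle=False))  # [0, 2, 4, 6] -- same every epoch
```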
ChartMoE is a multimodal large language model with a Mixture-of-Experts connector for advanced chart 1) understanding, 2) replotting, 3) editing, 4) highlighting, and 5) transformation. We've released the code at [https://github.com/IDEA-FinAI/ChartMoE](https://github.com/IDEA-FinAI/ChartMoE) and the Hugging Face model at [https://huggingface.co/IDEA-FinAI/chartmoe](https://huggingface.co/IDEA-FinAI/chartmoe)....
Hi, thanks for your great work! As shown in your teaser, the Doc/ChartQA performance of the LLaVA baseline is 45.1/41.8 respectively. How did you get these scores? I wanted to reproduce...