Qiming Bao

Results: 30 issues by Qiming Bao

Hi, is there any way to increase `model_max_length` without increasing GPU memory usage too much? I have already reduced the batch size to `1` and increased `gradient_accumulation_steps` to...
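A minimal sketch of settings that usually offset a longer `model_max_length`, assuming a standard `transformers` Trainer setup rather than this repository's exact training script (the values below are illustrative placeholders):

```python
from transformers import TrainingArguments

# Common memory-reduction levers when training with longer sequences:
# tiny per-device batch, gradient accumulation to recover the effective
# batch size, gradient checkpointing, and half precision.
training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=1,    # smallest per-step activation footprint
    gradient_accumulation_steps=16,   # keep the effective batch size
    gradient_checkpointing=True,      # trade extra compute for activation memory
    fp16=True,                        # halve activation memory with mixed precision
)
```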

Hi, for anyone who is interested in the implementation of LayoutLMv3: Transformers has updated its code for masked image modeling, and that code is based on DeiT. You can inherit the...
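A minimal sketch of what inheriting the DeiT-based masked-image-modeling class might look like, assuming a recent `transformers` version; the subclass name is hypothetical and this is only a starting point, not the LayoutLMv3 pretraining code itself:

```python
import torch
from transformers import DeiTConfig, DeiTForMaskedImageModeling

# Hypothetical subclass: reuse the DeiT/SimMIM-style masked-image-modeling
# head shipped with Transformers as a base for a LayoutLMv3-style objective.
class MyMaskedImageModel(DeiTForMaskedImageModeling):
    pass

config = DeiTConfig()
model = MyMaskedImageModel(config)

# Dummy batch: one image plus a random boolean mask over the patches.
pixel_values = torch.rand(1, 3, config.image_size, config.image_size)
num_patches = (config.image_size // config.patch_size) ** 2
bool_masked_pos = torch.randint(0, 2, (1, num_patches)).bool()

outputs = model(pixel_values=pixel_values, bool_masked_pos=bool_masked_pos)
print(outputs.loss)  # reconstruction loss on the masked patches
```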

### Describe the bug Hi, I got the following error when my PR was being checked. Here is the [link](https://github.com/openai/evals/pull/651) to my PR. Does anyone know what is happening here?...


Hello, I am running into a CUDA out of memory error when training PPO. I am using 8 A100 GPUs, each with 80GB of memory. The command I run is below. I fully fine-tuned llama2-13B with the code provided by stanford-alpaca on 8 A100s to get the sft model, and the reward model was trained on llama2-13B with the reward-training code from the LLM-tuning project. Now, when I run the ppo command below, the GPU memory blows up. Is there any way to reduce the memory usage? Thanks.

```
CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7 python rl_training.py \
    --base_model_name /data/qbao775/Explanation-Generation/llama-2/llama-2-13B \
    --merged_sft_model_path /data/qbao775/Explanation-Generation/llama_2_13B_merged_all_generator_avg_3_lenexp_10 \
    --sft_model_lora_path /data/qbao775/Explanation-Generation/llama_2_13B_merged_all_generator_avg_3_lenexp_10 \
    --reward_model_lora_path ../weights/llama-2-13B_beyond_reward_chinese_5000_peft_last_checkpoint \
    --adafactor False \
    --save_freq 10 \
    ...
```
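A minimal sketch of common ways to cut PPO memory, assuming a `trl` + `peft` setup; this is not the interface of this project's `rl_training.py`, and the path and hyperparameters are placeholders:

```python
from peft import LoraConfig
from trl import AutoModelForCausalLMWithValueHead, PPOConfig

# Train only LoRA adapters on top of a frozen 8-bit base model.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")

model = AutoModelForCausalLMWithValueHead.from_pretrained(
    "/path/to/llama-2-13B",        # placeholder path
    load_in_8bit=True,             # keep frozen base weights in int8
    device_map="auto",             # shard the model across the available GPUs
    peft_config=lora_config,       # only the adapter weights get optimizer state
)

ppo_config = PPOConfig(
    batch_size=8,                  # fewer rollouts held in memory at once
    mini_batch_size=1,             # process them one at a time
    gradient_accumulation_steps=8, # keep the effective PPO batch size
)
```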

Hi Jie, here are our new papers on logical reasoning data augmentation, prompt augmentation, and evaluation. Please consider adding these papers to your arXiv paper. Thanks a lot. ###...

Hi, I ran into an issue when running the project; the details are in the following message. Does anyone know how to solve it? Thank you so much. ``` Python...

Hi, I am wondering whether the project has published the code or a tutorial for ELMo+ESIM? Thank you so much.

Hi, I found that the link to the PyTorch replication and tutorial for "Deep contextualized word representations" is missing. Can you update the link? Thank you so much....

I would like to ask whether the author knows of any project on ELMo+ESIM? Thanks.

Hi, I find this project very interesting. May I ask which Flash Attention implementation is used in this project? The official flash attention project provides flash...
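A minimal sketch of one common way to enable FlashAttention-2 through the Hugging Face integration (recent `transformers` versions); this is not necessarily how this repository wires it in, and the model id is a placeholder:

```python
import torch
from transformers import AutoModelForCausalLM

# Select the FlashAttention-2 kernels; FlashAttention requires fp16 or bf16.
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-13b-hf",              # placeholder model id
    torch_dtype=torch.float16,
    attn_implementation="flash_attention_2",
)
```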