skepsun
Hi @RX28666, in SLE, after generating new pseudo labels in each stage, you need to train a new model (with exactly the same hyperparameters) from scratch. In addition, you can...
We have not fully compared the two conditions. We prefer to train a new model, because the current model has already been trained on the full training data and has the same knowledge...
Found a mitigation that ships with trl: pass `optimize_cuda_cache=True` in `PPOConfig`. Also, `train_ppo` cannot use wandb (`report_to=wandb` has no effect); you need to pass `log_with=training_args.report_to` in `PPOConfig`, and then add `self.log_stats(stats, batch, rewards)` after `self.step(...)` in `PPOPeftTrainer`. One panel in wandb still raises an error, though, and I don't know how to fix it. Edit: never mind, a sequence length of 512 still runs out of memory.
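For reference, a minimal sketch of the two changes, assuming a trl version (roughly 0.4-0.7) where `PPOConfig` still exposes `optimize_cuda_cache` (later renamed `optimize_device_cache`); the `"wandb"` literal stands in for `training_args.report_to`:

```python
# Sketch of the workaround described above (trl-era field names assumed).
from trl import PPOConfig

ppo_config = PPOConfig(
    optimize_cuda_cache=True,  # release stale CUDA cache between PPO steps
    log_with="wandb",          # stand-in for training_args.report_to
)

# Then, inside PPOPeftTrainer's training loop, log stats after each step:
#   stats = self.step(queries, responses, rewards)
#   self.log_stats(stats, batch, rewards)
```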
LoRA or QLoRA is necessary for training a 70B model on a single machine. I'm very curious about a 70B OpenChat model.
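A hedged sketch of what QLoRA-style loading of a 70B model could look like with standard transformers/peft/bitsandbytes APIs; the model id and LoRA hyperparameters are illustrative, not from the original discussion:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the base weights to 4-bit so a 70B model fits on one machine.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",    # illustrative model id
    quantization_config=bnb_config,
    device_map="auto",              # shard quantized weights across available GPUs
)

# Only the small LoRA adapters are trained; the 4-bit base stays frozen.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```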
@forrestsocool 有试过用这个repo提供的modeling_baichuan.py来sft吗,我的loss总是第一个step是3点几然后就一直是0,感觉是不是代码或者权重有点问题
It is better to decide whether to use two-page view by detecting the height-width ratio instead of detecting landscape mode, because the two-page view is better than single-page even in...
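A hypothetical sketch of that heuristic (function name and threshold are made up for illustration): decide from the window's own proportions rather than from the OS landscape flag.

```python
def use_two_page_view(window_width: float, window_height: float,
                      threshold: float = 1.0) -> bool:
    # A window wider than it is tall can show two pages side by side,
    # even when the platform does not report "landscape" orientation.
    return window_width / window_height > threshold
```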
The `lora_config` in the last step's code needs `target_modules` added; the setting suggested by the trl author is `target_modules=["q_proj","k_proj"]`. Training then runs, but the KL divergence goes negative:

```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to:
https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145:...
```
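A sketch of the fix, using peft's `LoraConfig`; only `target_modules=["q_proj","k_proj"]` comes from the trl author's suggestion, the remaining hyperparameters are illustrative:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                 # illustrative rank, not from the original
    lora_alpha=32,
    target_modules=["q_proj", "k_proj"],  # attach LoRA to the attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```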
It still errors out: the trainable parameters passed to the `optimizer` are empty, even though `requires_grad=True` was set on the `input embedding` and `lm head` earlier. Very strange.

```
[2023-07-20 13:02:58,344] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-20 13:02:58,345] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-20 13:02:58,702] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet...
```
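A small diagnostic sketch for this symptom, assuming `model` is the wrapped model from the script above: list what is actually trainable right before the optimizer is built.

```python
def report_trainable(model):
    # Collect every parameter that would reach the optimizer.
    trainable = [(n, tuple(p.shape)) for n, p in model.named_parameters() if p.requires_grad]
    print(f"{len(trainable)} trainable tensors")
    for name, shape in trainable:
        print(name, shape)
    return trainable

# An empty list would mean the requires_grad=True set on the input embedding and
# lm head was undone later, e.g. by a subsequent freeze or by re-wrapping the model.
```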
Using the latest dev branch, I get the following error:

```
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /d2/data/chuxiong/collie/examples/further_pretrain_llama/expand_vocab.py:115 in                  │
│                                                                                                  │
│   112 │   evaluators=[evaluator]                                                                 │
│   113 )                                                                                          │
│   114                                                                                            │
│ ❱ 115 ...
```
You can try `CUDA_VISIBLE_DEVICES=1` to set a different GPU (and use `--gpu 0` in your script):

```
CUDA_VISIBLE_DEVICES=1 bash scripts/xxxx.sh
```

It seems that DGL (PyTorch) will always occupy ~300 MB in...
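To see why `--gpu 0` still works here, a short sketch of the remapping: with `CUDA_VISIBLE_DEVICES=1`, the process only sees physical GPU 1, renumbered as logical device 0.

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must be set before CUDA is initialized

import torch
print(torch.cuda.device_count())  # prints 1: only physical GPU 1 is visible
device = torch.device("cuda:0")   # logical cuda:0 now maps to physical GPU 1
```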