skepsun
Hi @RX28666, in SLE, after generating new pseudo labels in each stage, you need to train a new model (with exactly the same hyperparameters) from scratch. In addition, you can...
We have not fully compared the two conditions. We prefer to train a new model, because the current model has already been trained on the full training data and has the same knowledge...
Found a mitigation that ships with trl: pass `optimize_cuda_cache=True` in `PPOConfig`. Also, `train_ppo` cannot use wandb (`report_to=wandb` has no effect); you need to pass `log_with=training_args.report_to` in `PPOConfig`, and then add `self.log_stats(stats, batch, rewards)` after `self.step(...)` in `PPOPeftTrainer`. One panel in wandb still raises an error, though, and I don't know how to fix it. Edit: never mind, a sequence length of 512 still runs out of memory.
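For reference, a minimal sketch of the two changes, assuming a trl version (roughly 0.4-0.7) where `PPOConfig` still exposes `optimize_cuda_cache` (later renamed `optimize_device_cache`); the `"wandb"` literal stands in for `training_args.report_to`:

```python
# Sketch of the workaround described above (trl-era field names assumed).
from trl import PPOConfig

ppo_config = PPOConfig(
    optimize_cuda_cache=True,  # release stale CUDA cache between PPO steps
    log_with="wandb",          # stand-in for training_args.report_to
)

# Then, inside PPOPeftTrainer's training loop, log stats after each step:
#   stats = self.step(queries, responses, rewards)
#   self.log_stats(stats, batch, rewards)
```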
LoRA or QLoRA is necessary for training a 70B model on a single machine. I'm very curious about a 70B OpenChat model.
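A hedged sketch of what QLoRA-style loading of a 70B model could look like with standard transformers/peft/bitsandbytes APIs; the model id and LoRA hyperparameters are illustrative, not from the original discussion:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Quantize the base weights to 4-bit so a 70B model fits on one machine.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",    # illustrative model id
    quantization_config=bnb_config,
    device_map="auto",              # shard quantized weights across available GPUs
)

# Only the small LoRA adapters are trained; the 4-bit base stays frozen.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```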
@forrestsocool 有试过用这个repo提供的modeling_baichuan.py来sft吗,我的loss总是第一个step是3点几然后就一直是0,感觉是不是代码或者权重有点问题
It is better to decide whether to use two-page view by detecting the height-width ratio instead of detecting landscape mode, because the two-page view is better than single-page even in...
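A hypothetical sketch of that heuristic (function name and threshold are made up for illustration): decide from the window's own proportions rather than from the OS landscape flag.

```python
def use_two_page_view(window_width: float, window_height: float,
                      threshold: float = 1.0) -> bool:
    # A window wider than it is tall can show two pages side by side,
    # even when the platform does not report "landscape" orientation.
    return window_width / window_height > threshold
```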
The `lora_config` in the last step's code needs `target_modules` added; the setting suggested by the trl author is `target_modules=["q_proj","k_proj"]`. Training then runs, but the KL divergence goes negative:

```
===================================BUG REPORT===================================
Welcome to bitsandbytes. For bug reports, please run

python -m bitsandbytes

 and submit this information together with your error trace to:
https://github.com/TimDettmers/bitsandbytes/issues
================================================================================
bin /d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/libbitsandbytes_cuda117.so
/d1/data/chuxiong/miniconda3/envs/llm/lib/python3.9/site-packages/bitsandbytes/cuda_setup/main.py:145:...
```
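A sketch of the fix, using peft's `LoraConfig`; only `target_modules=["q_proj","k_proj"]` comes from the trl author's suggestion, the remaining hyperparameters are illustrative:

```python
from peft import LoraConfig

lora_config = LoraConfig(
    r=16,                                 # illustrative rank, not from the original
    lora_alpha=32,
    target_modules=["q_proj", "k_proj"],  # attach LoRA to the attention projections
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)
```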
It still errors out: the trainable parameters passed to the `optimizer` are empty, even though `requires_grad=True` was set on the `input embedding` and `lm head` earlier. Very strange.

```
[2023-07-20 13:02:58,344] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet implemented
[2023-07-20 13:02:58,345] [INFO] [comm.py:594:init_distributed] cdb=None
[2023-07-20 13:02:58,702] [WARNING] [comm.py:152:init_deepspeed_backend] NCCL backend in DeepSpeed not yet...
```
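A small diagnostic sketch for this symptom, assuming `model` is the wrapped model from the script above: list what is actually trainable right before the optimizer is built.

```python
def report_trainable(model):
    # Collect every parameter that would reach the optimizer.
    trainable = [(n, tuple(p.shape)) for n, p in model.named_parameters() if p.requires_grad]
    print(f"{len(trainable)} trainable tensors")
    for name, shape in trainable:
        print(name, shape)
    return trainable

# An empty list would mean the requires_grad=True set on the input embedding and
# lm head was undone later, e.g. by a subsequent freeze or by re-wrapping the model.
```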
Using the latest dev branch, I get the following error:

```
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /d2/data/chuxiong/collie/examples/further_pretrain_llama/expand_vocab.py:115 in                  │
│                                                                                                  │
│   112 │   evaluators=[evaluator]                                                                 │
│   113 )                                                                                          │
│   114                                                                                            │
│ ❱ 115 ...
```
You can try `CUDA_VISIBLE_DEVICES=1` to set a different GPU (and use `--gpu 0` in your script):

```
CUDA_VISIBLE_DEVICES=1 bash scripts/xxxx.sh
```

It seems that DGL (PyTorch) will always occupy ~300 MB in...
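To see why `--gpu 0` still works here, a short sketch of the remapping: with `CUDA_VISIBLE_DEVICES=1`, the process only sees physical GPU 1, renumbered as logical device 0.

```python
import os
os.environ["CUDA_VISIBLE_DEVICES"] = "1"  # must be set before CUDA is initialized

import torch
print(torch.cuda.device_count())  # prints 1: only physical GPU 1 is visible
device = torch.device("cuda:0")   # logical cuda:0 now maps to physical GPU 1
```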