Hi, will the optimized DeepSpeed code be released? The DeepSpeed recommended in the README still appears to be the official version.
Hi @sysusicily, I'm fine-tuning with 6 V100 GPUs and have a question about training time. The fine-tuning process is extremely slow for me. I'm using fp16 and attn_impl: torch,...
Thank you for your response. I have switched to the 8xA100 40G machine and it is running smoothly now.
@dawnranger Why might system tokens appear after the query? Could you provide some examples?
> > @dawnranger Why might system tokens appear after the query? Could you provide some examples?
>
> For example, in the chatglm2 template, `\n\n答:` comes after `{{query}}`; in the baichuan template, ``...
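To make that concrete, here is a minimal sketch of a chatglm2-style round prompt where a marker follows the user query (the exact template strings are illustrative, not taken from any repo's code):

```python
# Illustrative only: in a chatglm2-style template, the "答:" marker comes *after*
# the user's query, so the prompt does not end with the user's own text.
def build_prompt(query: str, round_idx: int = 1) -> str:
    # Template shape: "[Round i]\n\n问:{query}\n\n答:"
    return f"[Round {round_idx}]\n\n问:{query}\n\n答:"

print(build_prompt("今天天气怎么样?"))
# [Round 1]
#
# 问:今天天气怎么样?
#
# 答:
```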
Thank you for your great work. I fine-tuned the 12B model on 8 A100 40G cards using fp16 and DeepSpeed stage 3, with per_device_train_batch_size=2 and gradient_accumulation_steps=4. The processed dataset had 10,971,000 rows...
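For reference, a quick back-of-the-envelope calculation of the global batch size and optimizer steps per epoch implied by those numbers (all values are taken from the comment above; the single-epoch assumption is mine):

```python
# Rough step count for the configuration described above.
num_gpus = 8
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
dataset_rows = 10_971_000

# Effective (global) batch size per optimizer step.
global_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps  # 64

# Optimizer steps for a single pass over the data.
steps_per_epoch = dataset_rows // global_batch_size  # ~171,421
print(global_batch_size, steps_per_epoch)
```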
Thank you for your response. I am trying to fine-tune Dolly on Chinese data to extend its knowledge, so I have collected a considerable amount of data. However, I am experiencing CUDA OUT...
Here's the issue at hand: with the same configuration and parameters, bf16 encounters OOM errors while fp16 does not. Considering that expanding the vocabulary will introduce additional untrained weights, I...
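On the vocabulary-expansion point, a hedged sketch of how adding tokens and resizing the embedding matrix introduces new, randomly initialized rows that start out untrained (the added token names are hypothetical placeholders):

```python
# Sketch only: expanding the tokenizer and resizing embeddings adds untrained rows.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "databricks/dolly-v2-12b"  # the 12B Dolly model discussed above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

new_tokens = ["<extra_token_0>", "<extra_token_1>"]  # hypothetical additions
num_added = tokenizer.add_tokens(new_tokens)

# Each added token contributes one new, randomly initialized row to the input
# embedding (and to the output projection when the head is resized as well).
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; embedding rows: {model.get_input_embeddings().weight.shape[0]}")
```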
https://github.com/facebookresearch/llama/blob/main/FAQ.md#4 I think this applies to Dolly as well.
This question may sound a bit silly, but why is right padding used during training while left padding is chosen during inference?
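For what it's worth, a minimal sketch of the usual convention in transformers-style causal LMs (the model name is a placeholder, and this illustrates the general practice rather than this repo's exact code): right padding during training keeps each example's tokens and labels aligned from position 0 while the attention mask excludes the pads, whereas left padding at inference keeps the last prompt token adjacent to the tokens that `generate()` appends on the right.

```python
# Sketch of the common padding convention; model/tokenizer names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "databricks/dolly-v2-12b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # many causal LMs define no pad token

# Inference: pad on the left so every sequence in the batch *ends* with real tokens,
# and newly generated tokens follow the prompt directly.
tokenizer.padding_side = "left"
batch = tokenizer(["Hello", "A much longer prompt here"], return_tensors="pt", padding=True)
with torch.no_grad():
    out = model.generate(**batch, max_new_tokens=8)

# Training: pad on the right so input tokens and labels line up from the start of
# each sequence; pad positions are masked out of the loss anyway.
tokenizer.padding_side = "right"
```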