Hi, will the optimized DeepSpeed code be released? The DeepSpeed recommended in the README still appears to be the official version.
Hi @sysusicily, I'm fine-tuning with 6 V100 GPUs and have a question about training time. The fine-tuning process is extremely slow for me. I'm using fp16 and attn_impl: torch,...
Thank you for your response. I have switched to the 8xA100 40G machine and it is running smoothly now.
@dawnranger Why might system tokens appear after the query? Could you provide some examples?
> > @dawnranger Why might system tokens appear after the query? Could you provide some examples?
>
> For example, in the chatglm2 template, `\n\n答:` comes after `{{query}}`; in the baichuan template, ``...
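To make that concrete, here is a minimal sketch of a chatglm2-style round prompt where a marker follows the user query (the exact template strings are illustrative, not taken from any repo's code):

```python
# Illustrative only: in a chatglm2-style template, the "答:" marker comes *after*
# the user's query, so the prompt does not end with the user's own text.
def build_prompt(query: str, round_idx: int = 1) -> str:
    # Template shape: "[Round i]\n\n问:{query}\n\n答:"
    return f"[Round {round_idx}]\n\n问:{query}\n\n答:"

print(build_prompt("今天天气怎么样?"))
# [Round 1]
#
# 问:今天天气怎么样?
#
# 答:
```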
Thank you for your great work. I fine-tuned the 12B model on 8 A100 40G cards using fp16 and DeepSpeed stage 3, with per_device_train_batch_size=2 and gradient_accumulation_steps=4. The processed dataset had 10,971,000 rows...
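For reference, a quick back-of-the-envelope calculation of the global batch size and optimizer steps per epoch implied by those numbers (all values are taken from the comment above; the single-epoch assumption is mine):

```python
# Rough step count for the configuration described above.
num_gpus = 8
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
dataset_rows = 10_971_000

# Effective (global) batch size per optimizer step.
global_batch_size = num_gpus * per_device_train_batch_size * gradient_accumulation_steps  # 64

# Optimizer steps for a single pass over the data.
steps_per_epoch = dataset_rows // global_batch_size  # ~171,421
print(global_batch_size, steps_per_epoch)
```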
Thank you for your response. I am trying to fine-tune Dolly on Chinese data to extend its knowledge, so I have collected a considerable amount of data. However, I am experiencing CUDA OUT...
Here's the issue at hand: with the same configuration and parameters, bf16 encounters OOM errors while fp16 does not. Considering that expanding the vocabulary will introduce additional untrained weights, I...
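On the vocabulary-expansion point, a hedged sketch of how adding tokens and resizing the embedding matrix introduces new, randomly initialized rows that start out untrained (the added token names are hypothetical placeholders):

```python
# Sketch only: expanding the tokenizer and resizing embeddings adds untrained rows.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "databricks/dolly-v2-12b"  # the 12B Dolly model discussed above
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

new_tokens = ["<extra_token_0>", "<extra_token_1>"]  # hypothetical additions
num_added = tokenizer.add_tokens(new_tokens)

# Each added token contributes one new, randomly initialized row to the input
# embedding (and to the output projection when the head is resized as well).
model.resize_token_embeddings(len(tokenizer))
print(f"added {num_added} tokens; embedding rows: {model.get_input_embeddings().weight.shape[0]}")
```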
https://github.com/facebookresearch/llama/blob/main/FAQ.md#4 I think this applies to Dolly as well.
This question may sound a bit silly, but why is right padding used during training while left padding is chosen during inference?
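For what it's worth, a minimal sketch of the usual convention in transformers-style causal LMs (the model name is a placeholder, and this illustrates the general practice rather than this repo's exact code): right padding during training keeps each example's tokens and labels aligned from position 0 while the attention mask excludes the pads, whereas left padding at inference keeps the last prompt token adjacent to the tokens that `generate()` appends on the right.

```python
# Sketch of the common padding convention; model/tokenizer names are placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "databricks/dolly-v2-12b"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # many causal LMs define no pad token

# Inference: pad on the left so every sequence in the batch *ends* with real tokens,
# and newly generated tokens follow the prompt directly.
tokenizer.padding_side = "left"
batch = tokenizer(["Hello", "A much longer prompt here"], return_tensors="pt", padding=True)
with torch.no_grad():
    out = model.generate(**batch, max_new_tokens=8)

# Training: pad on the right so input tokens and labels line up from the start of
# each sequence; pad positions are masked out of the loss anyway.
tokenizer.padding_side = "right"
```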