bytes-lost

5 comments by bytes-lost

@hiyouga Didn't you run into this error?
```
./aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [51,0,0] Assertion `index >= -sizes[i] && index < sizes[i] && "index out of bounds"` failed.
../aten/src/ATen/native/cuda/IndexKernel.cu:92: operator(): block: [124,0,0], thread: [52,0,0] Assertion...
```

> @hiyouga
> ```
> [INFO|trainer.py:622] 2023-06-15 17:12:03,926 >> Using cuda_amp half precision backend
> [INFO|trainer.py:1779] 2023-06-15 17:12:03,933 >> ***** Running training *****
> [INFO|trainer.py:1780] 2023-06-15 17:12:03,934 >> Num examples = 48,329
> [INFO|trainer.py:1781] 2023-06-15...
> ```

@hiyouga I added one line here in train_sft.py, but I still get the same error:
```
model, tokenizer = load_pretrained(model_args, finetuning_args, training_args.do_train, stage="sft")
tokenizer.pad_token_id = 0  # set pad_token_id explicitly
dataset = preprocess_data(dataset, tokenizer, data_args, training_args, stage="sft")
```
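
The assertion in the log above fires when an index falls outside `[-sizes[i], sizes[i])` for the tensor being indexed. As a rough way to check for that on the CPU before training, here is a minimal sketch that scans tokenized sequences for ids outside that range. It assumes the offending indices are token ids and that `vocab_size` is the size of the indexed dimension; the thread confirms neither, and `find_out_of_range_ids` is a hypothetical helper, not part of the training code.

```python
from typing import Iterable, List, Sequence, Tuple


def find_out_of_range_ids(
    sequences: Iterable[Sequence[int]], vocab_size: int
) -> List[Tuple[int, int, int]]:
    """Return (sequence_index, position, token_id) for every id that would
    fail the check `index >= -size && index < size` with size = vocab_size.

    Hypothetical diagnostic helper; vocab_size is assumed to be the length
    of the dimension being indexed (for example the embedding row count).
    """
    bad = []
    for seq_idx, seq in enumerate(sequences):
        for pos, token_id in enumerate(seq):
            # Same bounds as the CUDA assertion in the log.
            if not (-vocab_size <= token_id < vocab_size):
                bad.append((seq_idx, pos, token_id))
    return bad
```

An empty result would suggest the token ids are not the cause under these assumptions, which would fit an environment-related explanation instead.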

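> @bytes-lost It looks like something went wrong in torch's checkpointing, which may be related to your local torch and CUDA environment. I tested it several times on my side without any problems.

OK, I'll recreate the environment and test again. Does torch=2.0.1 work?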

> @Haijunlv May I ask how well your final pretrained model generates text, given a loss of 1.95?