萧停云

Results: 8 comments by 萧停云

> > 3.0.0 may require CUDA 11; you could check out the 2.2.1 version and try.
>
> I tried CUDA 11 and PyTorch 1.11.0 and ran `python lightseq/examples/inference/python/export/huggingface/hf_bert_export.py` (tag 3.0.1), but the problem still...
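For anyone hitting the same thing, a minimal sanity check of the runtime (illustrative only, not part of LightSeq) to confirm the CUDA 11 / PyTorch 1.11 combination the thread suggests:

```python
import torch

# Versions reported by the PyTorch build itself.
print("PyTorch:", torch.__version__)              # expecting 1.11.0 here
print("CUDA (torch build):", torch.version.cuda)  # expecting an 11.x value
print("CUDA device available:", torch.cuda.is_available())
```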

When using run_finetune_with_lora.sh, a single GPU gets as far as the model-training stage but then errors out, while two GPUs hang at the data-processing stage. Below is the single-GPU log (A6000, 48 GB VRAM):

[2023-04-09 06:06:04,824] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-09 06:06:04,839] [INFO] [runner.py:550:main] cmd = /data/anaconda3/envs/ljy_lmflow/bin/python3.9 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1...
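For the two-GPU hang, a standalone communication test can help separate an NCCL problem from LMFlow itself. This is a generic sketch (not LMFlow code) that assumes two visible GPUs; save it under any name, e.g. `nccl_check.py`, and launch it with `torchrun --nproc_per_node=2 nccl_check.py`:

```python
import os
import torch
import torch.distributed as dist

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # One all_reduce across both GPUs: if this hangs too, the problem is
    # GPU-to-GPU communication (NCCL / P2P), not the training script.
    t = torch.ones(1, device=f"cuda:{local_rank}")
    dist.all_reduce(t)
    print(f"rank {dist.get_rank()}: all_reduce ok, value = {t.item()}")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```

If this hangs as well, launching with `NCCL_DEBUG=INFO` (and, on some multi-GPU boards, `NCCL_P2P_DISABLE=1`) usually points at the failing transport.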

When using run_finetune.sh, a single GPU errors out during training, and two GPUs hang at the model-loading stage. Below is the single-GPU log:

[2023-04-09 06:37:33,738] [WARNING] [runner.py:186:fetch_hostfile] Unable to find hostfile, will proceed with training with local resources only.
[2023-04-09 06:37:33,754] [INFO] [runner.py:550:main] cmd = /data/anaconda3/envs/ljy_lmflow/bin/python3.9 -u -m deepspeed.launcher.launch --world_info=eyJsb2NhbGhvc3QiOiBbMF19 --master_addr=127.0.0.1...
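One detail worth noting in the log above: the `--world_info` argument printed by the DeepSpeed launcher is just base64-encoded JSON, so it can be decoded to confirm which GPUs the run actually sees (a small illustrative snippet, not LMFlow code):

```python
import base64
import json

# The DeepSpeed launcher logs its worker layout as base64-encoded JSON
# in the --world_info argument; decoding it shows which GPUs it will use.
world_info = "eyJsb2NhbGhvc3QiOiBbMF19"  # copied from the log above
print(json.loads(base64.b64decode(world_info)))
# -> {'localhost': [0]}, i.e. this launch only sees GPU 0
```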

> Thanks for your interest in LMFlow! Could you please check `log/finetune/train.err` to see the detailed error message? Also, it would be nice if you could provide the hardware settings...
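In case it helps others following the same advice, the file in question can be inspected with something like the sketch below (the path assumes the default layout mentioned in the reply):

```python
from pathlib import Path

# Print the tail of the error log that the finetune script writes;
# the last lines usually contain the actual Python traceback.
err_log = Path("log/finetune/train.err")
if err_log.exists():
    print("\n".join(err_log.read_text(errors="replace").splitlines()[-50:]))
else:
    print(f"{err_log} not found; check where the script writes its logs.")
```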

1. I also encountered the same problem. Your solution works for opt-1.3B, but when training gpt-3.5B it gets stuck in the loop for a long time. The larger the model, the...

@tjruwase I can run through with the method of 00INDEX. However, if I don't modify the source code, then as long as offload is turned on, whether it is ZeRO-2 or...
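For reference, "offload is turned on" here refers to the `offload_optimizer` block of the DeepSpeed ZeRO config. The snippet below only illustrates that shape as a Python dict (the keys come from the DeepSpeed ZeRO documentation, the values are made up; the real settings live in the project's ds_config file):

```python
# Illustrative ZeRO-2 config with CPU optimizer offload enabled.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 2,              # the same offload block applies to stage 3
        "offload_optimizer": {
            "device": "cpu",     # this is the "offload is turned on" switch
            "pin_memory": True,
        },
    },
}
```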

Also, at inference time, does a bos_token need to be prepended to the original input?
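On the bos_token question, a generic way to check whether the tokenizer already prepends it is to compare the two encodings directly (illustrative sketch using GPT-2 as a stand-in; substitute the model actually being served):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")  # stand-in model

text = "Hello, world."
ids_default = tokenizer(text, add_special_tokens=True).input_ids
ids_with_bos = tokenizer(tokenizer.bos_token + text, add_special_tokens=False).input_ids

# Comparing the first tokens shows whether bos_token is already added;
# GPT-2, for instance, does not prepend it automatically.
print(ids_default[:3], ids_with_bos[:3], tokenizer.bos_token_id)
```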