Firefly-LLaMA2-Chinese issues

No training config file for Firefly-LLaMA2-7B-Base

train_args only contains training configs for the 13B model.

Data processing module issue: train_texts has length 1

Hello, I have a question about the data processing module. My pretraining (pt) data directory contains 5 txt files, and all of them load normally during the loading stage, as shown below:
100%|██████████| 5/5 [00:13

LightingFx
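One possible cause worth checking for a single-element train_texts is a loader that merges every file into one string. The sketch below is a hypothetical loader, not Firefly's actual data module; it assumes plain UTF-8 .txt files in one directory and keeps one entry per non-empty line (or per file), so the sample count grows with the data.

```python
import glob
import os
from typing import List


def load_train_texts(data_dir: str, per_line: bool = True) -> List[str]:
    """Hypothetical loader: read every .txt file directly under data_dir.

    With per_line=True each non-empty line becomes one sample; otherwise each
    non-empty file becomes one sample. Files are never merged into a single
    string, so len(result) cannot collapse to 1 for multi-file input.
    """
    files = sorted(glob.glob(os.path.join(data_dir, "*.txt")))
    texts: List[str] = []
    for path in files:
        with open(path, encoding="utf-8") as f:
            if per_line:
                # One sample per non-empty line, whitespace trimmed.
                texts.extend(line.strip() for line in f if line.strip())
            else:
                # One sample per file.
                content = f.read().strip()
                if content:
                    texts.append(content)
    return texts
```

With 5 non-empty txt files and per_line=False this returns 5 entries. If train_texts still has length 1 in a real run, inspecting how samples are appended (extend vs. concatenating into one string) is a reasonable first check.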

AttributeError: 'NoneType' object has no attribute 'get'

torchrun --nproc_per_node=1 train.py --train_args_file train_args/Glm.yaml
Traceback (most recent call last):
  File "/home/yierde/anaconda3/envs/tn/bin/torchrun", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/elastic/multiprocessing/errors/__init__.py", line 346, in wrapper
    return f(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^
  File "/home/yierde/anaconda3/envs/tn/lib/python3.11/site-packages/torch/distributed/run.py", line...

2662007798

torch.distributed.elastic.multiprocessing.errors.ChildFailedError:

Error operation not supported at line 351 in file /home/tim/git/bitsandbytes/csrc/pythonInterface.c
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: 1) local_rank: 0 (pid: 31779) of binary: /root/miniconda3/envs/chatglm_ft/bin/python
Traceback (most recent call last):
  File "/root/miniconda3/envs/chatglm_ft/bin/torchrun", line 8, in...

haonanye

RuntimeError: Placeholder storage has not been allocated on MPS device!

RuntimeError: Placeholder storage has not been allocated on MPS device!
Running single_chat.py on macOS raises this error. The same error occurs with both device = 'cpu' and device = 'mps'.

longkeyy

Are there any plans to support training with flash-attention?

zejunwang1

Issue with saving the fine-tuned model

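@yangjianxin1 Hello. After running torchrun --nproc_per_node={num_gpus} train.py --train_args_file train_args/llama2-13b-ext.yaml, full-parameter fine-tuning ran to completion, but the fine-tuned model was not saved to the output folder. Why? The output folder only contains some training-argument files and no model files.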

Alan-JW
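A quick way to confirm whether a run actually wrote weights is to scan the output directory for weight files. This is a sketch assuming Hugging Face naming conventions (pytorch_model*.bin, model*.safetensors, adapter_model.*); it deliberately ignores training_args.bin, which the Trainer writes alongside weights and which is easy to mistake for a model file. It only reports what is on disk, not why saving was skipped.

```python
import os
from typing import List

# Assumed Hugging Face weight-file naming; adjust if your trainer saves differently.
WEIGHT_SUFFIXES = (".bin", ".safetensors")
WEIGHT_PREFIXES = ("pytorch_model", "model", "adapter_model")


def find_weight_files(output_dir: str) -> List[str]:
    """Return sorted relative paths of files that look like saved model weights.

    Walks output_dir recursively, so weights inside checkpoint-* folders are
    found too. Files such as training_args.bin do not match any prefix and
    are therefore not reported.
    """
    found: List[str] = []
    for root, _dirs, files in os.walk(output_dir):
        for name in files:
            if name.startswith(WEIGHT_PREFIXES) and name.endswith(WEIGHT_SUFFIXES):
                found.append(os.path.relpath(os.path.join(root, name), output_dir))
    return sorted(found)
```

For example, find_weight_files("output/my-run") (a hypothetical path) returning an empty list means no weight files were written anywhere under that directory.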

Which config JSON in the qlora folder of the Firefly project was used for this Firefly Chinese LLaMA2?

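As the title says: I want to fine-tune a chat model based on this Chinese LLaMA2 from Firefly (流萤). I can only run QLoRA, but I don't know which config file in Firefly's qlora folder I should use. Is it llama2-sft-qlora.json?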

zzisbeauty

Loss is 0 during continued pretraining of baichuan2-13b


Hello, when I run continued pretraining (CPT) on baichuan2-13b, the loss stays at 0 throughout. This happens with both my own dataset and CNEWsum.jsonl. ![image](https://github.com/yangjianxin1/Firefly-LLaMA2-Chinese/assets/58279305/986a79ea-eaf4-4f3f-82a6-0ce3c67d1a0b)

LiuChen19960902

About training details

Hello, how was the validation set constructed for instruction fine-tuning, and roughly how large was it?

weipeng008005
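The thread does not say how the author built the validation set. For readers who need one, a common approach is to hold out a small random fraction of the instruction data with a fixed seed. The sketch below assumes one JSON object per line (JSONL) and a hypothetical 1% ratio; neither is confirmed as what Firefly used.

```python
import json
import random
from typing import List, Tuple


def split_train_eval(lines: List[str], eval_ratio: float = 0.01,
                     seed: int = 42, min_eval: int = 1) -> Tuple[List[dict], List[dict]]:
    """Shuffle JSONL records with a fixed seed and hold out a fraction for evaluation.

    eval_ratio, seed and min_eval are illustrative defaults, not project settings.
    Returns (train_records, eval_records).
    """
    records = [json.loads(line) for line in lines if line.strip()]
    rng = random.Random(seed)  # local RNG so the split is reproducible
    rng.shuffle(records)
    # Keep at least min_eval examples even for small datasets.
    n_eval = max(min_eval, int(len(records) * eval_ratio))
    return records[n_eval:], records[:n_eval]
```

Using a local random.Random instance keeps the split reproducible without touching the global random state that other training code may rely on.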
