CPM-Bee: single-GPU fine-tuning produces no fine-tuned model output

Open ivancr7 opened this issue 1 year ago • 6 comments

Fine-tuning command: torchrun --nnodes=1 --nproc_per_node=1 --rdzv_id=1 --rdzv_backend=c10d --rdzv_endpoint=localhost:12345 finetune_cpm_bee.py --use-delta --model-config config/cpm-bee-10b.json --dataset ../tutorials/basic_task_finetune/bin_data/train --eval_dataset ../tutorials/basic_task_finetune/bin_data/eval --epoch 5 --batch-size 5 --train-iters 100 --save-name cpm_bee_finetune --max-length 32 --save results/ --lr 0.0001 --inspect-iters 100 --warmup-iters 1 --eval-interval 50 --early-stop-patience 5 --lr-decay-style noam --weight-decay 0.01 --clip-grad 1.0 --loss-scale 32768 --start-step 0 --load model/pytorch_model.bin

Relevant log:

root
├── encoder (Encoder)
│   ├── layers (TransformerBlockList)
│   │   └── 0-47(CheckpointBlock)
│   │       ├── self_att (SelfAttentionBlock)
│   │       │   ├── layernorm_before_attention (LayerNorm) weight:[4096]
│   │       │   └── self_attention (Attention)
│   │       │       ├── project_q,project_v(Linear) weight:[16777216]
│   │       │       │   └── lora (DistributedLowRankLinear) lora_A:[32768] lora_B:[32768]
│   │       │       └── project_k,attention_out(Linear) weight:[16777216]
│   │       └── ffn (FFNBlock)
│   │           ├── layernorm_before_ffn (LayerNorm) weight:[4096]
│   │           └── ffn (FeedForward)
│   │               ├── w_in (DenseGatedACT)
│   │               │   └── w_0,w_1(Linear) weight:[41943040]
│   │               └── w_out (Linear) weight:[41943040]
│   └── output_layernorm (LayerNorm) weight:[4096]
├── input_embedding (EmbeddingExt) weight:[354643968]
└── position_bias (BucketPositionBias) relative_attention_bias:[16384]
[INFO|(OpenDelta)basemodel:696]2023-06-08 16:07:50,152 >> Trainable Ratio: 6291456/9622372352=0.065384%
[INFO|(OpenDelta)basemodel:698]2023-06-08 16:07:50,152 >> Delta Parameter Ratio: 6291456/9622372352=0.065384%
[INFO|(OpenDelta)basemodel:700]2023-06-08 16:07:50,152 >> Static Memory 17.92 GB, Max Memory 36.48 GB
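As a sanity check on the log, the trainable parameter count it reports can be reproduced from the module tree. The sketch below assumes the tree means 48 blocks (0-47), with a LoRA adapter attached to both project_q and project_v in each block, and a hidden size of 4096 (taken from the LayerNorm weight). These are readings of the log, not values confirmed from the CPM-Bee or OpenDelta source.

```python
# Figures copied from the log; layer and module counts are an
# interpretation of the printed tree, not taken from the source code.
NUM_LAYERS = 48                # blocks "0-47"
LORA_MODULES_PER_LAYER = 2     # project_q and project_v
LORA_A = 32768                 # elements in lora_A per module
LORA_B = 32768                 # elements in lora_B per module
HIDDEN_SIZE = 4096             # assumed from the LayerNorm weight size
TOTAL_PARAMS = 9622372352      # denominator reported by OpenDelta

# Total LoRA parameters across all adapted modules.
lora_params = NUM_LAYERS * LORA_MODULES_PER_LAYER * (LORA_A + LORA_B)

# Implied LoRA rank if lora_A has shape (rank, hidden_size).
lora_rank = LORA_A // HIDDEN_SIZE

ratio = lora_params / TOTAL_PARAMS

print(lora_params)        # 6291456
print(lora_rank)          # 8
print(f"{ratio:.6%}")     # 0.065384%
```

The result matches the "Trainable Ratio" line, which suggests the LoRA delta was attached as intended and that only those parameters are being trained.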

I see no output in the results/ directory, and the log shows no errors. What is going on?
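One thing worth ruling out is that files were written somewhere other than where they were looked for. The command passes a relative path, `--save results/`, which would normally resolve against the directory from which torchrun was launched. The helper below is a minimal sketch for listing whatever exists under a candidate directory; `list_saved_files` is a hypothetical name used for illustration and is not part of CPM-Bee.

```python
from pathlib import Path


def list_saved_files(save_dir):
    """Return sorted relative paths of all files under save_dir.

    Returns an empty list if the directory does not exist.
    """
    root = Path(save_dir)
    if not root.is_dir():
        return []
    return sorted(
        p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()
    )


if __name__ == "__main__":
    # Check the path given to --save, resolved against the current directory.
    candidate = Path("results")
    print(candidate.resolve(), list_saved_files(candidate))
```

Run it from the same working directory as the torchrun command. An empty list means nothing was written there during the run.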