
Error when the training parameter 'gradient_checkpointing' is set to false

Open LeonG7 opened this issue 1 year ago • 1 comments

Training parameters

    "output_dir": "./output",
    "model_name_or_path": "Baichuan-13B-Base",
    "train_file": "data/voc_train.jsonl",
    "num_train_epochs": 500,
    "per_device_train_batch_size": 16,
    "gradient_accumulation_steps": 2,
    "learning_rate": 1e-4,
    "max_seq_length": 1200,
    "logging_steps": 300,
    "save_steps": 500,
    "save_total_limit": 1,
    "lr_scheduler_type": "constant_with_warmup",
    "warmup_steps": 3000,
    "lora_rank": 64,
    "lora_alpha": 16,
    "lora_dropout": 0.05,
    "gradient_checkpointing": false,
    "disable_tqdm": false,
    "optim": "paged_adamw_32bit",
    "seed": 42,
    "bf16": true,
    "report_to": "tensorboard",
    "dataloader_num_workers": 5,
    "save_strategy": "steps",
    "weight_decay": 0,
    "max_grad_norm": 0.3,
    "remove_unused_columns": false

When training the Baichuan model with gradient_checkpointing set to false, the following error is raised:

RuntimeError: element 0 of tensors does not require grad and does not have a grad_fn

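The fix I found says the variable needs to be converted to Variable format; see https://blog.csdn.net/weixin_41990278/article/details/90311313 for details.

After changing gradient_checkpointing to true, training runs normally.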

LeonG7, Aug 07 '23 02:08
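
For context, the RuntimeError quoted above is what PyTorch autograd raises when `backward()` is called on a tensor that is not connected to anything with `requires_grad=True`. The sketch below reproduces that message with a tiny `nn.Linear` layer whose parameters are all frozen. It is only an illustration in plain PyTorch: it does not use the Firefly training script, `count_trainable` is a hypothetical helper name, and it does not establish why the flag changes the behavior for Baichuan-13B.

```python
import torch
from torch import nn


def count_trainable(model: nn.Module) -> int:
    """Hypothetical helper: number of parameter elements that require grad."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)


model = nn.Linear(4, 1)
for p in model.parameters():
    p.requires_grad_(False)  # freeze every parameter

x = torch.randn(2, 4)        # plain input, requires_grad is False
loss = model(x).sum()

frozen_trainable = count_trainable(model)      # nothing left to train
frozen_has_graph = loss.grad_fn is not None    # no autograd graph was built

try:
    loss.backward()
    error_message = ""
except RuntimeError as exc:
    error_message = str(exc)   # same message as in the report above
print(error_message)

# Once at least one tensor feeding the loss requires grad, backward() works.
model.weight.requires_grad_(True)
loss = model(x).sum()
loss.backward()
```

If a check like `count_trainable` returns 0 on the real model, or `loss.grad_fn` is `None` right before the backward pass, that would point to the same condition; whether that is what happens in this setup is an assumption to verify, not a confirmed diagnosis.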

How do I convert a variable to Variable format?

wushihu, Nov 26 '23 14:11
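
On the question above, a small sketch may help. Since PyTorch 0.4, `torch.autograd.Variable` has been merged into `torch.Tensor`, so the wrapper from the linked blog post still runs but simply returns a tensor; setting `requires_grad` on the tensor itself does the same thing. The tensors here are made up for illustration and are not taken from the Firefly code. Note also that only floating-point tensors can require gradients, so this approach cannot be applied to integer token IDs, and it may not address the error in this issue at all.

```python
import torch
from torch.autograd import Variable

data = torch.ones(3)

# Legacy spelling from older tutorials: still accepted, returns a Tensor.
wrapped = Variable(data, requires_grad=True)

# Current spelling: request gradients on the tensor directly.
modern = torch.ones(3, requires_grad=True)
also_modern = torch.ones(3).requires_grad_()  # in-place variant

# Gradients flow as usual once the tensor requires grad.
(modern * 2).sum().backward()
print(modern.grad)
```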