
12/24/2023 09:04:10 - INFO - llmtuner.tuner.core.loader - trainable params: 0 || all params: 7069016064 || trainable%: 0.0000 Killed

1Jenifer opened this issue 6 months ago • 0 comments (status: Open)

Reminder

  • [X] I have read the README and searched the existing issues.

Reproduction

***** train metrics *****
  epoch                    =        3.0
  train_loss               =     1.6394
  train_runtime            = 3:33:32.94
  train_samples_per_second =      6.842
  train_steps_per_second   =      0.428
[INFO|trainer.py:2889] 2023-12-24 04:44:36,261 >> Saving model checkpoint to boolmz_translation_model
[INFO|tokenization_utils_base.py:2432] 2023-12-24 04:44:36,455 >> tokenizer config file saved in boolmz_translation_model/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2023-12-24 04:44:36,455 >> Special tokens file saved in boolmz_translation_model/special_tokens_map.json
Figure saved: boolmz_translation_model/training_loss.png
12/24/2023 04:44:36 - WARNING - llmtuner.extras.ploting - No metric eval_loss to plot.

python src/export_model.py \
    --model_name_or_path bigscience/bloomz-7b1 \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir boolmz_translation_model \
    --export_dir bloomz_wmt

12/24/2023 09:02:24 - INFO - llmtuner.tuner.core.adapter - Fine-tuning method: LoRA
12/24/2023 09:04:10 - INFO - llmtuner.tuner.core.adapter - Merged 1 model checkpoint(s).
12/24/2023 09:04:10 - INFO - llmtuner.tuner.core.adapter - Loaded fine-tuned model from checkpoint(s): boolmz_translation_model
12/24/2023 09:04:10 - INFO - llmtuner.tuner.core.loader - trainable params: 0 || all params: 7069016064 || trainable%: 0.0000
Killed

Expected behavior

The instruction-tuning step completed normally, but when exporting (saving) the merged model, the process died right after `12/24/2023 09:04:10 - INFO - llmtuner.tuner.core.loader - trainable params: 0 || all params: 7069016064 || trainable%: 0.0000` with `Killed`. Changing the data type to bf16 did not help.
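A bare `Killed` with no Python traceback usually means the Linux OOM killer terminated the process, which points at insufficient host RAM while the full merged model is materialized in memory. As a rough back-of-envelope check (the parameter count 7,069,016,064 comes from the log above; the byte sizes per dtype are standard, and overhead from temporaries and the Python process itself comes on top):

```python
# Lower-bound estimate of host RAM needed just to hold all model weights
# in memory during a LoRA merge/export, by weight dtype.
PARAMS = 7_069_016_064  # parameter count reported in the loader log above

def merge_ram_gib(params: int, bytes_per_param: int) -> float:
    """GiB required to hold `params` weights at `bytes_per_param` each."""
    return params * bytes_per_param / 2**30

fp32 = merge_ram_gib(PARAMS, 4)  # float32: 4 bytes per weight
bf16 = merge_ram_gib(PARAMS, 2)  # bfloat16: 2 bytes per weight
print(f"fp32: {fp32:.1f} GiB, bf16: {bf16:.1f} GiB")
# fp32: 26.3 GiB, bf16: 13.2 GiB
```

So even in bf16 the export needs well over 13 GiB of free RAM for the weights alone; comparing these numbers against the machine's available memory (e.g. `free -g`) is a quick way to confirm or rule out the OOM-killer hypothesis.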

System Info

  • transformers version: 4.36.2
  • Platform: Linux-5.4.0-136-generic-x86_64-with-glibc2.17
  • Python version: 3.8.10
  • Huggingface_hub version: 0.20.1
  • Safetensors version: 0.4.1
  • Accelerate version: 0.25.0
  • Accelerate config: not found
  • PyTorch version (GPU?): 2.0.0+cu118 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using GPU in script?:
  • Using distributed or parallel set-up in script?:

Others

No response

1Jenifer · Dec 24 '23 06:12