LLaMA-Factory
12/24/2023 09:04:10 - INFO - llmtuner.tuner.core.loader - trainable params: 0 || all params: 7069016064 || trainable%: 0.0000 Killed
Reminder
- [X] I have read the README and searched the existing issues.
Reproduction
***** train metrics *****
  epoch                    = 3.0
  train_loss               = 1.6394
  train_runtime            = 3:33:32.94
  train_samples_per_second = 6.842
  train_steps_per_second   = 0.428
[INFO|trainer.py:2889] 2023-12-24 04:44:36,261 >> Saving model checkpoint to boolmz_translation_model
[INFO|tokenization_utils_base.py:2432] 2023-12-24 04:44:36,455 >> tokenizer config file saved in boolmz_translation_model/tokenizer_config.json
[INFO|tokenization_utils_base.py:2441] 2023-12-24 04:44:36,455 >> Special tokens file saved in boolmz_translation_model/special_tokens_map.json
Figure saved: boolmz_translation_model/training_loss.png
12/24/2023 04:44:36 - WARNING - llmtuner.extras.ploting - No metric eval_loss to plot.
python src/export_model.py \
    --model_name_or_path bigscience/bloomz-7b1 \
    --template default \
    --finetuning_type lora \
    --checkpoint_dir boolmz_translation_model \
    --export_dir bloomz_wmt
12/24/2023 09:02:24 - INFO - llmtuner.tuner.core.adapter - Fine-tuning method: LoRA
12/24/2023 09:04:10 - INFO - llmtuner.tuner.core.adapter - Merged 1 model checkpoint(s).
12/24/2023 09:04:10 - INFO - llmtuner.tuner.core.adapter - Loaded fine-tuned model from checkpoint(s): boolmz_translation_model
12/24/2023 09:04:10 - INFO - llmtuner.tuner.core.loader - trainable params: 0 || all params: 7069016064 || trainable%: 0.0000
Killed
Expected behavior
The instruction fine-tuning step earlier completed normally, but when exporting the model the process died right after `12/24/2023 09:04:10 - INFO - llmtuner.tuner.core.loader - trainable params: 0 || all params: 7069016064 || trainable%: 0.0000` with `Killed`. Changing the data type to bf16 did not help.
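`Killed` with no Python traceback usually means the Linux OOM killer terminated the process, not a bug in the script. A back-of-the-envelope check (a sketch using only the parameter count printed in the log above) shows why merging a 7B model can exhaust host RAM:

```python
# Rough memory estimate for loading/merging bloomz-7b1,
# based on the parameter count from the export log.
n_params = 7_069_016_064

def gib(n_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return n_bytes / 2**30

fp32_weights = gib(n_params * 4)  # default torch.float32 load
fp16_weights = gib(n_params * 2)  # fp16 / bf16 load

print(f"fp32 weights: {fp32_weights:.1f} GiB")       # ~26.3 GiB
print(f"fp16/bf16 weights: {fp16_weights:.1f} GiB")  # ~13.2 GiB

# During a LoRA merge the base weights and the merged copy can
# coexist in memory for a time, so peak usage can approach
# twice these figures before the export finishes.
```

So even in bf16 the merge can briefly need well over 13 GiB, and an fp32 load alone needs ~26 GiB; on a machine with limited RAM the kernel kills the process at exactly this point. Running the export on a host with more RAM (or adding swap) is the usual practical fix.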
System Info
- transformers version: 4.36.2
- Platform: Linux-5.4.0-136-generic-x86_64-with-glibc2.17
- Python version: 3.8.10
- Huggingface_hub version: 0.20.1
- Safetensors version: 0.4.1
- Accelerate version: 0.25.0
- Accelerate config: not found
- PyTorch version (GPU?): 2.0.0+cu118 (True)
- Tensorflow version (GPU?): not installed (NA)
- Flax version (CPU?/GPU?/TPU?): not installed (NA)
- Jax version: not installed
- JaxLib version: not installed
- Using GPU in script?:
- Using distributed or parallel set-up in script?:
Others
No response