
tiny-llama qlora example does not work ("please set from_tf=True")

Open lucyknada opened this issue 1 year ago • 4 comments

Please check that this issue hasn't been reported before.

  • [X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

training proceeds with the tiny-llama qlora example

Current behaviour

After changing only the batch size (to 5) and the dataset path, I get the following error:

[ERROR] [axolotl.load_model:544] [PID:469] [RANK:0] Unable to load weights from pytorch checkpoint file for '/root/.cache/huggingface/hub/models--TinyLlama--TinyLlama-1.1B-intermediate-step-1431k-3T/snapshots/4b8dd7e43ec08c24ccaf89cbf67898cff53c95ae/pytorch_model.bin' at '/root/.cache/huggingface/hub/models--TinyLlama--TinyLlama-1.1B-intermediate-step-1431k-3T/snapshots/4b8dd7e43ec08c24ccaf89cbf67898cff53c95ae/pytorch_model.bin'. If you tried to load a PyTorch model from a TF 2.0 checkpoint, please set from_tf=True.

I have tried multiple winglian/axolotl Docker tags (dev-latest, main-latest, and main-py3.10-cu121-2.1.1) as well as running locally; all have the same issue.

Steps to reproduce

  1. (most likely optional) change the batch size to 5
  2. (most likely optional) set the dataset to a completion-formatted JSON file path: [{"text": ...}, {"text": ...}] (see the config sketch after this list)
  3. start training
  4. get the error
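For clarity, the deltas against the stock example would look roughly like this; a minimal sketch, assuming the batch-size change maps to `micro_batch_size` and the dataset uses axolotl's `completion` type (the path is a placeholder):

```yaml
# Sketch: only the overrides against examples/tiny-llama/qlora.yml
micro_batch_size: 5
datasets:
  - path: data/completions.json   # placeholder; [{"text": ...}, {"text": ...}]
    type: completion
```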

Config yaml

https://github.com/OpenAccess-AI-Collective/axolotl/blob/44ba616da2e5007837361bd727d6ea1fe07b3a0e/examples/tiny-llama/qlora.yml

Possible solution

No response

Which Operating Systems are you using?

  • [X] Linux
  • [ ] macOS
  • [X] Windows

Python Version

3.11.2

axolotl branch-commit

44ba616

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this bug has not been reported yet.
  • [X] I am using the latest version of axolotl.
  • [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

lucyknada · Jan 11 '24 23:01

Try unsloth/tinyllama as the base model instead, which has safetensors. I believe the original version uses TensorFlow checkpoints that don't play nicely with HF.
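For anyone landing here, the suggested swap is a one-line change against the same example config (sketch):

```yaml
base_model: unsloth/tinyllama
```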

winglian · Jan 13 '24 05:01

I tested it and a few notes:

  • unsloth/tinyllama works
  • should the qlora.yml example point to unsloth/tinyllama instead?
  • https://github.com/hiyouga/LLaMA-Factory works with the non-unsloth variant too; what are they doing differently than axolotl, and is it worth merging in? E.g., are they pre-converting to safetensors? (a conversion sketch follows this list)
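On the pre-conversion question: done by hand, it would look roughly like this. A sketch assuming the checkpoint is a plain state_dict and the file names are the HF defaults; this is not confirmed to be what LLaMA-Factory actually does.

```python
# Sketch: convert a PyTorch .bin checkpoint to safetensors by hand.
# Assumes a plain state_dict with no shared/aliased tensors
# (safetensors.torch.save_file rejects tensors that share storage).
import torch
from safetensors.torch import save_file

state_dict = torch.load("pytorch_model.bin", map_location="cpu")
# save_file requires contiguous tensors
state_dict = {k: v.contiguous() for k, v in state_dict.items()}
save_file(state_dict, "model.safetensors")
```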

lucyknada · Jan 13 '24 17:01

In LLaMA-Factory, are you loading the original variant without any unsloth optimizations?

winglian · Jan 13 '24 19:01

Yes, just using this one as the model input: https://huggingface.co/TinyLlama/TinyLlama-1.1B-intermediate-step-1431k-3T

lucyknada · Jan 13 '24 19:01