pretrain doesn't work on local json/jsonl files

Open SicariusSicariiStuff opened this issue 5 months ago • 0 comments

Please check that this issue hasn't been reported before.

  • [X] I searched previous Bug Reports and didn't find any similar reports.

Expected Behavior

Loading a local json/jsonl file should work the same as loading the dataset from the HF hub.

Current behaviour

Axolotl asks for a custom .py script instead of loading the file.

Steps to reproduce

Point pretraining_dataset at a local jsonl file:

pretraining_dataset: /home/sicarius/somefile.jsonl
    type: pretrain

Config yaml

pretraining_dataset: /home/sicarius/somefile.jsonl
    type: pretrain

Possible solution

Treat a local json/jsonl file the same way as a dataset loaded from the HF hub.
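
A minimal sketch of what that could look like, assuming the loader decides by file extension. The helper `resolve_load_args` and the constant `LOCAL_JSON_SUFFIXES` are hypothetical names for illustration and are not part of axolotl. The only real API referenced is the Hugging Face `datasets.load_dataset` function, which accepts the built-in `"json"` builder together with a `data_files` argument.

```python
from pathlib import Path

# Hypothetical helper, not part of axolotl: maps a dataset reference to the
# arguments one could pass to Hugging Face `datasets.load_dataset`.
LOCAL_JSON_SUFFIXES = {".json", ".jsonl"}


def resolve_load_args(dataset_ref: str) -> tuple[str, dict]:
    """Return (path, kwargs) suitable for datasets.load_dataset."""
    if Path(dataset_ref).suffix.lower() in LOCAL_JSON_SUFFIXES:
        # Local JSON/JSONL: use the built-in "json" builder with data_files.
        return "json", {"data_files": dataset_ref}
    # Anything else is assumed to be a hub dataset id and passed through.
    return dataset_ref, {}


path, kwargs = resolve_load_args("/home/sicarius/somefile.jsonl")
# With the `datasets` package installed, the call would then be:
#   load_dataset(path, streaming=True, **kwargs)
```

With this mapping, a local file and a hub id go through the same `load_dataset` call, so no custom .py script would be needed for the json/jsonl case.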

Which Operating Systems are you using?

  • [X] Linux
  • [ ] macOS
  • [ ] Windows

Python Version

3.10

axolotl branch-commit

latest release

Acknowledgements

  • [X] My issue title is concise, descriptive, and in title casing.
  • [X] I have searched the existing issues to make sure this bug has not been reported yet.
  • [X] I am using the latest version of axolotl.
  • [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.

SicariusSicariiStuff, Sep 05 '24 01:09