axolotl
Pretrain Doesn't Work on Local JSON/JSONL Files
Please check that this issue hasn't been reported before.
- [X] I searched previous Bug Reports and didn't find any similar reports.
Expected Behavior
Loading a local JSON/JSONL file as `pretraining_dataset` should work the same way as loading a dataset from the HF Hub.
Current behaviour
Instead of loading the file, it asks for a custom `.py` loading script.
Steps to reproduce
Point `pretraining_dataset` at a local JSONL file:
pretraining_dataset: /home/sicarius/somefile.jsonl
type: pretrain
Config yaml
pretraining_dataset: /home/sicarius/somefile.jsonl
type: pretrain
Possible solution
Treat a local JSON/JSONL file the same way as a dataset loaded from the HF Hub.
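A minimal sketch of what that could look like. It assumes the pretraining path is ultimately handed to Hugging Face `datasets.load_dataset` with `streaming=True`. A local file is mapped by its extension to one of the built-in builders (`json`, `csv`, `parquet`, `text`) and passed through `data_files`, while anything else is treated as a Hub dataset id. The helper name `pretraining_load_kwargs` and the suffix table are hypothetical and not part of axolotl.

```python
from pathlib import Path

# Hypothetical mapping from local file extension to a built-in `datasets` builder.
_SUFFIX_TO_BUILDER = {
    ".json": "json",
    ".jsonl": "json",
    ".csv": "csv",
    ".parquet": "parquet",
    ".txt": "text",
}


def pretraining_load_kwargs(path: str) -> dict:
    """Build keyword arguments for `datasets.load_dataset(**kwargs)`.

    Assumption: a path with a known data-file extension is a local file;
    everything else is treated as a Hub dataset id, as before.
    """
    builder = _SUFFIX_TO_BUILDER.get(Path(path).suffix.lower())
    if builder is not None:
        # Local file: use the generic builder and pass the file via data_files,
        # which `datasets` places in the "train" split.
        return {"path": builder, "data_files": path, "streaming": True, "split": "train"}
    # Hub dataset id: keep the existing behaviour.
    return {"path": path, "streaming": True, "split": "train"}
```

With `datasets` installed, usage would be `datasets.load_dataset(**pretraining_load_kwargs("/home/sicarius/somefile.jsonl"))`. This selects the suffix-based `json` builder, so no custom `.py` script is needed. A real fix would probably also check that the file exists before choosing the local branch.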
Which Operating Systems are you using?
- [X] Linux
- [ ] macOS
- [ ] Windows
Python Version
3.10
axolotl branch-commit
latest release
Acknowledgements
- [X] My issue title is concise, descriptive, and in title casing.
- [X] I have searched the existing issues to make sure this bug has not been reported yet.
- [X] I am using the latest version of axolotl.
- [X] I have provided enough information for the maintainers to reproduce and diagnose the issue.