
Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.

Results: 27 LLM-Finetuning-Toolkit issues

**Is your feature request related to a problem? Please describe.**
I'm working on a problem that requires me to split my data in a specific way (based on dates). Right...

enhancement
good first issue
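
One way such a date-based split could look, as a minimal sketch: it assumes the data sits in a pandas DataFrame, and the column name and cutoff date below are purely illustrative, not toolkit config keys.

```python
import pandas as pd

def split_by_date(df: pd.DataFrame, date_column: str, cutoff: str):
    """Split a DataFrame into train/test sets at a date cutoff:
    rows strictly before `cutoff` go to train, the rest to test."""
    dates = pd.to_datetime(df[date_column])
    cutoff_ts = pd.Timestamp(cutoff)
    return df[dates < cutoff_ts], df[dates >= cutoff_ts]

# Hypothetical usage -- column name and cutoff are placeholders:
# train_df, test_df = split_by_date(df, date_column="created_at", cutoff="2023-06-01")
```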

**Describe the bug**
I'm trying to run this toolkit in a Colab notebook with a T4 GPU and ran into errors. In order to get it working, I needed to turn bf16...
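
For context, the T4 (compute capability 7.5) has no bf16 support, which is why bf16 has to be turned off there. A minimal sketch of how the precision flag could be chosen automatically, assuming a PyTorch environment; the flag names below are illustrative, not the toolkit's actual config keys.

```python
import torch

# bf16 requires Ampere-or-newer GPUs; on a T4 this check returns False,
# so training would fall back to fp16 instead of erroring out.
use_bf16 = torch.cuda.is_available() and torch.cuda.is_bf16_supported()
use_fp16 = torch.cuda.is_available() and not use_bf16

print(f"bf16={use_bf16}, fp16={use_fp16}")
# These booleans could then be forwarded to the training arguments,
# e.g. bf16=use_bf16, fp16=use_fp16.
```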

Ensure all releases, style checks, and unit tests can be run via CI, blocking any PRs that fail CI.
- For Docker packages, use: https://github.com/orgs/georgian-io/packages
- For PyPI packages, use: https://pypi.org/

enhancement

It's much better to publish documentation on a dedicated static hosting solution, e.g. GitHub Pages (https://docs.github.com/en/pages/getting-started-with-github-pages) or MkDocs with Netlify (https://medium.com/swlh/publish-a-static-website-in-a-day-with-mkdocs-and-netlify-3cc076d0efaf).

documentation

Ensure that we include a Makefile containing all the necessary development commands, such as running tests, performing releases, and executing style checks, among others. For a great example,...

If I have:
- test_split: 0.1
- train_split: 0.8

Maybe we can derive `calc_val_split = 1 - 0.1 - 0.8 = 0.1` and use it as the validation split. Maybe also apply something like `max(calc_val_split, 0.05)` to prevent the val split from being...
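
A minimal sketch of that arithmetic; the function and argument names are hypothetical, only the leftover-fraction formula and the `max(..., 0.05)` floor come from the issue.

```python
def derive_val_split(train_split: float, test_split: float, min_val: float = 0.05) -> float:
    """Whatever remains after train and test becomes validation, floored at `min_val`."""
    calc_val_split = 1.0 - train_split - test_split
    return max(calc_val_split, min_val)

# With the values from the issue: 1 - 0.8 - 0.1 = 0.1
assert abs(derive_val_split(train_split=0.8, test_split=0.1) - 0.1) < 1e-9
```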

There is no good, easy-to-start, end-to-end distributed training example on the web. Plus, there are so many ways of doing this: via raw [PyTorch](https://github.com/pytorch/examples/tree/main/distributed/ddp-tutorial-series), via [Ray Train](https://docs.ray.io/en/latest/train/train.html), via [TorchX](https://github.com/pytorch/torchx),...

enhancement
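
As a point of reference for the raw-PyTorch route, here is a minimal DDP skeleton; the `torch.nn.Linear` model is a stand-in and the script layout is only a sketch, not the toolkit's planned design. It would be launched with `torchrun --nproc_per_node=<num_gpus> train_ddp.py`.

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for every worker process.
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(10, 10).cuda(local_rank)  # stand-in for the real model
    model = DDP(model, device_ids=[local_rank])

    # ... training loop: DistributedSampler for the dataloader, loss.backward(), optimizer.step() ...

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```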

Reference:
- https://coverage.readthedocs.io/en/7.4.4/
- https://pypi.org/project/pytest-cov/
- https://github.com/marketplace/actions/code-coverage-summary
- https://github.com/marketplace/actions/code-coverage-report-difference

**Is your feature request related to a problem? Please describe.**
- The dataset creation table display always displays all columns of the dataset, instead of only the ones needed by `prompt` and `prompt_stub`
- ...

enhancement
good first issue
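
One possible way to restrict the displayed columns, assuming the prompts use Python-style `{placeholder}` templates; the example templates below are made up for illustration, not taken from the toolkit's config.

```python
from string import Formatter

def columns_used_by_prompt(prompt: str, prompt_stub: str) -> set:
    """Collect the column names referenced as {placeholders} in the prompt templates."""
    fields = set()
    for template in (prompt, prompt_stub):
        for _, field_name, _, _ in Formatter().parse(template):
            if field_name:
                fields.add(field_name)
    return fields

# Hypothetical templates:
print(columns_used_by_prompt("Summarize the following text: {text}", "{summary}"))
# -> {'text', 'summary'}
# The dataset creation table could then show only these columns.
```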

**Describe the bug**
At dataset creation, the generated dataset always uses the cached version despite changes to the file.

**To Reproduce**
1. Run `toolkit.py`
2. Ctrl-C
3. Add a line...

bug
good first issue
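
One common way to avoid serving a stale cache after the file changes is to key the cache on the file's contents rather than its path; this is only a sketch of that idea, not the toolkit's current caching logic.

```python
import hashlib
from pathlib import Path

def dataset_cache_key(file_path: str) -> str:
    """Key the dataset cache on the file contents, so any edit to the file
    produces a new key and forces a rebuild instead of reusing the stale cache."""
    digest = hashlib.sha256(Path(file_path).read_bytes()).hexdigest()
    return f"{Path(file_path).name}-{digest[:12]}"
```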