Benjamin Ye

15 comments by Benjamin Ye
This is not yet implemented in the CLI. I opened a feature request to address it: https://github.com/georgian-io/LLM-Finetuning-Toolkit/issues/161. Meanwhile, here's a [notebook](https://github.com/georgian-io/LLM-Finetuning-Toolkit/files/15132595/inference_notebook.zip) you can use for inference on a custom dataset...

Closed due to inactivity. Please let us know if you need further assistance, and refer to issue https://github.com/georgian-io/LLM-Finetuning-Toolkit/issues/161 for updates on this feature request.

As a prerequisite for PyPI CI, we should add a PyPI API token to the repo. This can be done securely via [GitHub Secrets](https://docs.github.com/en/actions/security-guides/using-secrets-in-github-actions).
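As a minimal sketch of how a publishing step might consume that secret, assuming the workflow maps a repository secret (hypothetically named `PYPI_API_TOKEN`) into the step's environment via `${{ secrets.PYPI_API_TOKEN }}`, the helper below builds the environment `twine` reads for token-based uploads: `TWINE_USERNAME` set to the literal `__token__` and `TWINE_PASSWORD` set to the token. The function name `twine_env` is illustrative and not part of the repo.

```python
from typing import Dict, Mapping

# Hypothetical secret name; use whatever name is configured under the repo's Actions secrets.
SECRET_VAR = "PYPI_API_TOKEN"


def twine_env(environ: Mapping[str, str], secret_var: str = SECRET_VAR) -> Dict[str, str]:
    """Return a copy of `environ` with twine's token credentials filled in.

    Raises RuntimeError if the secret was not exposed to this step,
    without echoing any part of the token.
    """
    token = environ.get(secret_var)
    if not token:
        raise RuntimeError(f"{secret_var} is not set; map the secret into this step's env")
    env = dict(environ)
    # PyPI API tokens authenticate with the literal username "__token__".
    env["TWINE_USERNAME"] = "__token__"
    env["TWINE_PASSWORD"] = token
    return env
```

In a release step this could be called as `subprocess.run(["twine", "upload", *glob.glob("dist/*")], env=twine_env(os.environ), check=True)`. Failing early keeps a missing secret from surfacing later as a less obvious authentication error.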

**Implementation Status**

- ✅ Style Checks
- ✅ Publish to PyPI on Release; PR https://github.com/georgian-io/LLM-Finetuning-Hub/pull/121
- ✅ Publish to Container Registry; PR https://github.com/georgian-io/LLM-Finetuning-Hub/pull/122
- 🔴 Build and Publish Docs (blocked); see comment https://github.com/georgian-io/LLM-Finetuning-Hub/issues/111#issuecomment-2035456361

Test running: done in https://github.com/georgian-io/LLM-Finetuning-Toolkit/pull/184

Hi @angeliney, could you share a couple of example rows of your data?

This is caused by the Hugging Face `Dataset.from_generator()` method checking whether the dataset is already cached. [See code](https://github.com/huggingface/datasets/blob/ca8409a8bec4508255b9c3e808d0751eb1005260/src/datasets/arrow_dataset.py#L1007). The easiest solution is to pass in a `cache_dir` parameter (like `./dataset_cache`) with each `Ingestor`...
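The snippet below is a toy, stdlib-only model of this failure mode, not the `datasets` implementation. Its fingerprint is deliberately coarser than the real one (it hashes only the generator's bytecode and name), so two ingestors whose generators share code but yield different rows resolve to the same cache entry. `toy_from_generator` and `ToyIngestor` are hypothetical stand-ins for `Dataset.from_generator()` and the toolkit's `Ingestor` classes; giving each ingestor its own `cache_dir` keeps their entries apart.

```python
import hashlib
from typing import Callable, Dict, Iterator, List, Optional

# Toy stand-in for the on-disk cache: cache_dir -> fingerprint -> materialized rows.
_CACHE: Dict[str, Dict[str, List[dict]]] = {}


def toy_fingerprint(generator: Callable[[], Iterator[dict]]) -> str:
    # Simplification: hash only the function's bytecode and name, ignoring the data it yields.
    func = getattr(generator, "__func__", generator)  # unwrap bound methods
    code = func.__code__
    return hashlib.sha256(code.co_code + code.co_name.encode()).hexdigest()


def toy_from_generator(generator, cache_dir: str = "default_cache") -> List[dict]:
    bucket = _CACHE.setdefault(cache_dir, {})
    key = toy_fingerprint(generator)
    if key not in bucket:  # cache miss: run the generator and store its rows
        bucket[key] = list(generator())
    return bucket[key]  # cache hit: return whatever was stored earlier


class ToyIngestor:
    """Hypothetical ingestor that builds a dataset from a generator method."""

    def __init__(self, rows: List[dict], cache_dir: Optional[str] = None):
        self.rows = rows
        self.cache_dir = cache_dir

    def _generator(self) -> Iterator[dict]:
        yield from self.rows

    def to_dataset(self) -> List[dict]:
        if self.cache_dir is None:
            return toy_from_generator(self._generator)  # shared default cache
        return toy_from_generator(self._generator, cache_dir=self.cache_dir)


# Shared default cache: the second ingestor gets the first ingestor's rows back.
shared_a = ToyIngestor([{"text": "a"}]).to_dataset()
shared_b = ToyIngestor([{"text": "b"}]).to_dataset()
# Separate cache_dir per ingestor: each gets its own rows.
isolated_a = ToyIngestor([{"text": "a"}], cache_dir="./dataset_cache_a").to_dataset()
isolated_b = ToyIngestor([{"text": "b"}], cache_dir="./dataset_cache_b").to_dataset()
```

With the real library, the corresponding knob is the `cache_dir` keyword that `Dataset.from_generator()` accepts.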

**Implementation Status**

- ✅ Build
- ✅ Publish to PyPI
- ✅ Run Tests
- ✅ Run Styling