llm-foundry

Enable streaming of local finetuning dataset

Open eldarkurtic opened this issue 1 year ago • 1 comment

The current code path for streaming finetuning datasets does not allow streaming from a local path, even though this works out of the box for text datasets and is already supported by the StreamingFinetuningDataset class. This diff only changes how local is handled when building finetuning dataloaders. It has been tested by finetuning MPT-7B on a local streaming copy of the dolly_hhrlhf dataset.
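To illustrate the intent of the change, here is a minimal sketch of the kind of check involved. It is not the code from the diff: the helper name uses_streaming and the example config values are hypothetical, and it assumes the dataloader builder picks the streaming path by looking at the remote and local keys of the dataset config.

```python
from typing import Any, Mapping


def uses_streaming(dataset_cfg: Mapping[str, Any]) -> bool:
    """Return True if the config should be built as a streaming dataset.

    Hypothetical helper, not llm-foundry's actual code. The assumption is
    that a config with only `local` set should be treated as streaming,
    the same way a config with `remote` set is.
    """
    return (
        dataset_cfg.get("remote") is not None
        or dataset_cfg.get("local") is not None
    )


# Illustrative configs; the paths and bucket name are placeholders.
local_only = {"local": "/data/dolly_hhrlhf", "remote": None, "split": "train"}
remote_only = {"remote": "s3://my-bucket/dolly_hhrlhf", "split": "train"}
hf_dataset = {"hf_name": "mosaicml/dolly_hhrlhf", "split": "train"}

for name, cfg in [
    ("local_only", local_only),
    ("remote_only", remote_only),
    ("hf_dataset", hf_dataset),
]:
    print(name, uses_streaming(cfg))
```

Under this assumption, a check that looks only at remote would send the local_only config down the non-streaming path, which is the behavior described above.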

eldarkurtic, Jan 15 '24 11:01

Hey @eldarkurtic, thanks for the change! Could you please run pre-commit run --all-files locally to apply the auto formatting?

dakinggg, Jan 17 '24 00:01