llm-foundry
Support Remote and HF promptfiles in hf_generate script
Small QoL improvement for testing generation. Uses composer's get_file to support remote prompt files, and adds a bit of syntax for pointing at Hugging Face Hub datasets as well.
-p file::/local/path was already supported. This adds -p file::s3://remote/path and -p dataset::mosaicml/some-hub-dataset. For HF datasets, it defaults to looking for a column named prompt, but will use any string passed as the prompt_delimiter as the column name instead (kind of abuses the API, but it felt understandable).
@dakinggg if this is out of scope let me know and I can close this out. Most workloads like this end up going to inference... but I can see it being useful both internally and for customers.
EDIT: updated the syntax so remote files are just s3:// and HF datasets are hf://
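A rough sketch of how the dispatch described above could look. The function name `load_prompts` and the temp-file handling are illustrative, not the actual code in hf_generate.py; the only real APIs assumed here are composer's `get_file` and `datasets.load_dataset`:

```python
# Hypothetical sketch of the prompt-source dispatch described above.
# load_prompts is an illustrative name, not the actual helper in hf_generate.py.
import os
import tempfile
from typing import List, Optional

def load_prompts(prompt_arg: str,
                 prompt_delimiter: Optional[str] = None) -> List[str]:
    """Resolve a -p argument into a list of prompt strings."""
    if prompt_arg.startswith('hf://'):
        # HF Hub dataset: the column defaults to 'prompt'; if a delimiter
        # string is given, it is reused as the column name (the API overload
        # noted above).
        from datasets import load_dataset
        column = prompt_delimiter or 'prompt'
        ds = load_dataset(prompt_arg[len('hf://'):], split='train')
        return [str(x) for x in ds[column]]

    # Strip the legacy file:: prefix if present; bare s3:// etc. also work.
    path = prompt_arg[len('file::'):] if prompt_arg.startswith('file::') else prompt_arg

    if '://' in path:
        # Remote object-store path: composer's get_file downloads it locally.
        from composer.utils import get_file
        local_path = os.path.join(tempfile.mkdtemp(), os.path.basename(path))
        get_file(path, local_path)
        path = local_path

    with open(path) as f:
        text = f.read()
    # Without a delimiter, the whole file is a single prompt.
    return text.split(prompt_delimiter) if prompt_delimiter else [text]
```

The `datasets` and `composer` imports are deferred so the local-file path works without either package installed.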
Example invocations:
python hf_generate.py -n mosaicml/mpt-7b -p hf://mosaicml/some-prompts --model_dtype bf16 --max_batch_size 8
and
python hf_generate.py -n mosaicml/mpt-7b -p s3://our-bucket/path/prompts.txt --prompt_delimiter $'\n' --max_batch_size 8