
Support remote and HF prompt files in hf_generate script

samhavens opened this issue on Dec 07, 2023 · 0 comments

A small QoL improvement for testing generation. This uses Composer's get_file to support remote prompt files, and adds a bit of syntax for pointing at Hugging Face Hub datasets as well.

-p file::/local/path was already supported. This adds -p file::s3://remote/path and -p dataset::mosaicml/some-hub-dataset. For HF datasets, it defaults to looking for a column named prompt, but it will use any string passed as the prompt_delimiter as the column name instead (this kind of abuses the API, but it felt understandable).

@dakinggg if this is out of scope let me know and I can close this out. Most workloads like this end up going to inference... but I can see it being useful both internally and for customers.

EDIT: Updated the syntax so remote files are just s3:// paths and HF datasets are hf:// paths.
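For context, here is a minimal sketch of how the updated -p resolution could work, assuming the s3:// / hf:// scheme described above. The function name load_prompts, the "train" split default, and the temp-file handling are illustrative assumptions, not the actual hf_generate.py code.

```python
import os
import tempfile
from typing import List, Optional


def load_prompts(prompt_arg: str, prompt_delimiter: Optional[str] = None) -> List[str]:
    """Resolve a -p argument into a list of prompt strings (illustrative sketch)."""
    if prompt_arg.startswith('hf://'):
        # Hugging Face Hub dataset: reuse the delimiter string as the column
        # name if one was given, otherwise fall back to a column named 'prompt'.
        from datasets import load_dataset
        column = prompt_delimiter or 'prompt'
        ds = load_dataset(prompt_arg[len('hf://'):], split='train')  # split is an assumption
        return [str(p) for p in ds[column]]

    if prompt_arg.startswith('s3://'):
        # Remote file: download it locally with Composer's get_file, then fall
        # through to the plain-file handling below.
        from composer.utils import get_file
        local_path = os.path.join(tempfile.mkdtemp(), 'prompts.txt')
        get_file(prompt_arg, local_path, overwrite=True)
        prompt_arg = local_path

    with open(prompt_arg) as f:
        contents = f.read()
    # For plain files, the delimiter splits the contents into separate prompts.
    if prompt_delimiter is not None:
        return [p for p in contents.split(prompt_delimiter) if p.strip()]
    return [contents]
```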

Example invocations:

python hf_generate.py -n mosaicml/mpt-7b -p hf://mosaicml/some-prompts --model_dtype bf16 --max_batch_size 8

and

python hf_generate.py -n mosaicml/mpt-7b -p s3://our-bucket/path/prompts.txt --prompt_delimiter $'\n' --max_batch_size 8
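In the first invocation no delimiter is passed, so the prompts would come from the dataset's default prompt column. In the second, $'\n' is bash ANSI-C quoting for a literal newline, so the downloaded prompts.txt would be split into one prompt per line.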
