Add a new loader for Hugging Face Spaces

Status: open. Opened by yoavkatz.

Some data is available only in Hugging Face Spaces, not in HF Datasets.

We'd like a custom loader that reads data from Hugging Face Spaces.

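from typing import List, Mapping

from unitxt.loaders import Loader


class LoadFromHFSpace(Loader):
    user_name: str
    space_name: str
    data_files: Mapping[str, str]  # split name -> file path inside the Space

    # hf_hub_download is a function; the package to require is huggingface_hub
    _requirements_list: List[str] = ["huggingface_hub"]

    def process(self):
        # Use the hf_hub_download API (with repo_type="space") to download
        # the data_files and create a dataset from them.
        ...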
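A minimal sketch of what `process` might do, written as a plain function so it can be read without unitxt. Assumptions: every entry in `data_files` is a JSON Lines file; the function name `load_from_hf_space` and the `download_fn` parameter (used only to swap in a stub for offline testing) are hypothetical; and the result is a plain dict of split name to list of records rather than a unitxt stream. The only real library call is `huggingface_hub.hf_hub_download` with `repo_type="space"`.

```python
import json
from typing import Callable, Dict, List, Mapping, Optional


def load_from_hf_space(
    user_name: str,
    space_name: str,
    data_files: Mapping[str, str],
    download_fn: Optional[Callable[..., str]] = None,
) -> Dict[str, List[dict]]:
    """Download each split's file from a Hugging Face Space and parse it as JSON Lines.

    Hypothetical helper: data_files maps split name -> path of the file inside
    the Space repo; every file is assumed to be JSON Lines.
    """
    if download_fn is None:
        # Imported lazily so the function can be exercised with a stub downloader.
        from huggingface_hub import hf_hub_download

        download_fn = hf_hub_download

    repo_id = f"{user_name}/{space_name}"
    splits: Dict[str, List[dict]] = {}
    for split, path_in_repo in data_files.items():
        # repo_type="space" makes the Hub client look in a Space, not a dataset repo.
        local_path = download_fn(repo_id=repo_id, filename=path_in_repo, repo_type="space")
        with open(local_path, encoding="utf-8") as f:
            # Skip blank lines; parse every other line as one JSON record.
            splits[split] = [json.loads(line) for line in f if line.strip()]
    return splits
```

For example, `load_from_hf_space("some-user", "some-space", {"test": "data/question.jsonl"})` (hypothetical Space and path) would fetch one file and return `{"test": [...]}`. Inside the real loader, these per-split lists would still have to be wrapped in whatever stream type unitxt's `Loader.process` is expected to return.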

As an example, download the file

https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/question.jsonl

and use it in prepare/cards/mt_bench/generation/english_single_turn.py.
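The link above points to GitHub's HTML page for the file, not to the file itself, so fetching it directly returns a web page. A sketch of the two steps (get the raw-file URL, then keep only the first turn of each question), assuming each MT-bench record carries `question_id`, `category`, and a `turns` list; that is the schema FastChat's question.jsonl appears to use, so check it before relying on it. The helper names and the output field `input` are hypothetical.

```python
import json
from typing import Dict, List


def github_blob_to_raw(url: str) -> str:
    """Turn a github.com '/blob/' page URL into the raw.githubusercontent.com file URL."""
    prefix = "https://github.com/"
    if not url.startswith(prefix) or "/blob/" not in url:
        raise ValueError(f"not a GitHub blob URL: {url}")
    # "owner/repo/blob/ref/path" -> ("owner/repo", "ref/path")
    owner_repo, ref_and_path = url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{owner_repo}/{ref_and_path}"


def single_turn_examples(jsonl_text: str) -> List[Dict[str, object]]:
    """Keep only the first user turn of each MT-bench question (assumed schema)."""
    examples = []
    for line in jsonl_text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines in the JSONL file
        record = json.loads(line)
        examples.append(
            {
                "question_id": record["question_id"],
                "category": record["category"],
                "input": record["turns"][0],  # first turn only: single-turn setting
            }
        )
    return examples
```

With the standard library, `urllib.request.urlopen(github_blob_to_raw(url)).read().decode("utf-8")` would give the text to pass to `single_turn_examples`.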

Later, replace the use of OfirArviv/mt_bench_pairwise_comparison_gpt4_judgments in unitxt/prepare/cards/mt_bench/response_assessment/pairwise_comparison/single_turn_gpt4_judgement.py with a download of the files from

https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/reference_answer/gpt-4.jsonl
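One way the downloaded reference answers could be joined to the questions, sketched under the assumption that each line of gpt-4.jsonl looks like FastChat's answer records (`question_id` plus a `choices` list whose first entry has a `turns` list). That schema, the helper names, and the `reference` field are assumptions, not something the issue specifies.

```python
import json
from typing import Dict, List


def reference_answers_by_id(jsonl_text: str) -> Dict[int, str]:
    """Map question_id -> first-turn reference answer (assumed FastChat answer schema)."""
    answers: Dict[int, str] = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            record = json.loads(line)
            answers[record["question_id"]] = record["choices"][0]["turns"][0]
    return answers


def attach_references(questions: List[dict], answers: Dict[int, str]) -> List[dict]:
    """Return copies of the questions that have a reference, with a 'reference' field added."""
    return [
        {**q, "reference": answers[q["question_id"]]}
        for q in questions
        if q["question_id"] in answers
    ]
```

The sketch drops questions without a reference answer instead of filling in a placeholder; if the card needs every question, a default value would be the alternative.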

Posted by yoavkatz on May 21 '24 at 12:05.