unitxt
Add a new loader for Hugging Face Spaces
Some data is available in Hugging Face Spaces but not as Hugging Face datasets.
We'd like a custom loader that reads data files directly from a Hugging Face Space.
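from typing import List, Mapping

class LoadFromHFSpace(Loader):
    user_name: str
    space_name: str
    data_files: Mapping[str, str]  # e.g. split name -> file path inside the Space
    _requirements_list: List[str] = ["huggingface_hub"]  # pip package that provides hf_hub_download

    def process(self):
        # Use the hf_hub_download API to download the data_files and create a dataset
        ...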
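A minimal sketch of what `process` might do, written as plain functions so it runs outside unitxt. It assumes each entry in `data_files` maps a split name to a JSON Lines file inside the Space, and it returns plain lists of dicts per split rather than a unitxt stream or a `datasets` object. `load_from_hf_space`, `read_jsonl` and the `download_fn` test hook are hypothetical names; the download itself uses the real `huggingface_hub.hf_hub_download` with `repo_type="space"`, which fetches one file and returns its path in the local cache.

```python
import functools
import json
from typing import Callable, Dict, List, Mapping, Optional


def read_jsonl(path: str) -> List[dict]:
    """Parse a JSON Lines file into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def load_from_hf_space(
    user_name: str,
    space_name: str,
    data_files: Mapping[str, str],
    download_fn: Optional[Callable[..., str]] = None,
) -> Dict[str, List[dict]]:
    """Download each split's file from a Hugging Face Space and parse it as JSONL.

    Assumption: data_files maps a split name (e.g. "test") to a JSONL file path
    inside the Space. download_fn is a hypothetical hook for testing; by default
    huggingface_hub.hf_hub_download is used with repo_type="space".
    """
    if download_fn is None:
        # Imported lazily so this module loads even without huggingface_hub installed.
        from huggingface_hub import hf_hub_download

        download_fn = functools.partial(
            hf_hub_download, repo_id=f"{user_name}/{space_name}", repo_type="space"
        )
    splits: Dict[str, List[dict]] = {}
    for split, filename in data_files.items():
        local_path = download_fn(filename=filename)  # path of the downloaded file
        splits[split] = read_jsonl(local_path)
    return splits
```

Calling it with a real Space would look like `load_from_hf_space("<user>", "<space>", {"test": "<path/in/space>.jsonl"})`; converting the per-split lists into the loader's actual output type is left to the real implementation.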
As an example, download the file
https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/question.jsonl
and use it in prepare/cards/mt_bench/generation/english_single_turn.py.
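For the single-turn generation card, only the first turn of each question is needed. A sketch, assuming each line of question.jsonl parses to a record with `question_id`, `category` and a `turns` list of user prompts (our reading of the FastChat format, not stated in this issue); `to_single_turn` and the `input` field name are hypothetical.

```python
from typing import Dict, Iterable, List


def to_single_turn(questions: Iterable[dict]) -> List[Dict[str, object]]:
    """Keep only the first user turn of each MT-Bench question.

    Assumes records shaped like {"question_id": ..., "category": ..., "turns": [...]}.
    """
    rows: List[Dict[str, object]] = []
    for q in questions:
        turns = q.get("turns") or []
        if not turns:
            continue  # no prompt to use; skip malformed records
        rows.append(
            {
                "question_id": q["question_id"],
                "category": q.get("category"),
                "input": turns[0],  # hypothetical field name for the card's input
            }
        )
    return rows
```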
Later, replace the use of OfirArviv/mt_bench_pairwise_comparison_gpt4_judgments
in unitxt/prepare/cards/mt_bench/response_assessment/pairwise_comparison/single_turn_gpt4_judgement.py
with a download of
https://github.com/lm-sys/FastChat/blob/main/fastchat/llm_judge/data/mt_bench/reference_answer/gpt-4.jsonl
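One step in replacing the pre-built dataset could be pairing each question with its GPT-4 reference answer. A sketch, assuming each line of gpt-4.jsonl parses to a record with a `question_id` and a `choices` list whose first entry holds the answer `turns` (our reading of the FastChat format, not stated in this issue). The function names and the `reference_answer` field are hypothetical, and how the pairwise-comparison card consumes the result is not covered.

```python
from typing import Dict, Iterable, List


def reference_by_question(answers: Iterable[dict]) -> Dict[object, str]:
    """Map question_id -> first-turn reference answer.

    Assumes records shaped like
    {"question_id": ..., "choices": [{"index": 0, "turns": [...]}], ...}.
    """
    refs: Dict[object, str] = {}
    for a in answers:
        choices = a.get("choices") or []
        if choices and choices[0].get("turns"):
            refs[a["question_id"]] = choices[0]["turns"][0]
    return refs


def attach_references(questions: Iterable[dict], answers: Iterable[dict]) -> List[dict]:
    """Return copies of the questions that have a reference answer, adding "reference_answer"."""
    refs = reference_by_question(answers)
    return [
        {**q, "reference_answer": refs[q["question_id"]]}
        for q in questions
        if q["question_id"] in refs
    ]
```

The new key is named `reference_answer` so it does not overwrite any field already present in a question record, and the input dicts are copied rather than mutated.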