lighteval [BUG] Cannot load `deutsche-telekom/Ger-RAG-eval` dataset.

Describe the bug

This is very similar to, or maybe the same as, #211. I am opening a new issue because this one is not related to Windows.

I am running on an Azure cloud Linux machine with 4 x A100 GPUs. I installed lighteval with:

  git clone https://github.com/huggingface/lighteval.git
  cd lighteval
  # git main from Aug 21, 2024
  git checkout e6b599a1448a8b06141cb4f678866ae15b0c5863
  pip install -e .[accelerate]

Then I run:

  accelerate launch --multi_gpu --num_processes=2 -m \
    lighteval accelerate \
    --model_args "pretrained=$MODEL_NAME,model_parallel=True,trust_remote_code=True" \
    --use_chat_template \
    --override_batch_size 1 \
    --tasks "examples/tasks/all_german_rag_evals.txt" \
    --custom_tasks "community_tasks/german_rag_evals.py" \
    --output_dir="../outputs/evals"

This crashed with:

WARNING:lighteval.logging.hierarchical_logger:  } [0:00:16.752899]
WARNING:lighteval.logging.hierarchical_logger:  Tasks loading {
The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
You can avoid this prompt in future by passing the argument `trust_remote_code=True`.

Do you wish to run the custom code? [y/N] WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.004633]
WARNING:lighteval.logging.hierarchical_logger:} [0:00:17.291495]
[rank1]: Traceback (most recent call last):
[rank1]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 121, in resolve_trust_remote_code
[rank1]:     answer = input(
[rank1]: EOFError: EOF when reading a line

[rank1]: During handling of the above exception, another exception occurred:

[rank1]: Traceback (most recent call last):
[rank1]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank1]:     return _run_code(code, main_globals, None,
[rank1]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
[rank1]:     exec(code, run_globals)
[rank1]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/__main__.py", line 93, in <module>
[rank1]:     cli_evaluate()
[rank1]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/__main__.py", line 58, in cli_evaluate
[rank1]:     main_accelerate(args)
[rank1]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/logging/hierarchical_logger.py", line 175, in wrapper
[rank1]:     return fn(*args, **kwargs)
[rank1]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/main_accelerate.py", line 78, in main
[rank1]:     pipeline = Pipeline(
[rank1]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/pipeline.py", line 126, in __init__
[rank1]:     self._init_tasks_and_requests(tasks=tasks)
[rank1]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/pipeline.py", line 175, in _init_tasks_and_requests
[rank1]:     _, tasks_groups_dict = get_custom_tasks(custom_tasks)
[rank1]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/tasks/registry.py", line 195, in get_custom_tasks
[rank1]:     custom_tasks_module = create_custom_tasks_module(custom_tasks=custom_tasks)
[rank1]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/tasks/registry.py", line 182, in create_custom_tasks_module
[rank1]:     dataset_module = dataset_module_factory(str(custom_tasks))
[rank1]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 1823, in dataset_module_factory
[rank1]:     ).get_module()
[rank1]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 968, in get_module
[rank1]:     trust_remote_code = resolve_trust_remote_code(self.trust_remote_code, self.name)
[rank1]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 134, in resolve_trust_remote_code
[rank1]:     raise ValueError(
[rank1]: ValueError: The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
[rank1]: Please pass the argument `trust_remote_code=True` to allow custom code to be run.
[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 121, in resolve_trust_remote_code
[rank0]:     answer = input(
[rank0]: EOFError: EOF when reading a line

[rank0]: During handling of the above exception, another exception occurred:

[rank0]: Traceback (most recent call last):
[rank0]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 196, in _run_module_as_main
[rank0]:     return _run_code(code, main_globals, None,
[rank0]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/runpy.py", line 86, in _run_code
[rank0]:     exec(code, run_globals)
[rank0]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/__main__.py", line 93, in <module>
[rank0]:     cli_evaluate()
[rank0]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/__main__.py", line 58, in cli_evaluate
[rank0]:     main_accelerate(args)
[rank0]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/logging/hierarchical_logger.py", line 175, in wrapper
[rank0]:     return fn(*args, **kwargs)
[rank0]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/main_accelerate.py", line 78, in main
[rank0]:     pipeline = Pipeline(
[rank0]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/pipeline.py", line 126, in __init__
[rank0]:     self._init_tasks_and_requests(tasks=tasks)
[rank0]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/pipeline.py", line 175, in _init_tasks_and_requests
[rank0]:     _, tasks_groups_dict = get_custom_tasks(custom_tasks)
[rank0]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/tasks/registry.py", line 195, in get_custom_tasks
[rank0]:     custom_tasks_module = create_custom_tasks_module(custom_tasks=custom_tasks)
[rank0]:   File "/mnt/azureml/cr/j/fce400824ed74df4afc6892d4f5f14b7/exe/wd/lighteval/src/lighteval/tasks/registry.py", line 182, in create_custom_tasks_module
[rank0]:     dataset_module = dataset_module_factory(str(custom_tasks))
[rank0]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 1823, in dataset_module_factory
[rank0]:     ).get_module()
[rank0]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 968, in get_module
[rank0]:     trust_remote_code = resolve_trust_remote_code(self.trust_remote_code, self.name)
[rank0]:   File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/datasets/load.py", line 134, in resolve_trust_remote_code
[rank0]:     raise ValueError(
[rank0]: ValueError: The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
[rank0]: Please pass the argument `trust_remote_code=True` to allow custom code to be run.
W0823 10:14:13.254000 23062888101696 torch/distributed/elastic/multiprocessing/api.py:858] Sending process 338 closing signal SIGTERM
E0823 10:14:13.368000 23062888101696 torch/distributed/elastic/multiprocessing/api.py:833] failed (exitcode: 1) local_rank: 1 (pid: 339) of binary: /root/miniconda3/envs/py3.10/bin/python3
Traceback (most recent call last):
  File "/root/miniconda3/envs/py3.10/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/accelerate_cli.py", line 48, in main
    args.func(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 1097, in launch_command
    multi_gpu_launcher(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/accelerate/commands/launch.py", line 734, in multi_gpu_launcher
    distrib_run.run(args)
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/distributed/run.py", line 892, in run
    elastic_launch(
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 133, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/root/miniconda3/envs/py3.10/lib/python3.10/site-packages/torch/distributed/launcher/api.py", line 264, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
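
As far as I can tell from the traceback, the mechanism is this: when no explicit trust decision is given, `resolve_trust_remote_code` falls back to an interactive `input()` prompt. The worker processes started by `accelerate launch` apparently have no usable stdin, so `input()` raises `EOFError`, which is then converted into the `ValueError` shown above. The sketch below is a simplified stand-in for that behaviour and not the actual `datasets` code. The function names `resolve_trust` and `demo_without_stdin`, and the empty `io.StringIO` used to simulate a missing stdin, are assumptions for illustration.

```python
import io
import sys


def resolve_trust(trust_remote_code, repo_name):
    """Simplified stand-in for the prompt-then-fail logic seen in the traceback."""
    if trust_remote_code is not None:
        # An explicit decision was passed in, so no prompt is needed.
        return trust_remote_code
    try:
        answer = input(f"Run custom code for {repo_name}? [y/N] ")
    except EOFError as err:
        # No stdin available (e.g. a non-interactive worker process).
        raise ValueError(
            f"Pass trust_remote_code=True to allow custom code for {repo_name}."
        ) from err
    return answer.strip().lower() in ("y", "yes")


def demo_without_stdin():
    """Call resolve_trust with an empty stdin and report the underlying error."""
    original_stdin = sys.stdin
    sys.stdin = io.StringIO("")  # simulate a process with no interactive input
    try:
        resolve_trust(None, "german_rag_evals")
    except ValueError as err:
        # The ValueError wraps the original EOFError as its cause.
        return type(err.__cause__).__name__
    finally:
        sys.stdin = original_stdin
    return None
```

With an explicit `True` or `False` the prompt is never reached, which is consistent with the error message asking for `trust_remote_code=True`.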

When I set this environment variable before calling lighteval, it works:

  export HF_DATASETS_TRUST_REMOTE_CODE=TRUE
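
If the run is started from a Python wrapper script rather than a shell, the same workaround can probably be applied by passing the variable in the child process environment. A minimal sketch using only the standard library; the helper names `env_with_trusted_datasets` and `child_sees_variable` are hypothetical, and the child here is a plain Python process that only echoes the variable, not lighteval itself.

```python
import os
import subprocess
import sys


def env_with_trusted_datasets(base_env=None):
    """Return a copy of the environment with the workaround variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HF_DATASETS_TRUST_REMOTE_CODE"] = "TRUE"
    return env


def child_sees_variable():
    """Spawn a child Python process and return the value it observes."""
    result = subprocess.run(
        [
            sys.executable,
            "-c",
            "import os; print(os.environ.get('HF_DATASETS_TRUST_REMOTE_CODE'))",
        ],
        env=env_with_trusted_datasets(),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```

The same `env=env_with_trusted_datasets()` argument could be passed to a `subprocess.run` call that starts the `accelerate launch` command from above. Note that this trusts custom code for every dataset loaded in that process, not only this one.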

IMHO the problem is located here: https://github.com/huggingface/lighteval/blob/e6b599a1448a8b06141cb4f678866ae15b0c5863/src/lighteval/tasks/registry.py#L182

This line should use a different function to load the dataset, because loading it directly with

  from datasets import load_dataset

  dataset = load_dataset("deutsche-telekom/Ger-RAG-eval", "task1")

works without problems. See here: https://colab.research.google.com/drive/1BUORL2_VxORGdIko6SMPqJqZIMUmtR-3?usp=sharing
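
One possible direction for "a different function": judging from the traceback, the value passed to `dataset_module_factory` appears to be the path given via `--custom_tasks`, i.e. an ordinary Python file. If that is right, the file could presumably be imported with the standard library instead of going through the `datasets` script loader. The sketch below shows that idea with `importlib`. It is not what lighteval does at this commit, and the function name `load_module_from_path` and the default module name `custom_tasks_module` are hypothetical.

```python
import importlib.util
from pathlib import Path


def load_module_from_path(path, module_name="custom_tasks_module"):
    """Import a Python file as a module without involving `datasets`."""
    path = Path(path)
    spec = importlib.util.spec_from_file_location(module_name, path)
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot build an import spec for {path}")
    module = importlib.util.module_from_spec(spec)
    # Executes the file's top-level code, so only use this on trusted files.
    spec.loader.exec_module(module)
    return module
```

For the command above this would be called as `load_module_from_path("community_tasks/german_rag_evals.py")`. Since the user names the file explicitly on the command line, no interactive trust prompt would be involved.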

To Reproduce

see above

Expected behavior

lighteval should not ask the y/N question and should not crash.

Version info

see above

Aug 23 '24 10:08 PhilipMay

Interesting, thanks for the report! I think it's because, by default, we don't allow `trust_remote_code=True` execution for dataset loading in lighteval; the parameter needs to be added to the task, iirc. Why does the German RAG dataset require trust remote code?

Sep 14 '24 08:09 clefourrier

Hi! I believe this should be fixed by the latest release. Feel free to test and let me know!

Oct 24 '24 14:10 clefourrier