Hello, I don't know what I'm doing wrong. I received the following error as indicated in the title.

My input was as shown on this website: Hugging Face - Ger-RAG-eval.

python run_evals_accelerate.py ^
  --model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1" ^
  --tasks "./examples/tasks/all_german_rag_evals.txt" ^
  --override_batch_size 1 ^
  --use_chat_template ^
  --custom_tasks "community_tasks/german_rag_evals.py" ^
  --output_dir "./evals/"

The output was as follows:

INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
WARNING:lighteval.logging.hierarchical_logger:main: (0, Namespace(model_config_path=None, model_args='pretrained=DiscoResearch/DiscoLM_German_7b_v1', max_samples=None, override_batch_size=1, job_id='', output_dir='./evals/', push_results_to_hub=False, save_details=False, push_details_to_hub=False, public_run=False, cache_dir=None, results_org=None, use_chat_template=True, system_prompt=None, dataset_loading_processes=1, custom_tasks='community_tasks/german_rag_evals.py', tasks='./examples/tasks/all_german_rag_evals.txt', num_fewshot_seeds=1)),  {
WARNING:lighteval.logging.hierarchical_logger:  Test all gather {
WARNING:lighteval.logging.hierarchical_logger:    Test gather tensor
WARNING:lighteval.logging.hierarchical_logger:    gathered_tensor tensor([0]), should be [0]
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.010932]
WARNING:lighteval.logging.hierarchical_logger:  Creating model configuration {
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00]
WARNING:lighteval.logging.hierarchical_logger:  Model loading {
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING:lighteval.logging.hierarchical_logger:    Tokenizer truncation and padding size set to the left side.
WARNING:lighteval.logging.hierarchical_logger:    We are not in a distributed setting. Setting model_parallel to False.
WARNING:lighteval.logging.hierarchical_logger:    Model parallel was set to False, max memory set to None and device map to None
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:31<00:00, 10.60s/it]
WARNING:lighteval.logging.hierarchical_logger:    Using Data Parallelism, putting model on device cpu
WARNING:lighteval.logging.hierarchical_logger:    Model info: ModelInfo(model_name='DiscoResearch/DiscoLM_German_7b_v1', model_sha='560f972f9f735fc9289584b3aa8d75d0e539c44e', model_dtype='torch.bfloat16', model_size='13.49 GB')
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:33.371562]
WARNING:lighteval.logging.hierarchical_logger:  Tasks loading {
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:01.405496]
WARNING:lighteval.logging.hierarchical_logger:} [0:00:34.806011]
Traceback (most recent call last):
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 117, in resolve_trust_remote_code
    signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 82, in <module>
    main(args)
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\hierarchical_logger.py", line 166, in wrapper
    return fn(*args, **kwargs)
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 83, in main
    task_dict = Registry(cache_dir=env_config.cache_dir).get_task_dict(
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 135, in get_task_dict
    custom_tasks_module.append(create_custom_tasks_module(custom_tasks=custom_tasks))
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 170, in create_custom_tasks_module
    dataset_module = dataset_module_factory(str(custom_tasks))
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 1814, in dataset_module_factory
    ).get_module()
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 962, in get_module
    trust_remote_code = resolve_trust_remote_code(self.trust_remote_code, self.name)
File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 133, in resolve_trust_remote_code
    raise ValueError(
ValueError: The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.

I discovered that the argument trust_remote_code=True must be passed as part of the model_args parameter. To fix the issue, I tried the following code, but unfortunately, the error persisted.

python run_evals_accelerate.py ^
--model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1,trust_remote_code=True" ^
--tasks "./examples/tasks/all_german_rag_evals.txt" ^
--override_batch_size 1 ^
--use_chat_template ^
--custom_tasks "community_tasks/german_rag_evals.py" ^
--output_dir "./evals/"

Maybe this can help.

When I entered the command accelerate env, I received the following output:

Copy-and-paste the text below in your GitHub issue

Accelerate version: 0.31.0
Platform: Windows-10-10.0.19045-SP0
accelerate bash location: C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\Scripts\accelerate.exe
Python version: 3.10.14
Numpy version: 1.26.4
PyTorch version (GPU?): 2.3.1+cpu (False)
PyTorch XPU available: False
PyTorch NPU available: False
PyTorch MLU available: False
System RAM: 15.90 GB
Accelerate default config:
- compute_environment: LOCAL_MACHINE
- distributed_type: NO
- mixed_precision: no
- use_cpu: False
- debug: False
- num_processes: 1
- machine_rank: 0
- num_machines: 1
- gpu_ids: 0
- rdzv_backend: static
- same_network: True
- main_training_function: main
- enable_cpu_affinity: False
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []

Jul 04 '24 12:07 Pommel4711

Hi! The trust_remote_code=True message that you get is for the dataset loading, not the model. @PhilipMay, iirc you were the one who added this dataset, can you change it so it does not require trust_remote_code=True?

Jul 04 '24 13:07 clefourrier

Yes I can do that @clefourrier . The problem is that I see no reason why the code thinks it needs to execute custom code to load the dataset. Everything is "just parquet"...

@Pommel4711 here is the command I use to run the evaluation: https://huggingface.co/datasets/deutsche-telekom/Ger-RAG-eval#usage

It works (worked) without trust_remote_code for me.

Jul 04 '24 14:07 PhilipMay

Here is a Colab with code that shows that the dataset can be loaded without setting trust_remote_code: https://colab.research.google.com/drive/1BUORL2_VxORGdIko6SMPqJqZIMUmtR-3?usp=sharing

Jul 04 '24 14:07 PhilipMay

Interesting, thanks a lot!

Jul 04 '24 14:07 clefourrier

@clefourrier and @Pommel4711 I think the root issue is this and not the dataset itself:

File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 117, in resolve_trust_remote_code
    signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?

During handling of the above exception, another exception occurred:

Can you please check that?

Jul 04 '24 15:07 PhilipMay

@PhilipMay Hey, I'm running this on Windows. Do you use Linux, or do you know how I can fix this problem? I came across this Stack Overflow post that might be related: Python Standard Lib Signal AttributeError: module 'signal' has no attribute 'SIGALRM'.

For reference, I'm running on this commit: a98210fd3a2d1e8bface1c32b.

Thanks for your help!

Jul 04 '24 16:07 Pommel4711

Hm, I'm going to ping @lhoestq on this then because it seems like a datasets issue. Good job seeing this @PhilipMay !

Jul 05 '24 06:07 clefourrier

> Hm, I'm going to ping @lhoestq on this then because it seems like a datasets issue.

Good idea. Thanks.

Jul 05 '24 07:07 PhilipMay

OSes that don't support SIGALRM are supported thanks to a try/except - not sure how you managed to get the error related to SIGALRM ? (see https://github.com/huggingface/datasets/blob/689447f8c86f777829a4db9ccc5d8133c12ec84c/src/datasets/load.py#L113-L134)

Anyway feel free to update datasets and try again just in case
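
The try/except pattern described above would explain why the log shows two tracebacks joined by "During handling of the above exception, another exception occurred". The following is only a minimal sketch of that pattern, not the code in datasets: `resolve_trust_sketch` and `windows_like_signal` are hypothetical names, and the stand-in object simply imitates a `signal` module without `SIGALRM`, as on Windows.

```python
import types


def resolve_trust_sketch(trust_remote_code, repo_name, signal_module):
    """Hypothetical stand-in for the check; not the datasets implementation."""
    if trust_remote_code is not None:
        return trust_remote_code
    try:
        # SIGALRM exists only on Unix-like systems, so on Windows this
        # attribute lookup raises AttributeError.
        signal_module.SIGALRM
        # A real implementation would prompt the user with a timeout here.
        # The sketch has no prompt, so it treats the answer as "not given".
        raise RuntimeError("no interactive answer in this sketch")
    except Exception:
        # Raising inside the except block chains both errors, which Python
        # prints as "During handling of the above exception, another
        # exception occurred".
        raise ValueError(
            f"The repository for {repo_name} contains custom code. "
            "Please pass the argument `trust_remote_code=True`."
        )


# Imitate Windows with an object that has SIGABRT but no SIGALRM.
windows_like_signal = types.SimpleNamespace(SIGABRT=22)

try:
    resolve_trust_sketch(None, "german_rag_evals", windows_like_signal)
except ValueError as err:
    final_error = err
    first_error = err.__context__  # the AttributeError raised first

print(type(first_error).__name__, "->", type(final_error).__name__)
```

Under these assumptions the `AttributeError` is only the first link in the chain, and the `ValueError` about `trust_remote_code` is what actually stops the run.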

Jul 08 '24 10:07 lhoestq

No problem for the transfer if needed

Jul 08 '24 11:07 clefourrier

I copied the dataset code from this URL and now I get this error:

(lighteval) D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval>python run_evals_accelerate.py ^
  --model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1" ^
  --tasks "./examples/tasks/all_german_rag_evals.txt" ^
  --override_batch_size 1 ^
  --use_chat_template ^
  --custom_tasks "community_tasks/german_rag_evals.py" ^
  --output_dir "./evals/"
Traceback (most recent call last):
  File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 30, in <module>
    from lighteval.main_accelerate import CACHE_DIR, main
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 31, in <module>
    from lighteval.evaluator import evaluate, make_results_table
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\evaluator.py", line 32, in <module>
    from lighteval.logging.evaluation_tracker import EvaluationTracker
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\evaluation_tracker.py", line 32, in <module>
    from datasets import Dataset, load_dataset
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\__init__.py", line 26, in <module>
    from .inspect import (
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\inspect.py", line 32, in <module>
    from .load import (
ImportError: cannot import name 'metric_module_factory' from 'datasets.load' (C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py)

Jul 11 '24 08:07 Pommel4711

Hi @Pommel4711, did you try to update datasets first, as @lhoestq suggested?

Jul 11 '24 08:07 clefourrier

> Hi @Pommel4711, did you try to update datasets first, as @lhoestq suggested?

Yes, I did update datasets as @lhoestq suggested.

> OSes that don't support SIGALRM are supported thanks to a try/except - not sure how you managed to get the error related to SIGALRM ? (see https://github.com/huggingface/datasets/blob/689447f8c86f777829a4db9ccc5d8133c12ec84c/src/datasets/load.py#L113-L134)
>
> Anyway feel free to update datasets and try again just in case

Despite updating datasets, I get a new error. Any further suggestions would be greatly appreciated.

Thank you!

Jul 11 '24 08:07 Pommel4711

Just to be sure, how did you update the package, and what is the current version you are running?
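
One way to answer the version question without relying on memory of what was installed is to ask the active environment directly. This is a small sketch using the standard library's `importlib.metadata`; the helper name `installed_version` is an illustrative assumption, and the package list is taken from the packages mentioned in this thread.

```python
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


# Print the versions seen by the interpreter that runs this script.
for name in ("datasets", "lighteval", "accelerate"):
    print(f"{name}: {installed_version(name)}")
```

Running it with the same `python` that launches run_evals_accelerate.py makes sure the reported versions belong to the lighteval environment and not to another installation.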

Jul 11 '24 08:07 clefourrier

Issue with `lighteval` Evaluation Script

Description

I completely removed the Conda environment lighteval and updated the repository using the following commands:

git pull
git checkout main

Checked out the main branch (commit ID = 4651531e4716911f99).

Then, I reinstalled the environment as follows:

conda create -n lighteval python=3.10 && conda activate lighteval
pip install .
pip install '.[accelerate,quantization,adapters]'

After that, I ran the evaluation script:

python run_evals_accelerate.py ^
  --model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1" ^
  --tasks "./examples/tasks/all_german_rag_evals.txt" ^
  --override_batch_size 1 ^
  --use_chat_template ^
  --custom_tasks "community_tasks/german_rag_evals.py" ^
  --output_dir "./evals/"

I encountered the following error:

File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 30, in <module>
    from lighteval.main_accelerate import CACHE_DIR, main
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 31, in <module>
    from lighteval.evaluator import evaluate, make_results_table
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\evaluator.py", line 32, in <module>
    from lighteval.logging.evaluation_tracker import EvaluationTracker
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\evaluation_tracker.py", line 37, in <module>
    from lighteval.logging.info_loggers import (
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\info_loggers.py", line 34, in <module>
    from lighteval.metrics import MetricCategory
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\metrics\__init__.py", line 25, in <module>
    from lighteval.metrics.metrics import MetricCategory, Metrics
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\metrics\metrics.py", line 75, in <module>
    class Metrics(Enum):
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\metrics\metrics.py", line 235, in Metrics
    sample_level_fn=JudgeLLM(
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\metrics\metrics_sample.py", line 634, in __init__
    self.judge = JudgeOpenAI(
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\metrics\llm_as_judge.py", line 80, in __init__
    with open(templates_path, "r") as f:
FileNotFoundError: [Errno 2] No such file or directory: 'C:\\apps\\entwicklungsumgebung\\anaconda3\\envs\\lighteval\\lib\\site-packages\\lighteval\\metrics\\judge_prompts.jsonl'

To resolve this, I downloaded judge_prompts.jsonl from this link and placed it in the directory where the error occurred.
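
The FileNotFoundError above suggests that the data file was not copied into site-packages during installation. A quick check for such a file can be sketched as follows; `package_file_exists` is a hypothetical helper, and the relative path is read off the traceback. It locates the package with `importlib.util.find_spec`, which does not import a top-level package, so it also works while the import itself is failing.

```python
import importlib.util
from pathlib import Path


def package_file_exists(package_name, relative_path):
    """Check for a file inside an installed top-level package without importing it."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return False  # package not installed, or a namespace package
    package_dir = Path(spec.origin).parent
    return (package_dir / relative_path).is_file()


# Path taken from the FileNotFoundError message in this thread.
print(package_file_exists("lighteval", "metrics/judge_prompts.jsonl"))
```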

I ran the script again, which resulted in the following output:

INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
INFO:absl:Using default tokenizer.
WARNING:bitsandbytes.cextension:The installed version of bitsandbytes was compiled without GPU support. 8-bit optimizers, 8-bit multiplication, and GPU quantization are unavailable.
WARNING:lighteval.logging.hierarchical_logger:main: (0, Namespace(model_config_path=None, model_args='pretrained=DiscoResearch/DiscoLM_German_7b_v1', max_samples=None, override_batch_size=1, job_id='', output_dir='./evals/', push_results_to_hub=False, save_details=False, push_details_to_hub=False, push_results_to_tensorboard=False, public_run=False, cache_dir=None, results_org=None, use_chat_template=True, system_prompt=None, dataset_loading_processes=1, custom_tasks='community_tasks/german_rag_evals.py', tasks='./examples/tasks/all_german_rag_evals.txt', num_fewshot_seeds=1)),  {
WARNING:lighteval.logging.hierarchical_logger:  Test all gather {
WARNING:lighteval.logging.hierarchical_logger:    Test gather tensor
WARNING:lighteval.logging.hierarchical_logger:    gathered_tensor tensor([0]), should be [0]
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.006101]
WARNING:lighteval.logging.hierarchical_logger:  Creating model configuration {
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00]
WARNING:lighteval.logging.hierarchical_logger:  Model loading {
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
WARNING:lighteval.logging.hierarchical_logger:    Tokenizer truncation and padding size set to the left side.
WARNING:lighteval.logging.hierarchical_logger:    We are not in a distributed setting. Setting model_parallel to False.
WARNING:lighteval.logging.hierarchical_logger:    Model parallel was set to False, max memory set to None and device map to None
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [01:01<00:00, 20.60s/it]
WARNING:lighteval.logging.hierarchical_logger:    Using Data Parallelism, putting model on device cpu
WARNING:lighteval.logging.hierarchical_logger:    Model info: ModelInfo(model_name='DiscoResearch/DiscoLM_German_7b_v1', model_sha='560f972f9f735fc9289584b3aa8d75d0e539c44e', model_dtype='torch.bfloat16', model_size='13.49 GB')
WARNING:lighteval.logging.hierarchical_logger:  } [0:01:04.212504]
WARNING:lighteval.logging.hierarchical_logger:  Tasks loading {
WARNING:lighteval.logging.hierarchical_logger:  } [0:00:00.061989]
WARNING:lighteval.logging.hierarchical_logger:} [0:01:04.289455]
Traceback (most recent call last):
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 117, in resolve_trust_remote_code
    signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 89, in <module>
    main(args)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\hierarchical_logger.py", line 166, in wrapper
    return fn(*args, **kwargs)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 91, in main
    task_dict = Registry(cache_dir=env_config.cache_dir).get_task_dict(
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 133, in get_task_dict
    custom_tasks_module.append(create_custom_tasks_module(custom_tasks=custom_tasks))
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 168, in create_custom_tasks_module
    dataset_module = dataset_module_factory(str(custom_tasks))
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 1814, in dataset_module_factory
    ).get_module()
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 962, in get_module
    trust_remote_code = resolve_trust_remote_code(self.trust_remote_code, self.name)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 133, in resolve_trust_remote_code
    raise ValueError(
ValueError: The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
Please pass the argument trust_remote_code=True to allow custom code to be run.

I deleted the dataset and replaced it with this version.

Upon running the script again, I encountered this error:

File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 30, in <module>
    from lighteval.main_accelerate import CACHE_DIR, main
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 31, in <module>
    from lighteval.evaluator import evaluate, make_results_table
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\evaluator.py", line 32, in <module>
    from lighteval.logging.evaluation_tracker import EvaluationTracker
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\evaluation_tracker.py", line 32, in <module>
    from datasets import Dataset, load_dataset
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\__init__.py", line 26, in <module>
    from .inspect import (
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\inspect.py", line 32, in <module>
    from .load import (
ImportError: cannot import name 'metric_module_factory' from 'datasets.load' (C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py)

This outlines the steps I took, the errors I encountered, and the troubleshooting steps I followed.

Jul 11 '24 08:07 Pommel4711

Thanks a lot for the detailed steps! I think you should just run pip install -U datasets to upgrade datasets instead of manually editing files.

Jul 11 '24 08:07 clefourrier

I tried running pip install -U datasets to upgrade datasets as you suggested, instead of manually editing the files. Unfortunately, this error still persists.

ImportError: cannot import name 'metric_module_factory' from 'datasets.load' (C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py)

Do you have any other suggestions on how to resolve this issue?

Thank you!

Jul 11 '24 09:07 Pommel4711

cc @lhoestq this sounds like a datasets issue, you can transfer the issue to your lib if needed :)

Jul 11 '24 10:07 clefourrier

I was unable to reproduce the issue even following the steps. I think it is indeed a datasets issue. I am however going to fix the missing file issue :)

Jul 11 '24 10:07 NathanHB

Maybe I found the problem with the dataset.

I followed the steps mentioned in this comment to resolve the issue without deleting the file C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py and replacing it with the version from this link.

Instead, I tried upgrading the datasets library using the following command:

pip install -U datasets

However, after the upgrade, I noticed that the load.py file remains unchanged and is not the same as the one from this link.
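
A mismatch of this kind is not necessarily a failed upgrade: pip install -U installs the newest released version that satisfies the environment's constraints, which can differ from the file on the repository's main branch. To compare files reliably rather than by eye, a checksum can help. This is only a sketch; `installed_file` and `file_sha256` are hypothetical helper names, and the second file to compare against would be whatever copy was downloaded manually.

```python
import hashlib
import importlib.util
from pathlib import Path


def installed_file(package_name, relative_path):
    """Return the path of a file inside an installed top-level package, or None."""
    spec = importlib.util.find_spec(package_name)
    if spec is None or spec.origin is None:
        return None
    candidate = Path(spec.origin).parent / relative_path
    return candidate if candidate.is_file() else None


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


load_py = installed_file("datasets", "load.py")
if load_py is not None:
    # Compare this digest with file_sha256() of the other copy.
    print(load_py, file_sha256(load_py))
```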


But then I am still left with this error:

(lighteval) D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval>python run_evals_accelerate.py ^
  --model_args "pretrained=DiscoResearch/DiscoLM_German_7b_v1" ^
  --tasks "./examples/tasks/all_german_rag_evals.txt" ^
  --override_batch_size 1 ^
  --use_chat_template ^
  --custom_tasks "community_tasks/german_rag_evals.py" ^
  --output_dir "./evals/"
Using either accelerate or text-generation to run this script is advised.
main: (0, Namespace(model_config_path=None, model_args='pretrained=DiscoResearch/DiscoLM_German_7b_v1', max_samples=None, override_batch_size=1, job_id='', output_dir='./evals/', push_results_to_hub=False, save_details=False, push_details_to_hub=False, push_results_to_tensorboard=False, public_run=False, cache_dir=None, results_org=None, use_chat_template=True, system_prompt=None, dataset_loading_processes=1, custom_tasks='community_tasks/german_rag_evals.py', tasks='./examples/tasks/all_german_rag_evals.txt', num_fewshot_seeds=1)),  {
  Test all gather {
    Not running in a parallel setup, nothing to test
  } [0:00:00.001000]
  Creating model configuration {
  } [0:00:00]
  Model loading {
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
    Tokenizer truncation and padding size set to the left side.
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:21<00:00,  7.09s/it]
    Using Data Parallelism, putting model on device cpu
    Model info: ModelInfo(model_name='DiscoResearch/DiscoLM_German_7b_v1', model_sha='560f972f9f735fc9289584b3aa8d75d0e539c44e', model_dtype='torch.bfloat16', model_size=-1)
  } [0:00:23.565683]
  Tasks loading {
  } [0:00:00.061002]
} [0:00:23.641685]
Traceback (most recent call last):
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 117, in resolve_trust_remote_code
    signal.signal(signal.SIGALRM, _raise_timeout_error)
AttributeError: module 'signal' has no attribute 'SIGALRM'. Did you mean: 'SIGABRT'?

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "D:\Arbeit\AIUI\RAG Telecom Dataset\lighteval\run_evals_accelerate.py", line 89, in <module>
    main(args)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\logging\hierarchical_logger.py", line 166, in wrapper
    return fn(*args, **kwargs)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\main_accelerate.py", line 91, in main
    task_dict = Registry(cache_dir=env_config.cache_dir).get_task_dict(
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 133, in get_task_dict
    custom_tasks_module.append(create_custom_tasks_module(custom_tasks=custom_tasks))
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\lighteval\tasks\registry.py", line 168, in create_custom_tasks_module
    dataset_module = dataset_module_factory(str(custom_tasks))
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 1814, in dataset_module_factory
    ).get_module()
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 962, in get_module
    trust_remote_code = resolve_trust_remote_code(self.trust_remote_code, self.name)
  File "C:\apps\entwicklungsumgebung\anaconda3\envs\lighteval\lib\site-packages\datasets\load.py", line 133, in resolve_trust_remote_code
    raise ValueError(
ValueError: The repository for german_rag_evals contains custom code which must be executed to correctly load the dataset. You can inspect the repository content at https://hf.co/datasets/german_rag_evals.
Please pass the argument `trust_remote_code=True` to allow custom code to be run.

Jul 11 '24 11:07 Pommel4711

Are you trying to run the evaluation in offline mode? I got the same error, but I am running offline: I have replaced the HF links with local paths, yet the same trust_remote_code error keeps arising.

Aug 19 '24 10:08 nouf01

> Are you trying to run the evaluation in offline mode? I got the same error, but I am running offline: I have replaced the HF links with local paths, yet the same trust_remote_code error keeps arising.

I always run this with an internet connection, but I don't know what the problem is. I switched to Linux and it worked.

Aug 21 '24 11:08 Pommel4711

@Pommel4711 I now have the same issue, and I am on Linux. So the operating system should not be the root cause of the problem.

Aug 23 '24 10:08 PhilipMay

@Pommel4711 I found a solution that works for me. See here: #278

The fix is to add export HF_DATASETS_TRUST_REMOTE_CODE=TRUE

But this should not be required. IMO this should be considered as a bug in lighteval.
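
Since `export` is a Unix shell built-in and is not available in the Windows cmd prompt used earlier in this thread, a portable way to apply the same workaround is to set the variable from Python and pass it to the child process. The sketch below only prepares the environment and the command; `env_with_trust_remote_code` is a hypothetical helper, the arguments are copied from the command in this thread, and whether the variable resolves the error depends on the installed datasets and lighteval versions.

```python
import os
import sys


def env_with_trust_remote_code(base_env=None):
    """Return a copy of the environment with the workaround variable set."""
    env = dict(os.environ if base_env is None else base_env)
    env["HF_DATASETS_TRUST_REMOTE_CODE"] = "TRUE"
    return env


# Same arguments as the command used in this thread.
command = [
    sys.executable, "run_evals_accelerate.py",
    "--model_args", "pretrained=DiscoResearch/DiscoLM_German_7b_v1",
    "--tasks", "./examples/tasks/all_german_rag_evals.txt",
    "--override_batch_size", "1",
    "--use_chat_template",
    "--custom_tasks", "community_tasks/german_rag_evals.py",
    "--output_dir", "./evals/",
]

# To launch from the lighteval checkout:
#   import subprocess
#   subprocess.run(command, env=env_with_trust_remote_code(), check=True)
```

The variable is set on a copy of the environment, so the parent shell and other processes are left untouched.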

Aug 23 '24 10:08 PhilipMay

Can you try uninstalling and reinstalling datasets?

Aug 23 '24 12:08 lhoestq

> Can you try uninstalling and reinstalling datasets?

You mean a pip install -U datasets might not be enough? @lhoestq

Aug 23 '24 13:08 PhilipMay

I double-checked, and the 'SIGALRM' error is actually not important: it just shows up for Windows users in addition to the trust_remote_code error, which is the actual error.

Anyway, there seems to be a dataset called german_rag_evals that is based on a Python script and requires remote code to be executed. You need to pass trust_remote_code=True (directly or via the environment variable) to access it.

I couldn't find this dataset on HF though, is it a local dataset of yours ?

Aug 23 '24 17:08 lhoestq

Ah it's community_tasks/german_rag_evals.py apparently ? Well maybe you should point to a dataset on HF with data e.g. in parquet files instead. (and remove this script from lighteval ?)

Aug 23 '24 17:08 lhoestq

> Ah it's community_tasks/german_rag_evals.py apparently ? Well maybe you should point to a dataset on HF with data e.g. in parquet files instead. (and remove this script from lighteval ?)

I think this is not how lighteval is supposed to work. What do you think @clefourrier ? What I did is written here: #278

Aug 23 '24 17:08 PhilipMay

german_rag_evals.py is not a dataset script actually, datasets can't read it.

So it looks like lighteval uses datasets' dataset_module_factory() function to open this file, maybe lighteval should have its own function to do that
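
As an illustration of what such a dedicated function could look like, a plain Python file can be loaded as a module with the standard library's `importlib`, with no dataset machinery involved. This is a sketch under stated assumptions, not lighteval's actual code: `load_module_from_path`, the module name `custom_tasks`, the file `demo_tasks.py` and the variable `DEMO_TASKS` are all hypothetical.

```python
import importlib.util
import tempfile
from pathlib import Path


def load_module_from_path(path, module_name="custom_tasks"):
    """Execute a Python file and return it as a module object."""
    spec = importlib.util.spec_from_file_location(module_name, str(path))
    if spec is None or spec.loader is None:
        raise ImportError(f"Cannot load a module from {path}")
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)  # runs the file's top-level code
    return module


# Demo with a throwaway file standing in for a custom tasks script.
with tempfile.TemporaryDirectory() as tmp:
    task_file = Path(tmp) / "demo_tasks.py"
    task_file.write_text("DEMO_TASKS = ['demo_task']\n", encoding="utf-8")
    demo_module = load_module_from_path(task_file)

print(demo_module.DEMO_TASKS)
```

Loading the file this way still executes its top-level code, so it should only be used on scripts the user trusts, but it would avoid the trust_remote_code check that is meant for dataset scripts.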

Aug 23 '24 17:08 lhoestq

Dataset loading issue for german_rag_evals on Windows
