lighteval [BUG] Unable to Download Dataset in Restricted Network Environment (No Access to Hugging Face or Mirrors)

Describe the bug

In restricted or offline network environments, it is currently not possible to run lighteval because it attempts to download datasets (e.g., from the Hugging Face Hub or mirrors) without fallback options. When access to sites like https://huggingface.co or https://hf-mirror.com is blocked, dataset loading fails completely.

To Reproduce

Run lighteval in an environment without internet access or with Hugging Face domains blocked.
Observe that dataset loading (e.g., via load_dataset() from the datasets library) fails.

Example error:

requests.exceptions.ConnectionError: HTTPSConnectionPool(host='huggingface.co', ...)

Expected behavior

Support for using pre-downloaded datasets from local disk (e.g., via environment variables or explicit local paths).
Graceful handling of offline environments or clear documentation for air-gapped usage.

Additional Context

Many research or production environments have strict firewall policies or no internet access.
This significantly limits the adoption of lighteval in such contexts.
A partial workaround is to manually preload datasets into the Hugging Face datasets cache, but this is not natively supported by lighteval.
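
One way to approximate the cache-based workaround is to fill the cache on a connected machine and then force the Hugging Face libraries into offline mode before they are imported. A minimal sketch, under stated assumptions: `HF_HUB_OFFLINE`, `HF_DATASETS_OFFLINE`, and `TRANSFORMERS_OFFLINE` are environment variables read by huggingface_hub, datasets, and transformers respectively; the helper name `enable_hf_offline_mode` is hypothetical, and whether every lighteval code path honors these flags is an assumption.

```python
import os

# Offline switches read by huggingface_hub, datasets, and transformers.
OFFLINE_FLAGS = {
    "HF_HUB_OFFLINE": "1",
    "HF_DATASETS_OFFLINE": "1",
    "TRANSFORMERS_OFFLINE": "1",
}


def enable_hf_offline_mode(env=None):
    """Set the offline flags in `env` (defaults to os.environ) and return them.

    Hypothetical helper: call it before importing datasets, transformers, or
    lighteval, since those libraries typically read the flags at import time.
    """
    target = os.environ if env is None else env
    target.update(OFFLINE_FLAGS)
    return {name: target[name] for name in OFFLINE_FLAGS}


if __name__ == "__main__":
    print(enable_hf_offline_mode())
```

Exporting the same variables in the shell before launching the tool should have the same effect as calling the helper at the top of a script.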

I have implemented support for running lighteval in fully offline mode and plan to submit a PR soon. Hopefully, this will help address the issue and make the tool more accessible in restricted environments.

Jun 09 '25 13:06 xjli360

thanks for the issue and the fix ! I will do a round of review during the week :)

Jun 17 '25 10:06 NathanHB

I'm writing to add some support for this feature. However, it seems that the best way to implement it would be to build caching into the dataset-loading classes. I'm somewhat surprised this doesn't exist already.

Jul 15 '25 19:07 mjpost

Hi, is there any progress on this? I very often need to use this tool in no-internet environments. Please help.

Often I only provide a model path and a dataset path, then evaluate the results.

Nov 17 '25 09:11 Oukaishen

Hey @Oukaishen! Is your dataset already downloaded? If yes, you can point to it in your task config: instead of having, for example, openai/gsm8k, you would have my/directory/gsm8k.

If you already downloaded the dataset using datasets, then it is cached; in that case you don't have to change anything, and the cached dataset will be used automatically.
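
As an illustration of the path swap described above, the sketch below picks a local directory when it exists and falls back to the Hub identifier otherwise. `resolve_dataset_source` and the directory layout (`<local_root>/<dataset name>`) are hypothetical; the returned string is what would go in the dataset field of the task config (assumed here to be `hf_repo`).

```python
from pathlib import Path


def resolve_dataset_source(hub_id, local_root):
    """Return a local dataset directory if present, else the Hub identifier.

    Hypothetical helper: for hub_id "openai/gsm8k" and local_root
    "my/directory", it looks for the directory "my/directory/gsm8k".
    """
    dataset_name = hub_id.rsplit("/", 1)[-1]
    candidate = Path(local_root) / dataset_name
    return str(candidate) if candidate.is_dir() else hub_id
```

Falling back to the Hub identifier keeps the same config usable on connected machines, where the cache or a download can still serve the dataset.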

Nov 17 '25 09:11 NathanHB

Thanks for your quick reply. Can you give detailed examples? I have tried two methods, and neither works.

First, I tried the command-line version:

export GIT_PYTHON_REFRESH=quiet
# Disabled option, moved out of the line continuation so it does not cut the command short:
#   --custom-tasks /mnt/cephfs/kaishenou/lighteval/src/lighteval/tasks/tasks/gsm8k.py
lighteval accelerate \
     "model_name=/mnt/cephfs/LLM_MODEL_HUB/qwen/Qwen2.5-0.5B-Instruct/" \
     gsm8k \
     --output-dir /mnt/cephfs/kaishenou/eval_benchmarks/lighteval/workdir/

Second, I tried the Python version:

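# Imports assumed from the lighteval Python API docs; module paths may differ between versions.
from transformers import AutoModelForCausalLM

from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.transformers.transformers_model import TransformersModel, TransformersModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters

# Local model directory and the task to evaluate
MODEL_NAME = "/mnt/cephfs/LLM_MODEL_HUB/qwen/Qwen2.5-0.5B-Instruct/"
BENCHMARKS = "gsm8k"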

evaluation_tracker = EvaluationTracker(output_dir="/mnt/cephfs/kaishenou/eval_benchmarks/lighteval/workdir/results")
pipeline_params = PipelineParameters(
    launcher_type=ParallelismManager.NONE,
    max_samples=2
)

model = AutoModelForCausalLM.from_pretrained(
  MODEL_NAME, device_map="auto"
)
config = TransformersModelConfig(model_name=MODEL_NAME, batch_size=1)
model = TransformersModel.from_model(model, config)

pipeline = Pipeline(
    model=model,
    pipeline_parameters=pipeline_params,
    evaluation_tracker=evaluation_tracker,
    tasks=BENCHMARKS,
)

results = pipeline.evaluate()
pipeline.show_results()
results = pipeline.get_results()

Sorry, I am not sure what you mean by "task config". I just followed this repo's README and this guideline link.

I would be very grateful if you could give a full example, such as "step 1: change file xxx". I will also paste the full demo here (for others) if it goes smoothly.

Nov 18 '25 02:11 Oukaishen

Both ways fail with errors related to a timeout hang (I guess it is waiting to download the dataset file, which I have already downloaded to a specific directory).

Nov 18 '25 02:11 Oukaishen

Also, I installed lighteval with 'pip install', so how do I change files such as the following, if my guess is right?

lighteval/src/lighteval/tasks/tasks/gsm8k.py

Nov 18 '25 02:11 Oukaishen

Here is the error stack:

Traceback (most recent call last):
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/tasks/ifeval/instructions_utils.py", line 27, in download_nltk_resources
    nltk.data.find("tokenizers/punkt")
  File "/usr/local/lib/python3.11/site-packages/nltk/data.py", line 579, in find
    raise LookupError(resource_not_found)
LookupError: 
**********************************************************************
  Resource punkt not found.
  Please use the NLTK Downloader to obtain the resource:

  >>> import nltk
  >>> nltk.download('punkt')
  
  For more information see: https://www.nltk.org/data.html

  Attempted to load tokenizers/punkt

  Searched in:
    - '/root/nltk_data'
    - '/usr/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/share/nltk_data'
    - '/usr/local/share/nltk_data'
    - '/usr/lib/nltk_data'
    - '/usr/local/lib/nltk_data'
**********************************************************************


During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/cephfs/kaishenou/scripts/eval_benchmarks/debug_scripts/demo_lighteval.py", line 23, in <module>
    pipeline = Pipeline(
               ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/lighteval/pipeline.py", line 142, in __init__
    self._init_tasks_and_requests(tasks=tasks)
  File "/usr/local/lib/python3.11/site-packages/lighteval/pipeline.py", line 214, in _init_tasks_and_requests
    self.registry = Registry(
                    ^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/registry.py", line 152, in __init__
    self._task_registry = Registry.load_all_task_configs(
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/registry.py", line 362, in load_all_task_configs
    loaded_configs.update(Registry._load_from_subdirs(task_subdirs))
                          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/registry.py", line 335, in _load_from_subdirs
    module = importlib.import_module(f"lighteval.tasks.tasks.{module_name}.main")
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<frozen importlib._bootstrap>", line 1204, in _gcd_import
  File "<frozen importlib._bootstrap>", line 1176, in _find_and_load
  File "<frozen importlib._bootstrap>", line 1147, in _find_and_load_unlocked
  File "<frozen importlib._bootstrap>", line 690, in _load_unlocked
  File "<frozen importlib._bootstrap_external>", line 940, in exec_module
  File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/tasks/ifbench/main.py", line 35, in <module>
    from lighteval.tasks.tasks.ifbench import evaluation_lib
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/tasks/ifbench/evaluation_lib.py", line 23, in <module>
    import lighteval.tasks.tasks.ifbench.instructions_registry as instructions_registry
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/tasks/ifbench/instructions_registry.py", line 17, in <module>
    import lighteval.tasks.tasks.ifbench.instructions as instructions
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/tasks/ifbench/instructions.py", line 40, in <module>
    import lighteval.tasks.tasks.ifeval.instructions_utils as instructions_util
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/tasks/ifeval/instructions_utils.py", line 37, in <module>
    download_nltk_resources()
  File "/usr/local/lib/python3.11/site-packages/lighteval/tasks/tasks/ifeval/instructions_utils.py", line 29, in download_nltk_resources
    nltk.download("punkt")
  File "/usr/local/lib/python3.11/site-packages/nltk/downloader.py", line 774, in download
    for msg in self.incr_download(info_or_id, download_dir, force):
  File "/usr/local/lib/python3.11/site-packages/nltk/downloader.py", line 629, in incr_download
    info = self._info_or_id(info_or_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/nltk/downloader.py", line 603, in _info_or_id
    return self.info(info_or_id)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/nltk/downloader.py", line 1006, in info
    self._update_index()
  File "/usr/local/lib/python3.11/site-packages/nltk/downloader.py", line 949, in _update_index
    ElementTree.parse(urlopen(self._url)).getroot()
                      ^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 216, in urlopen
    return opener.open(url, data, timeout)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 519, in open
    response = self._open(req, data)
               ^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 536, in _open
    result = self._call_chain(self.handle_open, protocol, protocol +
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 496, in _call_chain
    result = func(*args)
             ^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 1391, in https_open
    return self.do_open(http.client.HTTPSConnection, req,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/urllib/request.py", line 1348, in do_open
    h.request(req.get_method(), req.selector, req.data, headers,
  File "/usr/lib64/python3.11/http/client.py", line 1286, in request
    self._send_request(method, url, body, headers, encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1332, in _send_request
    self.endheaders(body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1281, in endheaders
    self._send_output(message_body, encode_chunked=encode_chunked)
  File "/usr/lib64/python3.11/http/client.py", line 1041, in _send_output
    self.send(msg)
  File "/usr/lib64/python3.11/http/client.py", line 979, in send
    self.connect()
  File "/usr/lib64/python3.11/http/client.py", line 1451, in connect
    super().connect()
  File "/usr/lib64/python3.11/http/client.py", line 945, in connect
    self.sock = self._create_connection(
                ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.11/socket.py", line 848, in create_connection
    sock.connect(sa)
KeyboardInterrupt

Nov 18 '25 02:11 Oukaishen

Hi @NathanHB, can you give more detailed help? Many thanks.

Key point: local model path (no internet download) + local dataset path (GSM8K) => benchmark (evaluation) results.

Nov 25 '25 02:11 Oukaishen

Sorry for the wait! This seems unrelated to the fact that you are trying to load a local dataset. The cause is this function in:

lighteval/tasks/tasks/ifeval/instructions_utils.py

def download_nltk_resources():
    """Download 'punkt' if not already installed"""
    try:
        nltk.data.find("tokenizers/punkt")
    except LookupError:
        nltk.download("punkt")

    try:
        nltk.data.find("tokenizers/punkt_tab")
    except LookupError:
        nltk.download("punkt_tab")

It tries to download resources even though you have no internet access. You can open a PR that adds a safeguard for a failed download, so that you are not impacted!
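
A minimal sketch of what such a safeguard could look like (not the actual lighteval patch): it checks for the resource first, bounds the download attempt with a socket timeout, and returns False instead of hanging or raising when the network is unreachable. The helper name `ensure_nltk_resource`, the 10-second default, and the injectable `find`/`download` parameters are assumptions made for illustration; in real use they would be `nltk.data.find` and `nltk.download`.

```python
import logging
import socket

logger = logging.getLogger(__name__)


def ensure_nltk_resource(resource_path, package, find, download, timeout_s=10.0):
    """Return True if the resource is available, False if it could not be fetched.

    `find` and `download` are injected so the logic runs without nltk;
    pass nltk.data.find and nltk.download in real use (hypothetical wiring).
    """
    try:
        find(resource_path)
        return True  # already installed, no network access needed
    except LookupError:
        pass

    error = None
    previous_timeout = socket.getdefaulttimeout()
    socket.setdefaulttimeout(timeout_s)  # bound the attempt instead of hanging
    try:
        ok = bool(download(package))
    except OSError as exc:  # URLError and socket timeouts are OSError subclasses
        ok, error = False, exc
    finally:
        socket.setdefaulttimeout(previous_timeout)

    if not ok:
        logger.warning(
            "NLTK package %r unavailable (%s); tasks that need it may fail.",
            package,
            error,
        )
    return ok
```

With nltk installed, the call would look like `ensure_nltk_resource("tokenizers/punkt", "punkt", nltk.data.find, nltk.download)`, repeated for `punkt_tab`. Returning a boolean rather than raising lets unrelated tasks such as gsm8k load even when the tokenizer data is missing.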

Dec 04 '25 15:12 NathanHB