lm-evaluation-harness
IFEval fails when multiple GPUs are used (for DDP)
When running IFEval, the library downloads NLTK tokenizers. This is a problem when multiple processes are used (e.g., for DDP inference), because each process performs the download. I think this leads to a race condition and causes the following error:
from lm_eval.tasks.ifeval import instructions_util
File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/lm_eval/tasks/ifeval/instructions_util.py", line 47, in <module>
download_nltk_resources()
File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/lm_eval/tasks/ifeval/instructions_util.py", line 44, in download_nltk_resources
nltk.download("punkt_tab")
File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 774, in download
for msg in self.incr_download(info_or_id, download_dir, force):
File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 642, in incr_download
yield from self._download_package(info, download_dir, force)
File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 733, in _download_package
for msg in _unzip_iter(filepath, zipdir, verbose=False):
File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/site-packages/nltk/downloader.py", line 2250, in _unzip_iter
zf.extractall(root)
File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/zipfile.py", line 1642, in extractall
self._extract_member(zipinfo, path, pwd)
File "/opt/pyenv-root/versions/3.9.17/lib/python3.9/zipfile.py", line 1692, in _extract_member
os.mkdir(targetpath)
FileExistsError: [Errno 17] File exists: '/home/flyte/nltk_data/tokenizers/punkt_tab/russian'
I think:
- The NLTK tokenizer should not be downloaded merely because a module is imported.
- The download should be guarded when multiple processes are used (e.g., in a Distributed Data Parallel setting).
I used the main branch to reproduce this issue (commit: 8138fd52).
One workaround is to download the NLTK resources in a safe manner beforehand.
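A minimal sketch of that workaround, run once from a single process before launching the distributed job (the function names here are illustrative, not part of the harness):

```python
import os


def punkt_tab_present(nltk_data_dir):
    """Return True if the punkt_tab tokenizer data is already unpacked."""
    return os.path.isdir(os.path.join(nltk_data_dir, "tokenizers", "punkt_tab"))


def predownload(nltk_data_dir=os.path.expanduser("~/nltk_data")):
    # Run this once, in a single process, before launching the
    # multi-process evaluation, so workers never race on the download.
    if not punkt_tab_present(nltk_data_dir):
        import nltk

        nltk.download("punkt_tab", download_dir=nltk_data_dir)
```

Because the data lands in a shared nltk_data directory, all worker processes started afterwards find the resource already in place and skip the download entirely.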
Hi! Thanks for reporting the issue! The PR should handle this. I thought the simplest way would be to check the LOCAL_RANK environment variable, but I'm open to feedback if you have any alternative suggestions.
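Roughly, the LOCAL_RANK approach could look like this (a sketch of the idea only, not the actual PR code; the function name is illustrative):

```python
import os


def download_nltk_resources():
    # Under torchrun / accelerate launch, every worker process imports the
    # module, so only let the main process (LOCAL_RANK unset or "0")
    # perform the download; other ranks skip it to avoid racing on the
    # shared nltk_data directory.
    if int(os.environ.get("LOCAL_RANK", "0")) != 0:
        return
    import nltk

    nltk.download("punkt_tab")
```

In practice the non-zero ranks would also need to wait (e.g., via a distributed barrier) until rank 0 finishes, otherwise they may look up the resource before it exists.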
Thanks for the PR.
I gave a suggestion in the PR. I am not sure the NLTK tokenizers need to be downloaded at module import time; if possible, that should be refactored.
I also ran into the punkt_tab problem:
[rank1]: File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/__main__.py", line 450, in <module>
[rank1]: cli_evaluate()
[rank1]: File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/__main__.py", line 369, in cli_evaluate
[rank1]: results = evaluator.simple_evaluate(
[rank1]: File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/utils.py", line 395, in _wrapper
[rank1]: return fn(*args, **kwargs)
[rank1]: File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/evaluator.py", line 277, in simple_evaluate
[rank1]: results = evaluate(
[rank1]: File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/utils.py", line 395, in _wrapper
[rank1]: return fn(*args, **kwargs)
[rank1]: File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/evaluator.py", line 478, in evaluate
[rank1]: metrics = task.process_results(
[rank1]: File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/api/task.py", line 1351, in process_results
[rank1]: return self.config.process_results(doc, results)
[rank1]: File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/tasks/leaderboard/ifeval/utils.py", line 120, in process_results
[rank1]: out_strict = test_instruction_following_strict(inp, response)
[rank1]: File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/tasks/leaderboard/ifeval/utils.py", line 43, in test_instruction_following_strict
[rank1]: if response.strip() and instruction.check_following(response):
[rank1]: File "/home/litan/leaderboard/lm-evaluation-harness/lm_eval/tasks/ifeval/instructions.py", line 1580, in check_following
[rank1]: words = instructions_util.nltk.word_tokenize(value)
[rank1]: File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 142, in word_tokenize
[rank1]: sentences = [text] if preserve_line else sent_tokenize(text, language)
[rank1]: File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 119, in sent_tokenize
[rank1]: tokenizer = _get_punkt_tokenizer(language)
[rank1]: File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/__init__.py", line 105, in _get_punkt_tokenizer
[rank1]: return PunktTokenizer(language)
[rank1]: File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1744, in __init__
[rank1]: self.load_lang(lang)
[rank1]: File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/tokenize/punkt.py", line 1749, in load_lang
[rank1]: lang_dir = find(f"tokenizers/punkt_tab/{lang}/")
[rank1]: File "/opt/conda/envs/harness/lib/python3.10/site-packages/nltk/data.py", line 579, in find
[rank1]: raise LookupError(resource_not_found)
[rank1]: LookupError:
[rank1]: **********************************************************************
[rank1]: Resource punkt_tab not found.
[rank1]: Please use the NLTK Downloader to obtain the resource:
[rank1]: >>> import nltk
[rank1]: >>> nltk.download('punkt_tab')
[rank1]:
[rank1]: For more information see: https://www.nltk.org/data.html
[rank1]: Attempted to load tokenizers/punkt_tab/english/
[rank1]: Searched in:
[rank1]: - '/home/litan/nltk_data'
[rank1]: - '/opt/conda/envs/harness/nltk_data'
[rank1]: - '/opt/conda/envs/harness/share/nltk_data'
[rank1]: - '/opt/conda/envs/harness/lib/nltk_data'
[rank1]: - '/usr/share/nltk_data'
[rank1]: - '/usr/local/share/nltk_data'
[rank1]: - '/usr/lib/nltk_data'
[rank1]: - '/usr/local/lib/nltk_data'
[rank1]: **********************************************************************
I have the data in my local folder, and I can even load the tokenizer locally:
>>> from nltk import PunktTokenizer
>>> PunktTokenizer("english")
<nltk.tokenize.punkt.PunktTokenizer object at 0x7f6170003a60>
But I got the above error when I ran it with:
accelerate launch -m lm_eval --model_args pretrained=<model>,dtype=bfloat16 --log_samples --output_path eval_results --tasks leaderboard --batch_size 4 --apply_chat_template --fewshot_as_multiturn
Is it related to a race condition?
Can we get an update on this (either merge the existing PR fixing this issue, or create a new one if needed)? Happy to work on it, but this issue is blocking multi-GPU evals for me.
#2267 should fix it. As a workaround, you could run python -c "import nltk; nltk.download('punkt')" in your local environment before running lm_eval, and this should handle the error for the time being.
In my case, it only failed the first time, while the dataset was downloading; it was fine after I reran it a couple of times. This can serve as a workaround.