
Stuck when running evaluation.

Open · zjnyly opened this issue 7 months ago · 5 comments

Hi, when I run the evaluation code, the dill package reports a recursion problem and the run never proceeds. Is this a problem you have ever encountered? Thanks!

(py310) zjnyly@pc:~/LLaDA$ CUDA_VISIBLE_DEVICES=0 python eval_llada.py --tasks gsm8k --model llada_dist --model_args model_path='../LLaDA-8B-Instruct/',gen_length=1024,steps=1024,block_length=1024
INFO 05-27 15:10:03 __init__.py:194] No platform detected, vLLM is running on UnspecifiedPlatform
2025-05-27:15:10:04,000 INFO     [__main__.py:279] Verbosity set to INFO
2025-05-27:15:10:13,266 INFO     [__main__.py:376] Selected Tasks: ['gsm8k']
2025-05-27:15:10:13,268 INFO     [evaluator.py:164] Setting random seed to 0 | Setting numpy seed to 1234 | Setting torch manual seed to 1234 | Setting fewshot manual seed to 1234
2025-05-27:15:10:13,268 INFO     [evaluator.py:201] Initializing llada_dist model, with arguments: {'model_path': '../LLaDA-8B-Instruct/', 'gen_length': 1024, 'steps': 1024, 'block_length': 1024}
2025-05-27:15:10:13,420 WARNING  [other.py:349] Detected kernel version 5.4.0, which is below the recommended minimum of 5.5.0; this can cause the process to hang. It is recommended to upgrade the kernel to the minimum version or higher.
Loading checkpoint shards: 100%|█████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:01<00:00,  5.67it/s]
Using the latest cached version of the dataset since gsm8k couldn't be found on the Hugging Face Hub
2025-05-27:15:10:20,806 WARNING  [load.py:1444] Using the latest cached version of the dataset since gsm8k couldn't be found on the Hugging Face Hub
Found the latest cached dataset configuration 'main' at /home/zjnyly/.cache/huggingface/datasets/gsm8k/main/0.0.0/e53f048856ff4f594e959d75785d2c2d37b678ee (last modified on Tue May 27 14:45:53 2025).
2025-05-27:15:10:20,808 WARNING  [cache.py:94] Found the latest cached dataset configuration 'main' at /home/zjnyly/.cache/huggingface/datasets/gsm8k/main/0.0.0/e53f048856ff4f594e959d75785d2c2d37b678ee (last modified on Tue May 27 14:45:53 2025).
2025-05-27:15:10:21,037 INFO     [task.py:415] Building contexts for gsm8k on rank 0...
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████| 1319/1319 [00:06<00:00, 216.24it/s]
2025-05-27:15:10:27,172 INFO     [evaluator.py:496] Running generate_until requests
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot locate reference to <enum 'ActivationType'>.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot pickle <enum 'ActivationType'>: transformers_modules.configuration_llada.ActivationType has recursive self-references that trigger a RecursionError.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot locate reference to <class 'transformers_modules.modeling_llada.LayerNormBase'>.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot pickle <class 'transformers_modules.modeling_llada.LayerNormBase'>: transformers_modules.modeling_llada.LayerNormBase has recursive self-references that trigger a RecursionError.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot locate reference to <enum 'LayerNormType'>.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot pickle <enum 'LayerNormType'>: transformers_modules.configuration_llada.LayerNormType has recursive self-references that trigger a RecursionError.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot locate reference to <enum 'BlockType'>.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot pickle <enum 'BlockType'>: transformers_modules.configuration_llada.BlockType has recursive self-references that trigger a RecursionError.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot locate reference to <class 'transformers_modules.modeling_llada.LLaDABlock'>.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot pickle <class 'transformers_modules.modeling_llada.LLaDABlock'>: transformers_modules.modeling_llada.LLaDABlock has recursive self-references that trigger a RecursionError.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot locate reference to <enum 'InitFnType'>.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot pickle <enum 'InitFnType'>: transformers_modules.configuration_llada.InitFnType has recursive self-references that trigger a RecursionError.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot locate reference to <enum 'ModuleType'>.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot pickle <enum 'ModuleType'>: transformers_modules.modeling_llada.ModuleType has recursive self-references that trigger a RecursionError.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot locate reference to <enum 'ActivationCheckpointingStrategy'>.
  StockPickler.save(self, obj, save_persistent_id)
/home/zjnyly/.conda/envs/py310/lib/python3.10/site-packages/dill/_dill.py:414: PicklingWarning: Cannot pickle <enum 'ActivationCheckpointingStrategy'>: transformers_modules.configuration_llada.ActivationCheckpointingStrategy has recursive self-references that trigger a RecursionError.
  StockPickler.save(self, obj, save_persistent_id)
^CParameter 'function'=<function LLaDAEvalHarness.generate_until.<locals>._tokenize at 0x7f877d2c9c60> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
2025-05-27:15:11:01,186 WARNING  [fingerprint.py:258] Parameter 'function'=<function LLaDAEvalHarness.generate_until.<locals>._tokenize at 0x7f877d2c9c60> of the transform datasets.arrow_dataset.Dataset._map_single couldn't be hashed properly, a random hash was used instead. Make sure your transforms and parameters are serializable with pickle or dill for the dataset fingerprinting and caching to work. If you reuse this transform, the caching mechanism will consider it to be different from the previous calls and recompute everything. This warning is only showed once. Subsequent hashing failures won't be showed.
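
For what it's worth, the warnings themselves seem to come from the datasets fingerprinting step, which tries to serialize the local _tokenize closure with dill and pulls in the custom LLaDA classes along the way. A quick, untested way to hide that noise would be a standard warnings filter near the top of eval_llada.py (this only silences the messages; it would not explain the hang itself):

import warnings

# Untested sketch: hide the dill PicklingWarnings emitted while datasets
# fingerprints the local _tokenize closure. This only silences the noise;
# it does not change how the evaluation runs.
warnings.filterwarnings("ignore", message="Cannot locate reference to .*")
warnings.filterwarnings("ignore", message="Cannot pickle .*recursive self-references.*")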

zjnyly · May 27 '25

Thanks for your interest!

I'm so sorry; this error has never happened to me before. It seems to be a network error when downloading the test dataset, but I'm not entirely sure.
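
If it is network-related, one thing worth trying is forcing datasets to use only the local cache (your log shows gsm8k is already cached), so a slow Hub lookup cannot stall the run. HF_DATASETS_OFFLINE is a standard environment variable of the datasets library; placing it at the very top of eval_llada.py before the Hugging Face imports is just a sketch of how you might do it:

import os

# Force the datasets library to rely on the local cache only. This must run
# before datasets/transformers are imported (or export the variable in the shell).
os.environ["HF_DATASETS_OFFLINE"] = "1"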

nieshenx · May 29 '25

@zjnyly hi! I've got the same problem, any luck solving it?

aysim · Jun 18 '25


Hi, I think you can ignore these warnings. It seems to be because LLaDA does not generate auto-regressively, so even a single sequence takes a long time to finish. Maybe you should wait for about 5 minutes?
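
As a very rough sanity check of the timing (the per-pass latency below is an assumed number, not a measurement), with steps=1024 every sequence needs 1024 forward passes of the 8B model before anything appears:

# Back-of-envelope only; 0.2 s per forward pass is an assumption.
steps = 1024            # from the command: steps=1024
secs_per_forward = 0.2  # assumed latency of one 8B forward pass on a single GPU
num_samples = 1319      # GSM8K test set size, from the progress bar above

per_sample = steps * secs_per_forward
print(f"~{per_sample / 60:.1f} min per sample, "
      f"~{per_sample * num_samples / 3600:.0f} h for all of GSM8K at batch size 1")

So seeing no output for a few minutes at this generation budget is expected rather than a sign of a hang.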

zjnyly · Jun 20 '25

thanks, solved!

aysim · Jun 20 '25


Hi, I ran into the same issue, and waiting did not make it proceed any further. Can you tell me how to fix it? Thank you.

Chenfeng1271 · Oct 02 '25