lm-evaluation-harness
max_length not set correctly
Hi, it seems there is a problem with lm_eval when I don't set 'max_length' for some tasks (at least GEM/wiki_lingua_en). When I leave 'max_length' at its default value, I get this error message:
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/main.py:215 in │
│ <module> │
│ │
│ 212 │
│ 213 │
│ 214 if __name__ == "__main__": │
│ ❱ 215 │ main() │
│ 216 │
│ │
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/main.py:197 in │
│ main │
│ │
│ 194 │ │ │
│ 195 │ │ with OfflineEmissionsTracker(country_iso_code="FRA", log_level │
│ 196 │ │ │ print() # Add newline between emissions tracker and evalu │
│ ❱ 197 │ │ │ results = evaluator.cli_evaluate(**evaluate_args) │
│ 198 │ │
│ 199 │ with open(f"./outputs/agg{path_separator}{output_path}.json", "w") │
│ 200 │ │ json.dump({"results": results["results"], "config": results["c │
│ │
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/lm_eval/evaluato │
│ r.py:90 in cli_evaluate │
│ │
│ 87 │ │ cache_location = f"lm_cache/{model_api_name}_{cache_args}.db" │
│ 88 │ │ model = lm_eval.api.model.CachingLM(model, cache_location) │
│ 89 │ │
│ ❱ 90 │ results = evaluate( │
│ 91 │ │ model=model, │
│ 92 │ │ tasks=tasks, │
│ 93 │ │ num_fewshot=num_fewshot, │
│ │
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/lm_eval/evaluato │
│ r.py:220 in evaluate │
│ │
│ 217 │ │ # could also implement some kind of auto-grouping here; they s │
│ 218 │ │ # end up next to each other. │
│ 219 │ │ logger.info(f"\n» Running all `{reqtype}` requests") │
│ ❱ 220 │ │ resps = getattr(model, reqtype)([req.args for req in reqs]) │
│ 221 │ │ resps = [ │
│ 222 │ │ │ x if req.index is None else x[req.index] for x, req in zip │
│ 223 │ │ ] │
│ │
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/lm_eval/models/h │
│ uggingface.py:343 in greedy_until │
│ │
│ 340 │ │ │ │
│ 341 │ │ │ token_context = self.tok_encode_batch(context) │
│ 342 │ │ │ │
│ ❱ 343 │ │ │ responses = self._model_generate( │
│ 344 │ │ │ │ inputs=token_context, │
│ 345 │ │ │ │ max_tokens=max_tokens, │
│ 346 │ │ │ │ stop=until, │
│ │
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/lm_eval/models/h │
│ uggingface.py:398 in _model_generate │
│ │
│ 395 │ ) -> TokenSequence: │
│ 396 │ │ # Ensure that the context does not encroach into the `space` │
│ 397 │ │ # for the generation. │
│ ❱ 398 │ │ input_ids = inputs["input_ids"][:, self.max_gen_toks - self.ma │
│ 399 │ │ attention_mask = inputs["attention_mask"][ │
│ 400 │ │ │ :, self.max_gen_toks - self.max_length : │
│ 401 │ │ ] │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected !is_symbolic() to be true, but got false. (Could this
error message be improved? If so, please report an enhancement request to
PyTorch.)
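For reference, the failing line is the context-trimming slice in _model_generate (huggingface.py:398 in the traceback). A minimal sketch of what that slice is supposed to do, with identifiers copied from the traceback and placeholder numbers that are my assumptions, not the harness defaults:

import torch

# Sketch of the trimming done at huggingface.py:398 (numbers are placeholders).
max_gen_toks = 256                              # tokens reserved for generation
max_length = 2048                               # the value that works for me
input_ids = torch.randint(0, 1000, (1, 3000))   # dummy context batch

# Keep only the last (max_length - max_gen_toks) context tokens so that
# context plus generated tokens still fit within max_length.
trimmed = input_ids[:, max_gen_toks - max_length:]
print(trimmed.shape)                            # torch.Size([1, 1792])

My guess is that with bloom-560m the default 'max_length' never resolves to a sane integer here, so the slice bound is invalid and PyTorch raises the error above, but I have not tracked down where that default comes from.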
I used this command:
python lm-evaluation-harness/main.py --model_api_name 'hf-causal' --model_args \
use_accelerate=True,pretrained='/path/to/bigscience/bloom-560m' \
--task_name GEM/wiki_lingua_en --template_names summarize_above_en
It seems that 'max_length' is being set to a wrong value by default. Everything works when I set 'max_length' to 2048 explicitly.
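For example, adding it to --model_args makes the same command run (assuming, as in the upstream harness, that the hf-causal wrapper accepts a max_length argument; the rest of the command is unchanged):

python lm-evaluation-harness/main.py --model_api_name 'hf-causal' --model_args \
    use_accelerate=True,pretrained='/path/to/bigscience/bloom-560m',max_length=2048 \
    --task_name GEM/wiki_lingua_en --template_names summarize_above_en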