lm-evaluation-harness
max_length not set correctly
Hi, it seems there is a problem with lm_eval when I don't set 'max_length' for some tasks (at least GEM/wiki_lingua_en). When I leave 'max_length' at its default value, I get this error message:
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/main.py:215 in │
│ <module> │
│ │
│ 212 │
│ 213 │
│ 214 if __name__ == "__main__": │
│ ❱ 215 │ main() │
│ 216 │
│ │
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/main.py:197 in │
│ main │
│ │
│ 194 │ │ │
│ 195 │ │ with OfflineEmissionsTracker(country_iso_code="FRA", log_level │
│ 196 │ │ │ print() # Add newline between emissions tracker and evalu │
│ ❱ 197 │ │ │ results = evaluator.cli_evaluate(**evaluate_args) │
│ 198 │ │
│ 199 │ with open(f"./outputs/agg{path_separator}{output_path}.json", "w") │
│ 200 │ │ json.dump({"results": results["results"], "config": results["c │
│ │
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/lm_eval/evaluato │
│ r.py:90 in cli_evaluate │
│ │
│ 87 │ │ cache_location = f"lm_cache/{model_api_name}_{cache_args}.db" │
│ 88 │ │ model = lm_eval.api.model.CachingLM(model, cache_location) │
│ 89 │ │
│ ❱ 90 │ results = evaluate( │
│ 91 │ │ model=model, │
│ 92 │ │ tasks=tasks, │
│ 93 │ │ num_fewshot=num_fewshot, │
│ │
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/lm_eval/evaluato │
│ r.py:220 in evaluate │
│ │
│ 217 │ │ # could also implement some kind of auto-grouping here; they s │
│ 218 │ │ # end up next to each other. │
│ 219 │ │ logger.info(f"\n» Running all `{reqtype}` requests") │
│ ❱ 220 │ │ resps = getattr(model, reqtype)([req.args for req in reqs]) │
│ 221 │ │ resps = [ │
│ 222 │ │ │ x if req.index is None else x[req.index] for x, req in zip │
│ 223 │ │ ] │
│ │
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/lm_eval/models/h │
│ uggingface.py:343 in greedy_until │
│ │
│ 340 │ │ │ │
│ 341 │ │ │ token_context = self.tok_encode_batch(context) │
│ 342 │ │ │ │
│ ❱ 343 │ │ │ responses = self._model_generate( │
│ 344 │ │ │ │ inputs=token_context, │
│ 345 │ │ │ │ max_tokens=max_tokens, │
│ 346 │ │ │ │ stop=until, │
│ │
│ /gpfsdswork/projects/rech/lmy/ssos022/lm-evaluation-harness/lm_eval/models/h │
│ uggingface.py:398 in _model_generate │
│ │
│ 395 │ ) -> TokenSequence: │
│ 396 │ │ # Ensure that the context does not encroach into the `space` │
│ 397 │ │ # for the generation. │
│ ❱ 398 │ │ input_ids = inputs["input_ids"][:, self.max_gen_toks - self.ma │
│ 399 │ │ attention_mask = inputs["attention_mask"][ │
│ 400 │ │ │ :, self.max_gen_toks - self.max_length : │
│ 401 │ │ ] │
╰──────────────────────────────────────────────────────────────────────────────╯
RuntimeError: Expected !is_symbolic() to be true, but got false. (Could this
error message be improved? If so, please report an enhancement request to
PyTorch.)
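For reference, the failing line is the context-trimming slice in _model_generate (huggingface.py:398 in the traceback). A minimal sketch of what that slice is supposed to do, with identifiers copied from the traceback and placeholder numbers that are my assumptions, not the harness defaults:

import torch

# Sketch of the trimming done at huggingface.py:398 (numbers are placeholders).
max_gen_toks = 256                              # tokens reserved for generation
max_length = 2048                               # the value that works for me
input_ids = torch.randint(0, 1000, (1, 3000))   # dummy context batch

# Keep only the last (max_length - max_gen_toks) context tokens so that
# context plus generated tokens still fit within max_length.
trimmed = input_ids[:, max_gen_toks - max_length:]
print(trimmed.shape)                            # torch.Size([1, 1792])

My guess is that with bloom-560m the default 'max_length' never resolves to a sane integer here, so the slice bound is invalid and PyTorch raises the error above, but I have not tracked down where that default comes from.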
I used this command:
python lm-evaluation-harness/main.py --model_api_name 'hf-causal' --model_args \
use_accelerate=True,pretrained='/path/to/bigscience/bloom-560m' \
--task_name GEM/wiki_lingua_en --template_names summarize_above_en
It seems that 'max_length' is being set to a wrong value by default. Everything works when I set 'max_length' to 2048 explicitly.
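For example, adding it to --model_args makes the same command run (assuming, as in the upstream harness, that the hf-causal wrapper accepts a max_length argument; the rest of the command is unchanged):

python lm-evaluation-harness/main.py --model_api_name 'hf-causal' --model_args \
    use_accelerate=True,pretrained='/path/to/bigscience/bloom-560m',max_length=2048 \
    --task_name GEM/wiki_lingua_en --template_names summarize_above_en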