Fix eval for .pte

vmpuri opened this issue 6 months ago · 1 comment

Issue

Inputs aren't set up correctly for .pte files: the input tensors of the exported method are static and cannot be reshaped. Currently, running eval results in this error:

python3 torchchat.py eval llama3 --pte-path llama3.pte --limit 3    

...

Running loglikelihood_rolling requests
  0%|                                                                                                                                                                                                         | 0/3 [00:00<?, ?it/s][tensor_impl.cpp:93] Attempted to resize a static tensor to a new shape at dimension 1 old_size: 1 new_size: 1263
[method.cpp:829] Error setting input 0: 0x10
  0%|                                                                                                                                                                                                         | 0/3 [00:00<?, ?it/s]
Time to run eval: 4.57s.
Traceback (most recent call last):
  File "/Users/puri/torchchat/torchchat.py", line 92, in <module>
    eval_main(args)
  File "/Users/puri/torchchat/eval.py", line 252, in main
    result = eval(
             ^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/eval.py", line 198, in eval
    eval_results = evaluate(
                   ^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/utils.py", line 288, in _wrapper
    return fn(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/evaluator.py", line 373, in evaluate
    resps = getattr(lm, reqtype)(cloned_reqs)
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 840, in loglikelihood_rolling
    string_nll = self._loglikelihood_tokens(
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 1033, in _loglikelihood_tokens
    self._model_call(batched_inps, **call_kwargs), dim=-1
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/eval.py", line 146, in _model_call
    logits = self._model_forward(x, input_pos)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/eval.py", line 240, in <lambda>
    model_forward = lambda x, input_pos: model(x, input_pos)  # noqa
                                         ^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/puri/torchchat/build/model_et.py", line 23, in forward
    logits = self.model_.forward(forward_inputs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: method->set_inputs() for method 'forward' failed with error 0x12
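
For context, the constraint is reproducible directly against the exported method: ExecuTorch method inputs are static-shaped by default, so a .pte exported with a [1, 1] token input rejects any longer sequence instead of reshaping it. A minimal sketch of both cases, assuming the (tokens, input_pos) calling convention used by build/model_et.py (the token id and shapes below are illustrative):

```python
import torch
from executorch.extension.pybindings.portable_lib import _load_for_executorch

module = _load_for_executorch("llama3.pte")  # method exported with a static [1, 1] token input

tok = torch.tensor([[128000]], dtype=torch.long)  # [1, 1] -- matches the exported shape
pos = torch.tensor([0], dtype=torch.long)
out = module.forward((tok, pos))                  # OK: shapes match, logits come back

seq = torch.zeros((1, 1263), dtype=torch.long)    # [1, 1263] -- a whole eval sequence
out = module.forward((seq, pos))                  # fails: static tensor at dim 1 can't be
                                                  # resized from old_size 1 to new_size 1263
```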

This issue originates from setting input shapes incorrectly during prefill.
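
One possible direction, sketched under the assumption that the exported method accepts exactly one token per call: have eval's _model_call walk the sequence position by position and stack the per-token logits, so lm_eval still receives the full [1, seq_len, vocab] tensor it expects. The helper below is hypothetical, not the actual torchchat fix:

```python
import torch

def pte_model_forward(model, x: torch.Tensor) -> torch.Tensor:
    """Run a static-shape .pte method over a [1, seq_len] token batch.

    The exported method only accepts a [1, 1] token tensor plus its
    KV-cache position, so we loop over positions and concatenate the
    per-token logits into the [1, seq_len, vocab_size] tensor that
    lm_eval's _loglikelihood_tokens consumes.
    """
    logits = []
    for i in range(x.size(1)):
        tok = x[:, i : i + 1]                      # [1, 1] static input
        pos = torch.tensor([i], dtype=torch.long)  # current cache position
        logits.append(model(tok, pos))             # [1, 1, vocab_size]
    return torch.cat(logits, dim=1)                # [1, seq_len, vocab_size]
```

This trades speed for correctness: every token becomes a separate ExecuTorch call, which would be consistent with the multi-minute per-request forward times in the test run below.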

Testing

Run eval on the exported llama3.pte:

python3 torchchat.py eval llama3 --pte-path llama3.pte --limit 3    
Warning: compilation is not available with device MPS, ignoring option to engage compilation
NumExpr defaulting to 16 threads.
PyTorch version 2.5.0.dev20240716 available.
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Using device=mps
Loading model...
Cannot load specified PTE to mps. Attempting to load model to CPU instead
Time to load model: 0.05 seconds
Loading custom ops library: /Users/puri/torchchat/.venv/lib/python3.11/site-packages/executorch/examples/models/llama2/custom_ops/libcustom_ops_aot_lib.dylib
[program.cpp:134] InternalConsistency verification requested but not available
-----------------------------------------------------------
Using device 'cpu'
[Task: wikitext] metric word_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
[Task: wikitext] metric word_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
[Task: wikitext] metric byte_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
[Task: wikitext] metric byte_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
[Task: wikitext] metric bits_per_byte is defined, but aggregation is not. using default aggregation=bits_per_byte
[Task: wikitext] metric bits_per_byte is defined, but higher_is_better is not. using default higher_is_better=False
Repo card metadata block was not found. Setting CardData to empty.
Repo card metadata block was not found. Setting CardData to empty.
Building contexts for wikitext on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1368.01it/s]
Running loglikelihood_rolling requests
  0%|                                                                                                                                                                                                         | 0/3 [00:00<?, ?it/s]torch.Size([1, 1263, 128256])
 33%|████████████████████████████████████████████████████████████████                                                                                                                                | 1/3 [01:45<03:31, 105.91s/it]torch.Size([1, 2048, 128256])
torch.Size([1, 2048, 128256])
torch.Size([1, 2048, 128256])
 67%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████                                                                | 2/3 [10:49<06:03, 363.60s/it]torch.Size([1, 2048, 128256])
torch.Size([1, 2048, 128256])
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [16:53<00:00, 337.73s/it]
Time to run eval: 1017.84s.
Time in model.forward: 1012.67s, over 6 model evaluations
forward run time stats - Median: 180.97s Min: 105.85s Max: 181.98s
For model llama3.pte
wikitext:
 word_perplexity,none: 14.2482
 byte_perplexity,none: 1.6776
 bits_per_byte,none: 0.7464
 alias: wikitext
