Fix eval for .pte
Issue

Inputs aren't set up correctly for .pte files. The input tensors are static and cannot be reshaped, so running eval currently fails with this error:
python3 torchchat.py eval llama3 --pte-path llama3.pte --limit 3
...
Running loglikelihood_rolling requests
0%| | 0/3 [00:00<?, ?it/s][tensor_impl.cpp:93] Attempted to resize a static tensor to a new shape at dimension 1 old_size: 1 new_size: 1263
[method.cpp:829] Error setting input 0: 0x10
0%| | 0/3 [00:00<?, ?it/s]
Time to run eval: 4.57s.
Traceback (most recent call last):
File "/Users/puri/torchchat/torchchat.py", line 92, in <module>
eval_main(args)
File "/Users/puri/torchchat/eval.py", line 252, in main
result = eval(
^^^^^
File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/Users/puri/torchchat/eval.py", line 198, in eval
eval_results = evaluate(
^^^^^^^^^
File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/utils.py", line 288, in _wrapper
return fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^
File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/evaluator.py", line 373, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 840, in loglikelihood_rolling
string_nll = self._loglikelihood_tokens(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/lm_eval/models/huggingface.py", line 1033, in _loglikelihood_tokens
self._model_call(batched_inps, **call_kwargs), dim=-1
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/puri/torchchat/eval.py", line 146, in _model_call
logits = self._model_forward(x, input_pos)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/puri/torchchat/eval.py", line 240, in <lambda>
model_forward = lambda x, input_pos: model(x, input_pos) # noqa
^^^^^^^^^^^^^^^^^^^
File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/puri/torchchat/.venv/lib/python3.11/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/puri/torchchat/build/model_et.py", line 23, in forward
logits = self.model_.forward(forward_inputs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: method->set_inputs() for method 'forward' failed with error 0x12
This issue originates from setting input shapes incorrectly during prefill: eval hands the exported model the entire token sequence in one call, while the .pte expects its fixed single-token input (hence the attempted resize from size 1 to 1263 above).
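Conceptually, the fix is to keep the exported model's static input shape and walk the sequence token by token during prefill. Below is a minimal sketch of that idea, assuming a [1, 1] token input and a [1] position input as in the common KV-cache export; the function and argument names are illustrative, not the actual torchchat code:

import torch

def pte_model_forward(model, x, input_pos=None):
    # Sketch only: run a statically-shaped .pte model over a full sequence
    # by feeding one token at a time instead of reshaping its input.
    logits = []
    for i in range(x.shape[1]):
        # A [1, 1] slice matches the exported input shape, so ExecuTorch
        # never has to resize a static tensor.
        token = x[:, i : i + 1]
        pos = torch.tensor([i], dtype=torch.int64)
        logits.append(model(token, pos))
    # Reassemble the per-token logits into [1, seq_len, vocab_size] so the
    # lm_eval harness sees the same shape as the eager-mode path.
    return torch.cat(logits, dim=1)

This keeps set_inputs() happy at the cost of one forward call per token, which is consistent with the long per-forward times in the test run below.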
Testing

Run eval on the exported llama3.pte:
python3 torchchat.py eval llama3 --pte-path llama3.pte --limit 3
Warning: compilation is not available with device MPS, ignoring option to engage compilation
NumExpr defaulting to 16 threads.
PyTorch version 2.5.0.dev20240716 available.
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Using device=mps
Loading model...
Cannot load specified PTE to mps. Attempting to load model to CPU instead
Time to load model: 0.05 seconds
Loading custom ops library: /Users/puri/torchchat/.venv/lib/python3.11/site-packages/executorch/examples/models/llama2/custom_ops/libcustom_ops_aot_lib.dylib
[program.cpp:134] InternalConsistency verification requested but not available
-----------------------------------------------------------
Using device 'cpu'
[Task: wikitext] metric word_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
[Task: wikitext] metric word_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
[Task: wikitext] metric byte_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
[Task: wikitext] metric byte_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
[Task: wikitext] metric bits_per_byte is defined, but aggregation is not. using default aggregation=bits_per_byte
[Task: wikitext] metric bits_per_byte is defined, but higher_is_better is not. using default higher_is_better=False
Repo card metadata block was not found. Setting CardData to empty.
Repo card metadata block was not found. Setting CardData to empty.
Building contexts for wikitext on rank 0...
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [00:00<00:00, 1368.01it/s]
Running loglikelihood_rolling requests
0%| | 0/3 [00:00<?, ?it/s]torch.Size([1, 1263, 128256])
33%|████████████████████████████████████████████████████████████████ | 1/3 [01:45<03:31, 105.91s/it]torch.Size([1, 2048, 128256])
torch.Size([1, 2048, 128256])
torch.Size([1, 2048, 128256])
67%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████ | 2/3 [10:49<06:03, 363.60s/it]torch.Size([1, 2048, 128256])
torch.Size([1, 2048, 128256])
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 3/3 [16:53<00:00, 337.73s/it]
Time to run eval: 1017.84s.
Time in model.forward: 1012.67s, over 6 model evaluations
forward run time stats - Median: 180.97s Min: 105.85s Max: 181.98s
For model llama3.pte
wikitext:
word_perplexity,none: 14.2482
byte_perplexity,none: 1.6776
bits_per_byte,none: 0.7464
alias: wikitext