torchchat
Eval script fails on CPU on model generated by ExecuTorch
🐛 Describe the bug
I am using ExecuTorch (ET) and generating the quantized version of the model as shown in the README:
python torchchat.py export llama3.1 --quantize config/data/mobile.json --output-pte-path llama3.1.pte
Then, when I tried to evaluate the model using the Python runtime on desktop, it failed:
python torchchat.py eval llama3.1 --pte-path llama3.1.pte --limit 5
NumExpr defaulting to 16 threads.
PyTorch version 2.5.0.dev20240716+cpu available.
Warning: checkpoint path ignored because an exported DSO or PTE path specified
Using device=cpu
Loading model...
Time to load model: 0.05 seconds
Loading custom ops library: /home/ubuntu/anaconda3/envs/torchchat/lib/python3.10/site-packages/executorch/examples/models/llama2/custom_ops/libcustom_ops_aot_lib.so
I 00:00:00.004209 executorch:program.cpp:133] InternalConsistency verification requested but not available
-----------------------------------------------------------
Using device 'cpu'
/home/ubuntu/anaconda3/envs/torchchat/lib/python3.10/site-packages/huggingface_hub/file_download.py:1150: FutureWarning: `resume_download` is deprecated and will be removed in version 1.0.0. Downloads always resume when possible. If you want to force a new download, use `force_download=True`.
warnings.warn(
[Task: wikitext] metric word_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
[Task: wikitext] metric word_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
[Task: wikitext] metric byte_perplexity is defined, but aggregation is not. using default aggregation=weighted_perplexity
[Task: wikitext] metric byte_perplexity is defined, but higher_is_better is not. using default higher_is_better=False
[Task: wikitext] metric bits_per_byte is defined, but aggregation is not. using default aggregation=bits_per_byte
[Task: wikitext] metric bits_per_byte is defined, but higher_is_better is not. using default higher_is_better=False
Downloading builder script: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 10.7k/10.7k [00:00<00:00, 46.8MB/s]
Downloading readme: 100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 7.78k/7.78k [00:00<00:00, 39.3MB/s]
Repo card metadata block was not found. Setting CardData to empty.
Repo card metadata block was not found. Setting CardData to empty.
Downloading data: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4.72M/4.72M [00:00<00:00, 64.4MB/s]
Generating test split: 62 examples [00:00, 1903.53 examples/s]
Generating train split: 629 examples [00:00, 5131.04 examples/s]
Generating validation split: 60 examples [00:00, 7172.82 examples/s]
Building contexts for wikitext on rank 0...
100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 5/5 [00:00<00:00, 796.73it/s]
Running loglikelihood_rolling requests
0%| | 0/5 [00:00<?, ?it/s]E 00:00:31.679303 executorch:tensor_impl.cpp:86] Attempted to resize a static tensor to a new shape at dimension 1 old_size: 1 new_size: 1263
E 00:00:31.679320 executorch:method.cpp:824] Error setting input 0: 0x10
0%| | 0/5 [00:00<?, ?it/s]
Time to run eval: 6.75s.
Traceback (most recent call last):
File "/home/ubuntu/torchchat/torchchat.py", line 92, in <module>
eval_main(args)
File "/home/ubuntu/torchchat/eval.py", line 252, in main
result = eval(
File "/home/ubuntu/anaconda3/envs/torchchat/lib/python3.10/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
return func(*args, **kwargs)
File "/home/ubuntu/torchchat/eval.py", line 198, in eval
eval_results = evaluate(
File "/home/ubuntu/anaconda3/envs/torchchat/lib/python3.10/site-packages/lm_eval/utils.py", line 288, in _wrapper
return fn(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/torchchat/lib/python3.10/site-packages/lm_eval/evaluator.py", line 373, in evaluate
resps = getattr(lm, reqtype)(cloned_reqs)
File "/home/ubuntu/anaconda3/envs/torchchat/lib/python3.10/site-packages/lm_eval/models/huggingface.py", line 840, in loglikelihood_rolling
string_nll = self._loglikelihood_tokens(
File "/home/ubuntu/anaconda3/envs/torchchat/lib/python3.10/site-packages/lm_eval/models/huggingface.py", line 1033, in _loglikelihood_tokens
self._model_call(batched_inps, **call_kwargs), dim=-1
File "/home/ubuntu/torchchat/eval.py", line 146, in _model_call
logits = self._model_forward(x, input_pos)
File "/home/ubuntu/torchchat/eval.py", line 240, in <lambda>
model_forward = lambda x, input_pos: model(x, input_pos) # noqa
File "/home/ubuntu/anaconda3/envs/torchchat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1716, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
File "/home/ubuntu/anaconda3/envs/torchchat/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1727, in _call_impl
return forward_call(*args, **kwargs)
File "/home/ubuntu/torchchat/build/model_et.py", line 23, in forward
logits = self.model_.forward(forward_inputs)
RuntimeError: method->set_inputs() for method 'forward' failed with error 0x12
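The resize error ("old_size: 1 new_size: 1263") suggests the exported PTE has a static (1, 1) token input (single-token decode), while the eval harness passes the whole context window in one call. A possible workaround sketch, not a confirmed fix: wrap the model call so tokens are fed one at a time and the per-step logits are concatenated. The function name `model_call_tokenwise` and the `model_forward(tok, input_pos)` signature are assumptions modeled on `eval.py`'s `_model_forward` lambda, not actual torchchat APIs.

```python
import torch

def model_call_tokenwise(model_forward, x: torch.Tensor) -> torch.Tensor:
    """Feed a (1, seq_len) batch to a model exported with a static
    (1, 1) token input by looping one token at a time.

    Assumes `model_forward(tok, input_pos)` returns logits of shape
    (1, 1, vocab_size), as the single-token decode path does.
    """
    seq_len = x.size(1)
    step_logits = []
    for pos in range(seq_len):
        tok = x[:, pos : pos + 1]                      # static (1, 1) slice
        input_pos = torch.tensor([pos], dtype=torch.int)
        step_logits.append(model_forward(tok, input_pos))
    # Stack per-step outputs back into (1, seq_len, vocab_size)
    return torch.cat(step_logits, dim=1)
```

This would be slower than a single batched call, but it matches the input shape the PTE was exported with, so `set_inputs()` should no longer attempt to resize a static tensor.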
Versions
Collecting environment information...
PyTorch version: 2.5.0.dev20240716+cpu
Is debug build: False
CUDA used to build PyTorch: None
ROCM used to build PyTorch: N/A
OS: Ubuntu 22.04.4 LTS (x86_64)
GCC version: (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Clang version: Could not collect
CMake version: version 3.30.2
Libc version: glibc-2.35
Python version: 3.10.14 (main, May 6 2024, 19:42:50) [GCC 11.2.0] (64-bit runtime)
Python platform: Linux-6.5.0-1014-aws-x86_64-with-glibc2.35
Is CUDA available: False
CUDA runtime version: No CUDA
CUDA_MODULE_LOADING set to: N/A
GPU models and configuration: No CUDA
Nvidia driver version: No CUDA
cuDNN version: No CUDA
HIP runtime version: N/A
MIOpen runtime version: N/A
Is XNNPACK available: True
CPU:
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 16
On-line CPU(s) list: 0-15
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Platinum 8375C CPU @ 2.90GHz
CPU family: 6
Model: 106
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 1
Stepping: 6
BogoMIPS: 5799.93
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand hypervisor lahf_lm abm 3dnowprefetch invpcid_single ssbd ibrs ibpb stibp ibrs_enhanced fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves wbnoinvd ida arat avx512vbmi pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid md_clear flush_l1d arch_capabilities
Hypervisor vendor: KVM
Virtualization type: full
L1d cache: 384 KiB (8 instances)
L1i cache: 256 KiB (8 instances)
L2 cache: 10 MiB (8 instances)
L3 cache: 54 MiB (1 instance)
NUMA node(s): 1
NUMA node0 CPU(s): 0-15
Vulnerability Gather data sampling: Unknown: Dependent on hypervisor status
Vulnerability Itlb multihit: Not affected
Vulnerability L1tf: Not affected
Vulnerability Mds: Not affected
Vulnerability Meltdown: Not affected
Vulnerability Mmio stale data: Mitigation; Clear CPU buffers; SMT Host state unknown
Vulnerability Retbleed: Not affected
Vulnerability Spec rstack overflow: Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Vulnerability Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2: Mitigation; Enhanced / Automatic IBRS, IBPB conditional, RSB filling, PBRSB-eIBRS SW sequence
Vulnerability Srbds: Not affected
Vulnerability Tsx async abort: Not affected
Versions of relevant libraries:
[pip3] executorch==0.4.0a0+c757499
[pip3] numpy==1.26.4
[pip3] torch==2.5.0.dev20240716+cpu
[pip3] torchao==0.3.1
[pip3] torchaudio==2.4.0.dev20240716+cpu
[pip3] torchsr==1.0.4
[pip3] torchvision==0.20.0.dev20240716+cpu
[conda] executorch 0.4.0a0+c757499 pypi_0 pypi
[conda] numpy 1.26.4 pypi_0 pypi
[conda] torch 2.5.0.dev20240716+cpu pypi_0 pypi
[conda] torchao 0.3.1 pypi_0 pypi
[conda] torchaudio 2.4.0.dev20240716+cpu pypi_0 pypi
[conda] torchsr 1.0.4 pypi_0 pypi
[conda] torchvision 0.20.0.dev20240716+cpu pypi_0 pypi