exo
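exo is unable to run llama-3.2-1b.
It tries to load the model and never completes. The weights are loaded repeatedly, RAM usage keeps growing, and the prompt eventually fails with a tinygrad wait timeout: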
Removing download task for Shard(model_id='llama-3.2-1b', start_layer=0, end_layer=15, n_layers=16): True
0%| | 0/148 [00:00<?, ?it/s]
ram used: 0.00 GB, layers.0.attention.wq.weight : 1%|▏ | 1/148 [00:00<00:07, 19.19it/s]
ram used: 0.01 GB, layers.0.attention.wk.weight : 1%|▎ | 2/148 [00:00<00:07, 19.45it/s]
ram used: 0.01 GB, layers.0.attention.wv.weight : 2%|▌ | 3/148 [00:00<00:05, 26.73it/s]
....
ram used: 2.47 GB, output.weight : 99%|████████████████████████▊| 147/148 [00:07<00:00, 18.38it/s]
ram used: 2.47 GB, freqs_cis : 100%|█████████████████████████| 148/148 [00:07<00:00, 18.50it/s]
ram used: 2.47 GB, freqs_cis : 100%|█████████████████████████| 148/148 [00:08<00:00, 18.50it/s]
loaded weights in 8005.22 ms, 2.47 GB loaded at 0.31 GB/s
0%| | 0/148 [00:00<?, ?it/s]
ram used: 2.47 GB, layers.0.attention.wq.weight : 1%|▏ | 1/148 [00:00<00:00, 224.25it/s]
ram used: 2.48 GB, layers.0.attention.wk.weight : 1%|▎ | 2/148 [00:00<00:00, 221.67it/s]
ram used: 2.48 GB, layers.0.attention.wo.weight : 3%|▋ | 4/148 [00:00<00:00, 262.63it/s]
...
ram used: 4.42 GB, tok_embeddings.weight : 99%|████████████████████████▋| 146/148 [00:07<00:00, 18.27it/s]
ram used: 4.94 GB, freqs_cis : 100%|█████████████████████████| 148/148 [00:07<00:00, 18.51it/s]
ram used: 4.94 GB, freqs_cis : 100%|█████████████████████████| 148/148 [00:07<00:00, 18.50it/s]
loaded weights in 8002.23 ms, 2.47 GB loaded at 0.31 GB/s
0%| | 0/148 [00:00<?, ?it/s]
ram used: 4.94 GB, layers.0.attention.wq.weight : 1%|▏ | 1/148 [00:00<00:00, 179.52it/s]
ram used: 4.95 GB, layers.0.attention.wk.weight : 1%|▎ | 2/148 [00:00<00:15, 9.20it/s]
...
ram used: 7.41 GB, freqs_cis : 100%|█████████████████████████| 148/148 [00:08<00:00, 18.31it/s]
ram used: 7.41 GB, freqs_cis : 100%|█████████████████████████| 148/148 [00:08<00:00, 18.31it/s]
loaded weights in 8087.63 ms, 2.47 GB loaded at 0.31 GB/s
0%| | 0/148 [00:00<?, ?it/s]
ram used: 7.41 GB, layers.0.attention.wq.weight : 1%|▏ | 1/148 [00:00<00:00, 202.87it/s]
ram used: 7.42 GB, layers.0.attention.wk.weight : 1%|▎ | 2/148 [00:00<00:00, 207.12it/s]
ram used: 7.43 GB, layers.0.attention.wo.weight
Final
ram used: 11.83 GB, tok_embeddings.weight : 99%|████████████████████████▋| 146/148 [00:08<00:00, 18.15it/s]
ram used: 12.36 GB, output.weight : 99%|████████████████████████▊| 147/148 [00:08<00:00, 18.27it/s]
ram used: 12.36 GB, freqs_cis : 100%|█████████████████████████| 148/148 [00:08<00:00, 18.39it/s]
ram used: 12.36 GB, freqs_cis : 100%|█████████████████████████| 148/148 [00:08<00:00, 18.38it/s]
loaded weights in 8055.68 ms, 2.47 GB loaded at 0.31 GB/s
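To make the memory growth easier to see, here is a small sketch that pulls the last "ram used" value before each "loaded weights" line. It is only a log-parsing helper, not part of exo; the function name `ram_after_each_load` and the shortened `sample` lines are my own, and the sample keeps only the values shown in the log above (the elided passes are not filled in).

```python
import re

# Matches the "ram used: X GB" prefix printed by the weight loader's progress bar.
RAM_RE = re.compile(r"ram used:\s*([\d.]+) GB")

def ram_after_each_load(log_text):
    """Return the last 'ram used' value seen before each 'loaded weights' line."""
    peaks, last = [], None
    for line in log_text.splitlines():
        m = RAM_RE.search(line)
        if m:
            last = float(m.group(1))
        elif line.startswith("loaded weights") and last is not None:
            peaks.append(last)
    return peaks

# Shortened copies of lines from the log above (progress bars trimmed).
sample = """\
ram used: 2.47 GB, freqs_cis : 100%
loaded weights in 8005.22 ms, 2.47 GB loaded at 0.31 GB/s
ram used: 4.94 GB, freqs_cis : 100%
loaded weights in 8002.23 ms, 2.47 GB loaded at 0.31 GB/s
ram used: 7.41 GB, freqs_cis : 100%
loaded weights in 8087.63 ms, 2.47 GB loaded at 0.31 GB/s
ram used: 12.36 GB, freqs_cis : 100%
loaded weights in 8055.68 ms, 2.47 GB loaded at 0.31 GB/s
"""

peaks = ram_after_each_load(sample)
print(peaks)
# Rough estimate of how many full copies of the weights are resident,
# assuming every load adds the same 2.47 GB.
print(round(peaks[-1] / 2.47))
```

Each pass reports 2.47 GB loaded and the total never drops, so it looks like every load adds another full copy of the weights rather than reusing the first one. That is my reading of the log, not something I have confirmed in the code.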
Task exception was never retrieved
future: <Task finished name='Task-447' coro=<StandardNode.process_prompt() done, defined at
/home/user/exo/exo/orchestration/standard_node.py:144> exception=RuntimeError('Wait timeout: 10000 ms! (the signal is not set to 7830,
but 7828)')>
Traceback (most recent call last):
File "/home/user/exo/exo/orchestration/standard_node.py", line 166, in process_prompt
resp = await self._process_prompt(base_shard, prompt, request_id)
File "/home/user/exo/exo/orchestration/standard_node.py", line 198, in _process_prompt
result = await self.inference_engine.infer_prompt(request_id, shard, prompt)
File "/home/user/exo/exo/inference/inference_engine.py", line 29, in infer_prompt
output_data = await self.infer_tensor(request_id, shard, tokens)
File "/home/user/exo/exo/inference/tinygrad/inference.py", line 88, in infer_tensor
return output_data.numpy()
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3500, in _wrapper
ret = fn(*args, **kwargs)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 310, in numpy
return np.frombuffer(self._data(), dtype=_to_np_dtype(self.dtype)).reshape(self.shape)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3475, in _wrapper
if _METADATA.get() is not None: return fn(*args, **kwargs)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 254, in _data
cpu = self.cast(self.dtype.scalar()).contiguous().to("CLANG").realize()
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3475, in _wrapper
if _METADATA.get() is not None: return fn(*args, **kwargs)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 213, in realize
run_schedule(*self.schedule_with_vars(*lst), do_update_stats=do_update_stats)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 224, in run_schedule
ei.run(var_vals, do_update_stats=do_update_stats)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 174, in run
et = self.prg(bufs, var_vals if var_vals is not None else {}, wait=wait or DEBUG >= 2)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 140, in __call__
self.copy(dest, src)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 135, in copy
dest.copyin(src.as_buffer(allow_zero_copy=True)) # may allocate a CPU buffer depending on allow_zero_copy
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 114, in as_buffer
return self.copyout(memoryview(bytearray(self.nbytes)))
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 125, in copyout
self.allocator.copyout(mv, self._buf)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 664, in copyout
self.device.timeline_signal.wait(self.device.timeline_value)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 424, in wait
raise RuntimeError(f"Wait timeout: {timeout} ms! (the signal is not set to {value}, but {self.value})")
RuntimeError: Wait timeout: 10000 ms! (the signal is not set to 7830, but 7828)
Task exception was never retrieved
future: <Task finished name='Task-30321' coro=<StandardNode.process_prompt() done, defined at
/home/user/exo/exo/orchestration/standard_node.py:144> exception=RuntimeError('Wait timeout: 10000 ms! (the signal is not set to 8753,
but 7828)')>
Traceback (most recent call last):
File "/home/user/exo/exo/orchestration/standard_node.py", line 166, in process_prompt
resp = await self._process_prompt(base_shard, prompt, request_id)
File "/home/user/exo/exo/orchestration/standard_node.py", line 198, in _process_prompt
result = await self.inference_engine.infer_prompt(request_id, shard, prompt)
File "/home/user/exo/exo/inference/inference_engine.py", line 29, in infer_prompt
output_data = await self.infer_tensor(request_id, shard, tokens)
File "/home/user/exo/exo/inference/tinygrad/inference.py", line 88, in infer_tensor
return output_data.numpy()
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3500, in _wrapper
ret = fn(*args, **kwargs)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 310, in numpy
return np.frombuffer(self._data(), dtype=_to_np_dtype(self.dtype)).reshape(self.shape)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3475, in _wrapper
if _METADATA.get() is not None: return fn(*args, **kwargs)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 254, in _data
cpu = self.cast(self.dtype.scalar()).contiguous().to("CLANG").realize()
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3475, in _wrapper
if _METADATA.get() is not None: return fn(*args, **kwargs)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 213, in realize
run_schedule(*self.schedule_with_vars(*lst), do_update_stats=do_update_stats)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 224, in run_schedule
ei.run(var_vals, do_update_stats=do_update_stats)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 174, in run
et = self.prg(bufs, var_vals if var_vals is not None else {}, wait=wait or DEBUG >= 2)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 140, in __call__
self.copy(dest, src)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 135, in copy
dest.copyin(src.as_buffer(allow_zero_copy=True)) # may allocate a CPU buffer depending on allow_zero_copy
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 114, in as_buffer
return self.copyout(memoryview(bytearray(self.nbytes)))
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 125, in copyout
self.allocator.copyout(mv, self._buf)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 657, in copyout
self.device.synchronize()
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 519, in synchronize
self.timeline_signal.wait(self.timeline_value - 1)
File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 424, in wait
raise RuntimeError(f"Wait timeout: {timeout} ms! (the signal is not set to {value}, but {self.value})")
RuntimeError: Wait timeout: 10000 ms! (the signal is not set to 8753, but 7828)
Task exception was never retrieved
future: <Task finished name='Task-61811' coro=<StandardNode.process_prompt() done, defined at
/home/user/exo/exo/orchestration/standard_node.py:144> exception=RuntimeError('Wait timeout: 10000 ms! (the signal is not set to 8753,
but 7828)')>
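As far as I understand, "Task exception was never retrieved" is asyncio's generic message for a task that failed while nothing awaited it or read its exception, so the real error here is the tinygrad `RuntimeError` inside it. Below is a minimal sketch of how such a task error can be surfaced right away. This is not exo's code: `failing_prompt` and `log_task_error` are hypothetical stand-ins, with `failing_prompt` playing the role of `StandardNode.process_prompt()`.

```python
import asyncio

async def failing_prompt():
    # Hypothetical stand-in for the call that fails in the traceback above.
    raise RuntimeError("Wait timeout: 10000 ms!")

def log_task_error(task: asyncio.Task) -> None:
    # Calling task.exception() marks the exception as retrieved, so asyncio
    # does not report "Task exception was never retrieved" later.
    if not task.cancelled() and task.exception() is not None:
        print(f"prompt task failed: {task.exception()!r}")

async def main() -> asyncio.Task:
    task = asyncio.create_task(failing_prompt())
    task.add_done_callback(log_task_error)
    # asyncio.wait() waits for completion without re-raising the task's exception.
    await asyncio.wait([task])
    return task

task = asyncio.run(main())
```

With a callback like this the failure would be logged once when the task finishes, instead of showing up only when the task object is garbage collected.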