
Unable to run llama-3.2-1b with exo

Open FFAMax opened this issue 3 months ago • 6 comments

It tries to load the model but never completes.

Removing download task for Shard(model_id='llama-3.2-1b', start_layer=0, end_layer=15, n_layers=16): True
  0%|                                                                                                           | 0/148 [00:00<?, ?it/s]
ram used:  0.00 GB, layers.0.attention.wq.weight                      :   1%|▏                          | 1/148 [00:00<00:07, 19.19it/s]
ram used:  0.01 GB, layers.0.attention.wk.weight                      :   1%|▎                          | 2/148 [00:00<00:07, 19.45it/s]
ram used:  0.01 GB, layers.0.attention.wv.weight                      :   2%|▌                          | 3/148 [00:00<00:05, 26.73it/s]
....
ram used:  2.47 GB, output.weight                                     :  99%|████████████████████████▊| 147/148 [00:07<00:00, 18.38it/s]
ram used:  2.47 GB, freqs_cis                                         : 100%|█████████████████████████| 148/148 [00:07<00:00, 18.50it/s]
ram used:  2.47 GB, freqs_cis                                         : 100%|█████████████████████████| 148/148 [00:08<00:00, 18.50it/s]
loaded weights in 8005.22 ms, 2.47 GB loaded at 0.31 GB/s
  0%|                                                                                                           | 0/148 [00:00<?, ?it/s]
ram used:  2.47 GB, layers.0.attention.wq.weight                      :   1%|▏                         | 1/148 [00:00<00:00, 224.25it/s]
ram used:  2.48 GB, layers.0.attention.wk.weight                      :   1%|▎                         | 2/148 [00:00<00:00, 221.67it/s]
ram used:  2.48 GB, layers.0.attention.wo.weight                      :   3%|▋                         | 4/148 [00:00<00:00, 262.63it/s]
...
ram used:  4.42 GB, tok_embeddings.weight                             :  99%|████████████████████████▋| 146/148 [00:07<00:00, 18.27it/s]
ram used:  4.94 GB, freqs_cis                                         : 100%|█████████████████████████| 148/148 [00:07<00:00, 18.51it/s]
ram used:  4.94 GB, freqs_cis                                         : 100%|█████████████████████████| 148/148 [00:07<00:00, 18.50it/s]
loaded weights in 8002.23 ms, 2.47 GB loaded at 0.31 GB/s
  0%|                                                                                                           | 0/148 [00:00<?, ?it/s]
ram used:  4.94 GB, layers.0.attention.wq.weight                      :   1%|▏                         | 1/148 [00:00<00:00, 179.52it/s]
ram used:  4.95 GB, layers.0.attention.wk.weight                      :   1%|▎                          | 2/148 [00:00<00:15,  9.20it/s]
...
ram used:  7.41 GB, freqs_cis                                         : 100%|█████████████████████████| 148/148 [00:08<00:00, 18.31it/s]
ram used:  7.41 GB, freqs_cis                                         : 100%|█████████████████████████| 148/148 [00:08<00:00, 18.31it/s]
loaded weights in 8087.63 ms, 2.47 GB loaded at 0.31 GB/s
  0%|                                                                                                           | 0/148 [00:00<?, ?it/s]
ram used:  7.41 GB, layers.0.attention.wq.weight                      :   1%|▏                         | 1/148 [00:00<00:00, 202.87it/s]
ram used:  7.42 GB, layers.0.attention.wk.weight                      :   1%|▎                         | 2/148 [00:00<00:00, 207.12it/s]
ram used:  7.43 GB, layers.0.attention.wo.weight

Final output:

ram used: 11.83 GB, tok_embeddings.weight                             :  99%|████████████████████████▋| 146/148 [00:08<00:00, 18.15it/s]
ram used: 12.36 GB, output.weight                                     :  99%|████████████████████████▊| 147/148 [00:08<00:00, 18.27it/s]
ram used: 12.36 GB, freqs_cis                                         : 100%|█████████████████████████| 148/148 [00:08<00:00, 18.39it/s]
ram used: 12.36 GB, freqs_cis                                         : 100%|█████████████████████████| 148/148 [00:08<00:00, 18.38it/s]
loaded weights in 8055.68 ms, 2.47 GB loaded at 0.31 GB/s
Task exception was never retrieved
future: <Task finished name='Task-447' coro=<StandardNode.process_prompt() done, defined at
/home/user/exo/exo/orchestration/standard_node.py:144> exception=RuntimeError('Wait timeout: 10000 ms! (the signal is not set to 7830,
but 7828)')>
Traceback (most recent call last):
  File "/home/user/exo/exo/orchestration/standard_node.py", line 166, in process_prompt
    resp = await self._process_prompt(base_shard, prompt, request_id)
  File "/home/user/exo/exo/orchestration/standard_node.py", line 198, in _process_prompt
    result = await self.inference_engine.infer_prompt(request_id, shard, prompt)
  File "/home/user/exo/exo/inference/inference_engine.py", line 29, in infer_prompt
    output_data = await self.infer_tensor(request_id, shard, tokens)
  File "/home/user/exo/exo/inference/tinygrad/inference.py", line 88, in infer_tensor
    return output_data.numpy()
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3500, in _wrapper
    ret = fn(*args, **kwargs)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 310, in numpy
    return np.frombuffer(self._data(), dtype=_to_np_dtype(self.dtype)).reshape(self.shape)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3475, in _wrapper
    if _METADATA.get() is not None: return fn(*args, **kwargs)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 254, in _data
    cpu = self.cast(self.dtype.scalar()).contiguous().to("CLANG").realize()
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3475, in _wrapper
    if _METADATA.get() is not None: return fn(*args, **kwargs)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 213, in realize
    run_schedule(*self.schedule_with_vars(*lst), do_update_stats=do_update_stats)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 224, in run_schedule
    ei.run(var_vals, do_update_stats=do_update_stats)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 174, in run
    et = self.prg(bufs, var_vals if var_vals is not None else {}, wait=wait or DEBUG >= 2)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 140, in __call__
    self.copy(dest, src)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 135, in copy
    dest.copyin(src.as_buffer(allow_zero_copy=True))  # may allocate a CPU buffer depending on allow_zero_copy
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 114, in as_buffer
    return self.copyout(memoryview(bytearray(self.nbytes)))
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 125, in copyout
    self.allocator.copyout(mv, self._buf)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 664, in copyout
    self.device.timeline_signal.wait(self.device.timeline_value)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 424, in wait
    raise RuntimeError(f"Wait timeout: {timeout} ms! (the signal is not set to {value}, but {self.value})")
RuntimeError: Wait timeout: 10000 ms! (the signal is not set to 7830, but 7828)
Task exception was never retrieved
future: <Task finished name='Task-30321' coro=<StandardNode.process_prompt() done, defined at
/home/user/exo/exo/orchestration/standard_node.py:144> exception=RuntimeError('Wait timeout: 10000 ms! (the signal is not set to 8753,
but 7828)')>
Traceback (most recent call last):
  File "/home/user/exo/exo/orchestration/standard_node.py", line 166, in process_prompt
    resp = await self._process_prompt(base_shard, prompt, request_id)
  File "/home/user/exo/exo/orchestration/standard_node.py", line 198, in _process_prompt
    result = await self.inference_engine.infer_prompt(request_id, shard, prompt)
  File "/home/user/exo/exo/inference/inference_engine.py", line 29, in infer_prompt
    output_data = await self.infer_tensor(request_id, shard, tokens)
  File "/home/user/exo/exo/inference/tinygrad/inference.py", line 88, in infer_tensor
    return output_data.numpy()
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3500, in _wrapper
    ret = fn(*args, **kwargs)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 310, in numpy
    return np.frombuffer(self._data(), dtype=_to_np_dtype(self.dtype)).reshape(self.shape)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3475, in _wrapper
    if _METADATA.get() is not None: return fn(*args, **kwargs)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 254, in _data
    cpu = self.cast(self.dtype.scalar()).contiguous().to("CLANG").realize()
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 3475, in _wrapper
    if _METADATA.get() is not None: return fn(*args, **kwargs)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/tensor.py", line 213, in realize
    run_schedule(*self.schedule_with_vars(*lst), do_update_stats=do_update_stats)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 224, in run_schedule
    ei.run(var_vals, do_update_stats=do_update_stats)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 174, in run
    et = self.prg(bufs, var_vals if var_vals is not None else {}, wait=wait or DEBUG >= 2)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 140, in __call__
    self.copy(dest, src)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/engine/realize.py", line 135, in copy
    dest.copyin(src.as_buffer(allow_zero_copy=True))  # may allocate a CPU buffer depending on allow_zero_copy
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 114, in as_buffer
    return self.copyout(memoryview(bytearray(self.nbytes)))
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 125, in copyout
    self.allocator.copyout(mv, self._buf)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 657, in copyout
    self.device.synchronize()
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 519, in synchronize
    self.timeline_signal.wait(self.timeline_value - 1)
  File "/home/user/exo/lib/python3.10/site-packages/tinygrad/device.py", line 424, in wait
    raise RuntimeError(f"Wait timeout: {timeout} ms! (the signal is not set to {value}, but {self.value})")
RuntimeError: Wait timeout: 10000 ms! (the signal is not set to 8753, but 7828)
Task exception was never retrieved
future: <Task finished name='Task-61811' coro=<StandardNode.process_prompt() done, defined at
/home/user/exo/exo/orchestration/standard_node.py:144> exception=RuntimeError('Wait timeout: 10000 ms! (the signal is not set to 8753,
but 7828)')>
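
For context on the repeated "Task exception was never retrieved" message: asyncio prints it when a task ends with an exception that nothing ever awaits or reads. The sketch below is not exo's code. `failing_job` and `record_exception` are hypothetical names, and the error text is only borrowed from the traceback above. It shows one common way to surface such failures with a done-callback.

```python
import asyncio
from typing import List

errors: List[str] = []


async def failing_job() -> None:
    # Hypothetical stand-in for a coroutine that fails, like the timeout above.
    raise RuntimeError("Wait timeout: 10000 ms!")


def record_exception(task: "asyncio.Task[None]") -> None:
    # Reading task.exception() marks the exception as retrieved, so asyncio
    # does not print "Task exception was never retrieved" for this task.
    if task.cancelled():
        return
    exc = task.exception()
    if exc is not None:
        errors.append(f"{task.get_name()}: {exc!r}")


async def main() -> None:
    task = asyncio.create_task(failing_job(), name="demo-task")
    task.add_done_callback(record_exception)
    # asyncio.wait waits for completion without re-raising the task's exception.
    await asyncio.wait([task])


asyncio.run(main())
print(errors)
```

A callback like this only makes the failure visible where it happens. It does not address the underlying device timeout.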

FFAMax · Nov 14 '24 13:11
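
A rough reading of the log: each pass reports "2.47 GB loaded", while "ram used" climbs from 2.47 GB to 12.36 GB, which is consistent with the weights being loaded about five times. The sketch below does that arithmetic on two lines copied from the log. It assumes "ram used" is cumulative and that every pass adds about the same amount. `estimate_load_passes` is a hypothetical helper, not part of exo or tinygrad.

```python
import re

# Two lines copied from the log above (progress-bar portion omitted).
LOG_TAIL = """\
ram used: 12.36 GB, freqs_cis
loaded weights in 8055.68 ms, 2.47 GB loaded at 0.31 GB/s
"""


def estimate_load_passes(log_text: str) -> int:
    """Estimate how many times the weights were loaded (assumes equal-size passes)."""
    match = re.search(r"loaded weights in [\d.]+ ms, ([\d.]+) GB loaded", log_text)
    if match is None:
        raise ValueError("no 'loaded weights' line found")
    per_load_gb = float(match.group(1))
    ram_values = [float(v) for v in re.findall(r"ram used:\s*([\d.]+) GB", log_text)]
    if not ram_values:
        raise ValueError("no 'ram used' line found")
    # Peak cumulative RAM divided by the size of one pass, rounded to a whole count.
    return round(max(ram_values) / per_load_gb)


passes = estimate_load_passes(LOG_TAIL)
print(passes)
```

This is only an estimate from the numbers shown. The elided parts of the log could change the count.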