tinygrad inference engine fails with BEAM=1 due to not running on main thread

Open AlexCheema opened this issue 1 year ago • 1 comment

This only happens with BEAM=1; BEAM=0, BEAM=2, and BEAM=3 all work fine. It happens because exo runs tinygrad inference on a thread other than the main thread.

Example command to reproduce:

DEBUG=6 BEAM=1 python3 main.py --inference-engine tinygrad --run-model llama-3.1-8b

Error:

Error processing prompt: signal only works in main thread of the main interpreter
Traceback (most recent call last):
  File "/Users/alex/exo/main.py", line 158, in run_model_cli
    await node.process_prompt(shard, prompt, None, request_id=request_id)
  File "/Users/alex/exo/exo/orchestration/standard_node.py", line 98, in process_prompt
    resp = await self._process_prompt(base_shard, prompt, image_str, request_id, inference_state)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/exo/orchestration/standard_node.py", line 134, in _process_prompt
    result, inference_state, is_finished = await self.inference_engine.infer_prompt(request_id, shard, prompt, image_str, inference_state=inference_state)
                                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/exo/inference/tinygrad/inference.py", line 67, in infer_prompt
    h = await asyncio.get_event_loop().run_in_executor(self.executor, lambda: self.model(Tensor([toks]), start_pos, TEMPERATURE).realize())
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/exo/inference/tinygrad/inference.py", line 67, in <lambda>
    h = await asyncio.get_event_loop().run_in_executor(self.executor, lambda: self.model(Tensor([toks]), start_pos, TEMPERATURE).realize())
                                                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/exo/inference/tinygrad/models/llama.py", line 214, in __call__
    return self.forward(tokens, start_pos, temperature, top_k, top_p, alpha_f, alpha_p)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/exo/inference/tinygrad/models/llama.py", line 193, in forward
    mask = Tensor.full((1, 1, seqlen, start_pos + seqlen), float("-100000000"), dtype=x.dtype, device=x.device).triu(start_pos + 1).realize() if seqlen > 1 else None
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/.venv/lib/python3.12/site-packages/tinygrad/tensor.py", line 3414, in _wrapper
    ret = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/.venv/lib/python3.12/site-packages/tinygrad/tensor.py", line 208, in realize
    run_schedule(*self.schedule_with_vars(*lst), do_update_stats=do_update_stats)
  File "/Users/alex/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 221, in run_schedule
    for ei in lower_schedule(schedule):
  File "/Users/alex/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 214, in lower_schedule
    raise e
  File "/Users/alex/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 208, in lower_schedule
    try: yield lower_schedule_item(si)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 192, in lower_schedule_item
    runner = get_runner(si.outputs[0].device, si.ast)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 157, in get_runner
    prg: Program = get_kernel(Device[dname].renderer, ast).to_program()
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 31, in get_kernel
    k = beam_search(kb, rawbufs, BEAM.value, bool(getenv("BEAM_ESTIMATE", 1)))
        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/alex/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/search.py", line 151, in beam_search
    for i,proc in (map(_compile_fn, enumerate(acted_lins)) if beam_pool is None else beam_pool.imap_unordered(_compile_fn, enumerate(acted_lins))):
  File "/Users/alex/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/search.py", line 60, in _try_compile_linearized_w_idx
    signal.signal(signal.SIGALRM, timeout_handler)
  File "/opt/homebrew/Cellar/python@3.12/3.12.4/Frameworks/Python.framework/Versions/3.12/lib/python3.12/signal.py", line 58, in signal
    handler = _signal.signal(_enum_to_int(signalnum), _enum_to_int(handler))
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
ValueError: signal only works in main thread of the main interpreter
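
The last frames of the traceback can be reproduced without exo or tinygrad. The sketch below is a minimal illustration, not code from either project: it submits a function that calls signal.signal to a ThreadPoolExecutor worker, which is the same situation the traceback shows (run_in_executor followed by signal.signal in search.py). The names install_handler and try_in_worker are hypothetical, and the fallback to SIGINT exists only so the snippet also runs on platforms without SIGALRM.

```python
import signal
from concurrent.futures import ThreadPoolExecutor

# SIGALRM is what the traceback shows; fall back to SIGINT where it is missing.
SIG = getattr(signal, "SIGALRM", signal.SIGINT)


def timeout_handler(signum, frame):
    # Stand-in handler; it is never invoked in this example.
    raise TimeoutError()


def install_handler():
    # Python only allows this call from the main thread.
    signal.signal(SIG, timeout_handler)


def try_in_worker():
    """Run install_handler on a worker thread and return the error text, if any."""
    with ThreadPoolExecutor(max_workers=1) as executor:
        future = executor.submit(install_handler)
        try:
            future.result()
            return None
        except ValueError as e:
            return str(e)


worker_error = try_in_worker()
print(worker_error)
```

The ValueError is raised inside the worker and re-raised by future.result(), which is why it surfaces at the await in infer_prompt above.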

AlexCheema, Sep 05 '24 16:09

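More generally, this looks like a race condition that can also happen at other BEAM levels. The cause is the way tinygrad uses signals: as the traceback shows, beam search calls signal.signal(signal.SIGALRM, timeout_handler) in search.py, and Python only permits that call from the main thread.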
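
One possible way to avoid the error is to install the signal-based timeout only when the code is on the main thread. The sketch below illustrates that idea under stated assumptions; it is not tinygrad's or exo's actual fix, and compile_with_optional_timeout and can_use_signals are hypothetical names. Note the trade-off: off the main thread the function runs with no timeout at all.

```python
import signal
import threading
from concurrent.futures import ThreadPoolExecutor


def can_use_signals():
    # signal.signal is only legal on the main thread, and SIGALRM is Unix-only.
    on_main = threading.current_thread() is threading.main_thread()
    return on_main and hasattr(signal, "SIGALRM")


def compile_with_optional_timeout(fn, timeout_s):
    """Call fn(); enforce a SIGALRM timeout only where signals are allowed."""
    if not can_use_signals():
        return fn()  # hypothetical fallback: no timeout enforcement here

    def handler(signum, frame):
        raise TimeoutError()

    old = signal.signal(signal.SIGALRM, handler)
    signal.alarm(timeout_s)  # whole seconds
    try:
        return fn()
    finally:
        signal.alarm(0)  # cancel any pending alarm
        if old is not None:
            signal.signal(signal.SIGALRM, old)  # restore the previous handler


# Called from a worker thread, as exo does for inference: no ValueError.
with ThreadPoolExecutor(max_workers=1) as ex:
    result = ex.submit(compile_with_optional_timeout, lambda: 42, 1).result()
print(result)
```

An alternative design would be to run the inference call itself on the main thread, which keeps the timeout in place but changes how the event loop is structured.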

AlexCheema, Sep 06 '24 00:09