
clang issues on ubuntu 24.04 and Python 3.12

Open tunguz opened this issue 1 year ago • 40 comments

Screenshot from 2024-11-14 08-18-52

After installing exo and clang on an Ubuntu 24.04 machine with a Ryzen CPU I got an error while trying to run a prompt. (See attached image.) Anyone have any idea what might be going on?

tunguz avatar Nov 14 '24 13:11 tunguz

I got this same issue myself. Currently trying my hand at diagnosing the cause. I'm guessing it's something to do with the fact that it's running in a venv with Python there.

image

I'm on a beefy setup, so I know it's not a resource-constraint issue (2× GTX 1080 Ti).

devinatkin avatar Nov 14 '24 19:11 devinatkin

@devinatkin I don't think it's a venv issue. I am running it on bare-metal Ubuntu 24.04, which comes with Python 3.12 as the default system Python.

tunguz avatar Nov 14 '24 19:11 tunguz

@devinatkin I don't think it's a venv issue. I am running it on bare-metal Ubuntu 24.04, which comes with Python 3.12 as the default system Python.

Well, that's good to know. I'm on a fresh install of Ubuntu 24.04.1 LTS and decided to try with just the one pretty good machine before adding the rest of the junk heap.

devinatkin avatar Nov 14 '24 19:11 devinatkin

Error processing prompt: Command '['clang', '-shared', '-march=native', '-O2', '-Wall', '-Werror', '-x', 'c', '-fPIC', '-ffreestanding', '-nostdlib', '-', '-o', '/tmp/tmp8o4ea_pa']' returned non-zero exit status 1.
Traceback (most recent call last):
  File "/home/dmatkin/exo/exo/main.py", line 193, in run_model_cli
    await node.process_prompt(shard, prompt, request_id=request_id)
  File "/home/dmatkin/exo/exo/orchestration/standard_node.py", line 166, in process_prompt
    resp = await self._process_prompt(base_shard, prompt, request_id)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/exo/orchestration/standard_node.py", line 198, in _process_prompt
    result = await self.inference_engine.infer_prompt(request_id, shard, prompt)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/exo/inference/inference_engine.py", line 28, in infer_prompt
    tokens = await self.encode(shard, prompt)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/exo/inference/tinygrad/inference.py", line 76, in encode
    await self.ensure_shard(shard)
  File "/home/dmatkin/exo/exo/inference/tinygrad/inference.py", line 99, in ensure_shard
    model_shard = await loop.run_in_executor(self.executor, build_transformer, model_path, shard, parameters)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/exo/inference/tinygrad/inference.py", line 59, in build_transformer
    load_state_dict(model, weights, strict=False, consume=False)  # consume=True
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/nn/state.py", line 129, in load_state_dict
    else: v.replace(state_dict[k].to(v.device)).realize()
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/tensor.py", line 3500, in _wrapper
    ret = fn(*args, **kwargs)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/tensor.py", line 213, in realize
    run_schedule(*self.schedule_with_vars(*lst), do_update_stats=do_update_stats)
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 222, in run_schedule
    for ei in lower_schedule(schedule):
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 215, in lower_schedule
    raise e
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 209, in lower_schedule
    try: yield lower_schedule_item(si)
               ^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 193, in lower_schedule_item
    runner = get_runner(si.outputs[0].device, si.ast)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 162, in get_runner
    method_cache[ckey] = method_cache[bkey] = ret = CompiledRunner(replace(prg, dname=dname))
                                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/engine/realize.py", line 84, in __init__
    self.lib:bytes = precompiled if precompiled is not None else Device[p.dname].compiler.compile_cached(p.src)
                                                                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/device.py", line 183, in compile_cached
    lib = self.compile(src)
          ^^^^^^^^^^^^^^^^^
  File "/home/dmatkin/exo/.venv/lib/python3.12/site-packages/tinygrad/runtime/ops_clang.py", line 15, in compile
    subprocess.check_output(['clang', '-shared', *self.args, '-O2', '-Wall', '-Werror', '-x', 'c', '-fPIC', '-ffreestanding', '-nostdlib',
  File "/usr/lib/python3.12/subprocess.py", line 466, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3.12/subprocess.py", line 571, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '['clang', '-shared', '-march=native', '-O2', '-Wall', '-Werror', '-x', 'c', '-fPIC', '-ffreestanding', '-nostdlib', '-', '-o', '/tmp/tmp8o4ea_pa']' returned non-zero exit status 1.
Received exit signal SIGTERM...

Trying to launch with a run command produces the same type of error.
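The CalledProcessError above only reports the exit status; clang's actual stderr gets swallowed. A debugging sketch for surfacing it (the flags are copied from the traceback, but the helper name and kernel bodies are made up for illustration — this is not exo's code):

```python
import shutil
import subprocess
import tempfile

def try_clang(src: str) -> bool:
    """Rerun the same clang invocation tinygrad uses, printing stderr on failure."""
    if shutil.which("clang") is None:
        raise RuntimeError("clang not on PATH")
    with tempfile.NamedTemporaryFile(suffix=".so") as out:
        proc = subprocess.run(
            ["clang", "-shared", "-march=native", "-O2", "-Wall", "-Werror",
             "-x", "c", "-fPIC", "-ffreestanding", "-nostdlib",
             "-", "-o", out.name],
            input=src.encode(), capture_output=True)
        if proc.returncode != 0:
            print(proc.stderr.decode())  # the real error exo hides
        return proc.returncode == 0

if __name__ == "__main__":
    # A trivial kernel isolates toolchain breakage from generated-code breakage:
    trivial = "void E_4(float* restrict data0) { data0[0] = 1.0f; }"
    # A kernel using __bf16, similar to what tinygrad generates for bf16 weights:
    bf16 = ("void E_4(float* restrict data0, __bf16* restrict data1) "
            "{ data0[0] = (float)data1[0]; }")
    print("trivial kernel compiles:", try_clang(trivial))
    print("bf16 kernel compiles:   ", try_clang(bf16))
```

If the trivial kernel compiles but the bf16 one does not, the toolchain itself is fine and the problem is __bf16 support, as discussed further down the thread.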

devinatkin avatar Nov 14 '24 19:11 devinatkin

@devinatkin I don't think its the venv issue. I am running it on bare metal Ubuntu 24.04, which comes with Python 3.12 as the default system Python.

Well that's good to know I'm on a fresh install of Ubuntu 24.04.1 LTS and decided to try with just the 1 machine pretty good machine before adding the rest of the junk heap.

Yup, I am using a brand new machine with a brand new Ubuntu install. This was pretty much the first thing I had tried on it.

tunguz avatar Nov 14 '24 19:11 tunguz

+1 I was getting this as well. Thought maybe the clang version wasn't compatible with the current tinygrad implementation, so I tried clang 14 and 16, but couldn't get it to work.

cadenmackenzie avatar Nov 14 '24 21:11 cadenmackenzie

adding -v to clang, you can probably see the actual error:

clang -v -include tgmath.h -shared -march=native -O2 -Wall -Werror -x c -fPIC - -o /tmp/tefsd
Ubuntu clang version 14.0.0-1ubuntu1.1
[TLDNR: middle of the output removed]
...on-dir=/root/exo -ferror-limit 19 -fgnuc-version=4.2.1 -fcolor-diagnostics -vectorize-loops -vectorize-slp -faddrsig -D__GCC_HAVE_DWARF2_CFI_ASM=1 -o /tmp/--368d98.o -x c -
clang -cc1 version 14.0.0 based upon LLVM 14.0.0 default target x86_64-pc-linux-gnu
ignoring nonexistent directory "/usr/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../x86_64-linux-gnu/include"
ignoring nonexistent directory "/include"

Basically some kind of path issue, as it has a repeating component that shouldn't be repeating. It is compiling from a pipe, not a file, so I can't debug further.
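Since the source arrives over a pipe, one way around "can't debug further" is a shim: a script named clang placed earlier on PATH that tees stdin to a file before handing off to the real compiler. A sketch, assuming the real clang lives at /usr/bin/clang (paths are illustrative):

```python
import os
import tempfile

# Wrapper script: save the piped C source for inspection, then forward
# stdin and all arguments to the real clang.
wrapper = """#!/bin/sh
tee /tmp/exo_kernel_$$.c | /usr/bin/clang "$@"
"""

shim_dir = tempfile.mkdtemp(prefix="clang_shim_")
shim_path = os.path.join(shim_dir, "clang")
with open(shim_path, "w") as f:
    f.write(wrapper)
os.chmod(shim_path, 0o755)

# Then run exo with the shim directory first on PATH, e.g.:
#   PATH=<shim_dir>:$PATH exo ...
# and inspect the captured kernels in /tmp/exo_kernel_*.c
print("prepend to PATH:", shim_dir)
```

That way you get the exact generated C source on disk and can rerun clang on it by hand with whatever flags you like.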

cnukaus avatar Nov 15 '24 14:11 cnukaus

Basically some kind of path issue as it has a repeating component, that shouldn't be repeating. it is compiling from pipe, not a file, so I can't debug further

Would it be possible to overcome this by a different kind of clang installation?

tunguz avatar Nov 15 '24 16:11 tunguz

I'm having other clang problems trying to get a Docker image together (Dockerfile, if curious), so this might just swap the issue you're seeing for the one I'm seeing. But as to trying other clang installation methods...

https://apt.llvm.org/ for nightly builds, other versions, etc...

would be curious to hear if you get things working with a different version / install.

jonstelly avatar Nov 16 '24 22:11 jonstelly

@blindcrone you're running this on your linux box right? Could you take a look at what might be the issue here? Thanks!

AlexCheema avatar Nov 18 '24 17:11 AlexCheema

I've only got the tinygrad backend working on Linux machines that have GPUs. I chased this rabbit hole a bit a few weeks ago and found that it's an issue in tinygrad in general: I've yet to find any report of a Linux user being able to use the clang backend in tinygrad for Llama and related models. Tinygrad contains a "fix-bf16"-style function that also doesn't seem to solve the issue.

The actual bug happens in LLVM when it tries to support float16 types, and it's an issue I was able to chase down in that repository. I'll look for it again and post links here, but the tl;dr is that this might be patched in LLVM 19, though no distro I know of currently packages that because of build issues.

blindcrone avatar Nov 19 '24 02:11 blindcrone

@blindcrone may be try on archlinux? https://archlinux.pkgs.org/rolling/mesa-git-x86_64/clang-git-20.0.0_r516907.e102338b6e2f-1-x86_64.pkg.tar.zst.html

lexasub avatar Nov 20 '24 18:11 lexasub

@blindcrone I installed it (clang 20), but it's not fixed.

lexasub avatar Nov 20 '24 19:11 lexasub

Yeah, I'm on Arch and I haven't gotten newer LLVM to install cleanly (probably I have stuff that depends on old versions, or have the wrong compilers to build it), so if it doesn't work then there goes that theory.

I think I'll just write another inference engine that supports CPU. Been digging all around that code anyway

blindcrone avatar Nov 21 '24 01:11 blindcrone

@blindcrone After cleaning the ccache cache, I have another problem: exo didn't use the GPU (cluster mode, 2 Linux machines, RTX 3060), and I don't get an answer in the UI. (Attached: log.log)

lexasub avatar Nov 21 '24 07:11 lexasub

I'm attempting this on aarch64 (Raspberry Pi 5). With stock clang 14, it fails with error: __bf16 is not supported on this target. I upgraded to clang 18, which blows up in a weirder place:

fatal error: error in backend: Cannot select: 0x555591ddaa60: f16 = fp_round 0x555591ddac90, TargetConstant:i64<0>
  0x555591ddac90: bf16,ch = load<(load (s16) from %ir.7 + 4, !tbaa !4)> 0x555591d763e0, 0x555591dd63e0, undef:i64
    0x555591dd63e0: i64 = add nuw 0x555591dd5960, Constant:i64<4>
      0x555591dd5960: i64 = add 0x555591dd59d0, 0x555591dd5a40
        0x555591dd59d0: i64,ch = CopyFromReg 0x555591d763e0, Register:i64 %3
          0x555591dd5ab0: i64 = Register %3
        0x555591dd5a40: i64 = shl nuw nsw 0x555591dd57a0, Constant:i64<3>
          0x555591dd57a0: i64,ch = CopyFromReg 0x555591d763e0, Register:i64 %0
            0x555591dd5810: i64 = Register %0
          0x555591dd5b20: i64 = Constant<3>
      0x555591dd61b0: i64 = Constant<4>
    0x555591dd58f0: i64 = undef
  0x555591dd5c00: i64 = TargetConstant<0>
In function: E_4194304_4
clang-18: error: clang frontend command failed with exit code 70 (use -v to see invocation)
Debian clang version 18.1.8 (++20240731024826+3b5b5c1ec4a3-1~exp1~20240731144843.145)

With clang 19, it builds and executes okay, but I still don't have it working, because I'm getting socket errors immediately afterwards that I haven't debugged yet.

kdkd avatar Nov 22 '24 02:11 kdkd

I believe I am also experiencing this error. I am on Debian 12 (bookworm) with Python 3.12.7 and clang version 14.0.6

Is there a "known good" combination of distro and python/clang versions that work? I have been testing out my own version of a Dockerfile so I can deploy this to multiple systems, but that is also getting the same error.

Coastline-3102 avatar Nov 25 '24 23:11 Coastline-3102

Tested NOT working in an LXC (Proxmox cluster) running Debian 12, Python 3.11.2, and clang 14.0.6 on a 12900H; got the same clang error.

root@exo-1:~# clang -v
Debian clang version 14.0.6
Target: x86_64-pc-linux-gnu
Thread model: posix
InstalledDir: /usr/bin
Found candidate GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
Selected GCC installation: /usr/bin/../lib/gcc/x86_64-linux-gnu/12
Candidate multilib: .;@m64
Selected multilib: .;@m64
root@exo-1:~# python3
Python 3.11.2 (main, Sep 14 2024, 03:00:30) [GCC 12.2.0] on linux
root@exo-1:~# uname -a
Linux exo-1 6.8.12-2-pve #1 SMP PREEMPT_DYNAMIC PMX 6.8.12-2 (2024-09-05T10:03Z) x86_64 GNU/Linux

All default installations, followed the install guide on the README

DrEVILish avatar Nov 28 '24 13:11 DrEVILish

Thanks @DrEVILish

Unfortunately, it looks like it is not working on my system. I've updated my Dockerfile to use clang 14, Python 3.11.2, and Debian 12, but am still getting the "Failed to fetch completions" error. Notably, I am also getting an error that reads:

<stdin>:2:44: error: __bf16 is not supported on this target
void E_1048576_4n2(__fp16* restrict data0, __bf16* restrict data1) {
                                           ^
<stdin>:5:5: error: __bf16 is not supported on this target
    __bf16 val0 = *(data1+(alu0+1));
    ^
<stdin>:6:5: error: __bf16 is not supported on this target
    __bf16 val1 = *(data1+(alu0+2));
    ^
<stdin>:7:5: error: __bf16 is not supported on this target
    __bf16 val2 = *(data1+(alu0+3));
    ^
<stdin>:8:5: error: __bf16 is not supported on this target
    __bf16 val3 = *(data1+alu0);
    ^
<stdin>:9:54: error: cannot type-cast from __bf16
    *((__fp164*)((data0+alu0))) = (__fp164){((__fp16)(val3)),((__fp16)(val0)),((__fp16)(val1)),((__fp16)(val2))};
                                                     ^~~~~~
<stdin>:9:71: error: cannot type-cast from __bf16
    *((__fp164*)((data0+alu0))) = (__fp164){((__fp16)(val3)),((__fp16)(val0)),((__fp16)(val1)),((__fp16)(val2))};
                                                                      ^~~~~~
<stdin>:9:88: error: cannot type-cast from __bf16
    *((__fp164*)((data0+alu0))) = (__fp164){((__fp16)(val3)),((__fp16)(val0)),((__fp16)(val1)),((__fp16)(val2))};
                                                                                       ^~~~~~
<stdin>:9:105: error: cannot type-cast from __bf16
    *((__fp164*)((data0+alu0))) = (__fp164){((__fp16)(val3)),((__fp16)(val0)),((__fp16)(val1)),((__fp16)(val2))};
                                                                                                        ^~~~~~

From my initial research, it looks like this has something to do with my CPU's BFloat16 support, which makes me wonder whether the issue is caused not by the Python/clang version but by BFloat16 support (or the lack thereof). I am seeing some information about using FP16/FP32 instead, or software fallbacks, but I am not sure where (or how) to implement those.
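For what it's worth, a software fallback is cheap in principle: bfloat16 is just the top 16 bits of a float32, so widening it is a shift. A minimal sketch of the idea (illustrative only; this is not exo's or tinygrad's actual code, and the function name is made up):

```python
import numpy as np

def bf16_to_fp32(raw: np.ndarray) -> np.ndarray:
    """Widen bfloat16 bit patterns (held in a uint16 array) to float32.

    bfloat16 shares float32's sign and exponent layout, so shifting the
    16-bit pattern into the high half of a 32-bit word yields the exact
    float32 value.
    """
    return (raw.astype(np.uint32) << 16).view(np.float32)

# 0x3F80 -> 1.0, 0xBF80 -> -1.0, 0x4000 -> 2.0
bits = np.array([0x3F80, 0xBF80, 0x4000], dtype=np.uint16)
print(bf16_to_fp32(bits))  # [ 1. -1.  2.]
```

Converting the weights this way (to FP32, or onward to FP16) before the compute kernels run would sidestep __bf16 in the generated C entirely, at the cost of extra memory.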

The system I have been testing on has an AMD Ryzen 5 5600X (12) @ 3.700GHz CPU and an AMD Radeon RX 5600 OEM/5600 XT / 5700/5700 XT GPU. When I have some time, I can look into testing this on some other systems with different hardware to see if I can replicate the issue.

Coastline-3102 avatar Nov 28 '24 16:11 Coastline-3102

I just did a git pull, and it seems like exo started working in the cluster on two machines after installation.

lexasub avatar Nov 28 '24 21:11 lexasub

@lexasub Were those Linux CPU-only machines? What is your setup?

tunguz avatar Nov 28 '24 21:11 tunguz

@tunguz, no — an RTX 3060 in each of the two machines. @blindcrone, why is only Llama available in cluster mode?

lexasub avatar Nov 28 '24 21:11 lexasub

clang --version:
clang version 19.1.0 (Fedora 19.1.0-1.fc41)
clang version 20.0.0git (Arch)

lexasub avatar Nov 28 '24 21:11 lexasub

@lexasub Yeah, exo seems to work fine on GPU Linux computers. I’d like to see it enabled on the CPU-only machines as well. That’s what this issue is all about. :/

tunguz avatar Nov 28 '24 21:11 tunguz

@tunguz The first time I wrote here, I encountered the same problem as you.

lexasub avatar Nov 28 '24 21:11 lexasub

Yeah, exo seems to work fine on GPU Linux computers

I suspect it might be just NVIDIA-GPU Linux computers that it works fine on (or my GPU is just not working due to some other unrelated issue). The computer I have been testing on has an AMD GPU.

@lexasub if I am reading your comments right, you are running two separate 3060 computers, had this issue, but were able to fix it? What did you do to fix the issue? Was it just a git pull of the latest version?

@DrEVILish, you mentioned having it working on LXC/proxmox cluster. What CPU/GPU does your system have?

Coastline-3102 avatar Nov 28 '24 22:11 Coastline-3102

@Coastline-3102, yes, you're right. But llama3.1 uses fewer GPU resources than peak computation would, so the issue may not be solved. Then again, in htop I didn't see big CPU usage, so the computation may have been on the GPU.

lexasub avatar Nov 28 '24 22:11 lexasub

@DrEVILish, you mentioned having it working on LXC/proxmox cluster. What CPU/GPU does your system have?

Sorry — tested, and it's not working; I've updated my comment above. It was a container without a GPU attached, just a CPU instance.

DrEVILish avatar Nov 29 '24 21:11 DrEVILish

The Dockerfile below reproduces the errors reported here.

https://github.com/AtelierArith/DocstringTranslationExoBackend.jl/blob/main/docker/Dockerfile

Just run the following command

git clone https://github.com/AtelierArith/DocstringTranslationExoBackend.jl.git
cd DocstringTranslationExoBackend.jl
cd docker
docker build -t exojl -f Dockerfile . &&  docker run --rm -it -p 52415:52415 exojl

Then open your browser and navigate to localhost:52415

Say something.

terasakisatoshi avatar Dec 02 '24 08:12 terasakisatoshi

From the command "clang -v -shared -march=native -O2 -Wall -Werror -x c -fPIC -ffreestanding -nostdlib - -o /tmp/tmp7lf4ks67", it looks like it's waiting for input. I don't know what input it's waiting for.

macauleycheng avatar Dec 03 '24 08:12 macauleycheng