Emerald Rapids host - AMX instruction not recognized
Not sure if AMX is being used. I expected the line below to have some indication of AMX or other relevant CPU instructions:

```
system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=72
```
A snippet of /proc/cpuinfo (note the amx_bf16, amx_tile, and amx_int8 flags):

```
model name : INTEL(R) XEON(R) GOLD 6554S
perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 tdx_host_platform cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect user_shstk avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req hfi vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd sgx_lc fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb ept_5level flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec tsc_scaling usr_wait_pause notify_vm_exiting ipi_virt
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb tdx_pw_mce bhi
```
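A quick way to confirm the kernel exposes the AMX feature bits is to grep the flags line; here is a minimal sketch where a short sample string stands in for the real flags line (on the actual host you would grep /proc/cpuinfo itself):

```shell
# Sample stand-in for the flags line from /proc/cpuinfo above.
flags="fpu vmx amx_bf16 amx_tile amx_int8 avx512f avx512_bf16"

# Split the flags onto separate lines and keep only the AMX ones.
printf '%s\n' $flags | grep '^amx' | sort
# On the real machine: grep -o 'amx[a-z0-9_]*' /proc/cpuinfo | sort -u
```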
If I run llama.cpp directly, the following line is printed:

```
system_info: n_threads = 72 (n_threads_batch = 72) / 144 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
```
The log above is from ollama version 0.5.7, which only compiles runners with the AVX/AVX2 extensions.
Shouldn't the log at least show the AVX/AVX2 instructions in the system info line?
How can an AMX-enabled runner be compiled?
0.5.8+ starts a basic CPU runner that then dynamically loads libraries for extended CPU architectures or GPUs. If you set OLLAMA_DEBUG=1 in the server environment and post the resulting logs, they will show which library was dynamically loaded.
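In practice that means starting the server with `OLLAMA_DEBUG=1 ./ollama serve` and scanning the output for the ggml loader messages. A minimal filter sketch, using a debug log line of the kind quoted later in this thread as sample input:

```shell
# Sample log line (copied from a debug log in this thread); in practice
# you would pipe the live server output into the same grep.
log='time=2025-02-14T21:48:31.757-05:00 level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/mnt/data/ollama_build/ollama'

# Keep only the backend-loading lines.
printf '%s\n' "$log" | grep 'ggml backend load'
```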
Here's another try, this time with 0.5.11. I don't see any change.
```
(base) jays@m50fcp-1:~/data/ollama_build/ollama$ git log
commit f8453e9d4a15f5f54b610993e8647d252cb65626 (grafted, HEAD, tag: v0.5.11)
Author: Jeffrey Morgan <[email protected]>
Date:   Thu Feb 13 22:37:59 2025 -0800

    llm: attempt to evaluate symlinks, but do not fail (#9089)

    provides a better approach to #9088 that will attempt to
    evaluate symlinks (important for macOS where 'ollama' is
    often a symlink), but use the result of os.Executable()
    as a fallback in scenarios where filepath.EvalSymlinks
    fails due to permission erorrs or other issues
(base) jays@m50fcp-1:~/data/ollama_build/ollama$
```
```
time=2025-02-14T21:48:31.757-05:00 level=INFO source=runner.go:936 msg="starting go runner"
time=2025-02-14T21:48:31.757-05:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=72
time=2025-02-14T21:48:31.757-05:00 level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/mnt/data/ollama_build/ollama
```
It didn't find any dynamic libraries to load. It looks like you are running the command from /home/jays/data/ollama_build/ollama yet the build path was /mnt/data/ollama_build/ollama. Do you have fancy symlinks/mount points that might be confusing ollama about where to find the libraries?
/home/jays/data is a symlink to /mnt/data/
```
(base) jays@m50fcp-1:~/data/ollama_build/ollama$ md5sum /mnt/data/ollama_build/ollama/ollama
6dd3ce2d15ed0b70c68c99b066f12299  /mnt/data/ollama_build/ollama/ollama
(base) jays@m50fcp-1:~/data/ollama_build/ollama$ md5sum /home/jays/data/ollama_build/ollama/ollama
6dd3ce2d15ed0b70c68c99b066f12299  /home/jays/data/ollama_build/ollama/ollama
(base) jays@m50fcp-1:~/data/ollama_build/ollama$
```
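The matching md5sums show both spellings point at the same binary. For completeness, `readlink -f` resolves a symlinked path much as Go's filepath.EvalSymlinks does, which makes it easy to see when two spellings of a path name the same directory; a throwaway sketch (all names are made up, not the poster's real paths):

```shell
# Create a real directory and a symlink to it, mirroring the
# /home/jays/data -> /mnt/data situation with throwaway names.
real=$(readlink -f "$(mktemp -d)")   # stand-in for /mnt/data
link=/tmp/linkdemo.$$                # stand-in for /home/jays/data
ln -s "$real" "$link"

# Resolving the symlinked spelling yields the real directory.
resolved=$(readlink -f "$link")
echo "$resolved"

rm "$link"
```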
Some progress to report. I had cloned the repo and run go build ., but that is not sufficient. Based on @rick-github's hint about the missing libraries, I built them with:

```
mkdir build
cd build
cmake ../
make
```

Then, running `OLLAMA_DEBUG=1 ./ollama serve`, I see AMX being utilized.
A few observations/questions:
- Inference now appears faster than before but gradually slows down. This behavior wasn't observed when using llama.cpp directly. It might be a quantization issue, since some layers of the model are not using AMX.
- Is there a way to print inference statistics, as llama.cpp does when the app is shut down?
https://github.com/ollama/ollama/pull/9203
- Can you quantify "slow down?"
`ollama --verbose` is the best that ollama can offer.
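To put an actual number on the slowdown, ollama's non-streaming /api/generate response includes eval_count and eval_duration (the latter in nanoseconds), from which a tokens-per-second figure can be derived; a small sketch with made-up values standing in for fields pulled from a real response:

```shell
# Made-up sample values; in practice these would come from the
# eval_count and eval_duration fields of an /api/generate response.
eval_count=128
eval_duration_ns=2000000000

# tokens/sec = eval_count / (eval_duration in seconds)
awk -v c="$eval_count" -v d="$eval_duration_ns" \
    'BEGIN { printf "%.1f tok/s\n", c / (d / 1e9) }'
# prints: 64.0 tok/s
```

Comparing this figure across successive requests would show whether the degradation is steady or levels off.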