Emerald Rapids host - AMX instruction not recognized
Not sure if AMX is being used. I expected the line below to have some indication of AMX or other relevant CPU instructions:

```
system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=72
```
A snippet of /proc/cpuinfo (note the amx_bf16, amx_tile, and amx_int8 flags):

```
model name : INTEL(R) XEON(R) GOLD 6554S
perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l3 cat_l2 cdp_l3 tdx_host_platform cdp_l2 ssbd mba ibrs ibpb stibp ibrs_enhanced tpr_shadow flexpriority ept vpid ept_ad fsgsbase tsc_adjust sgx bmi1 avx2 smep bmi2 erms invpcid cqm rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves cqm_llc cqm_occup_llc cqm_mbm_total cqm_mbm_local split_lock_detect user_shstk avx_vnni avx512_bf16 wbnoinvd dtherm ida arat pln pts hwp hwp_act_window hwp_epp hwp_pkg_req hfi vnmi avx512vbmi umip pku ospke waitpkg avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq la57 rdpid bus_lock_detect cldemote movdiri movdir64b enqcmd sgx_lc fsrm md_clear serialize tsxldtrk pconfig arch_lbr ibt amx_bf16 avx512_fp16 amx_tile amx_int8 flush_l1d arch_capabilities
vmx flags : vnmi preemption_timer posted_intr invvpid ept_x_only ept_ad ept_1gb ept_5level flexpriority apicv tsc_offset vtpr mtf vapic ept vpid unrestricted_guest vapic_reg vid ple shadow_vmcs pml ept_mode_based_exec tsc_scaling usr_wait_pause notify_vm_exiting ipi_virt
bugs : spectre_v1 spectre_v2 spec_store_bypass swapgs eibrs_pbrsb tdx_pw_mce bhi
```
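A quick way to confirm the kernel exposes the AMX feature bits is to grep the flags line; here is a minimal sketch where a short sample string stands in for the real flags line (on the actual host you would grep /proc/cpuinfo itself):

```shell
# Sample stand-in for the flags line from /proc/cpuinfo above.
flags="fpu vmx amx_bf16 amx_tile amx_int8 avx512f avx512_bf16"

# Split the flags onto separate lines and keep only the AMX ones.
printf '%s\n' $flags | grep '^amx' | sort
# On the real machine: grep -o 'amx[a-z0-9_]*' /proc/cpuinfo | sort -u
```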
If I run llama.cpp directly, the following line is printed:

```
system_info: n_threads = 72 (n_threads_batch = 72) / 144 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
```
The log above is from ollama version 0.5.7, which only compiles runners with the AVX/AVX2 extensions.
Shouldn't the log at least show the AVX/AVX2 instructions in the system info line?
How can an AMX-enabled runner be compiled?
0.5.8+ starts a basic CPU runner that then dynamically loads libraries for extended CPU architectures or GPUs. If you set OLLAMA_DEBUG=1 in the server environment and post the resulting logs, they will show which library was dynamically loaded.
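In practice that means starting the server with `OLLAMA_DEBUG=1 ./ollama serve` and scanning the output for the ggml loader messages. A minimal filter sketch, using a debug log line of the kind quoted later in this thread as sample input:

```shell
# Sample log line (copied from a debug log in this thread); in practice
# you would pipe the live server output into the same grep.
log='time=2025-02-14T21:48:31.757-05:00 level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/mnt/data/ollama_build/ollama'

# Keep only the backend-loading lines.
printf '%s\n' "$log" | grep 'ggml backend load'
```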
Here's another try, this time with 0.5.11. I don't see any change.
```
(base) jays@m50fcp-1:~/data/ollama_build/ollama$ git log
commit f8453e9d4a15f5f54b610993e8647d252cb65626 (grafted, HEAD, tag: v0.5.11)
Author: Jeffrey Morgan <[email protected]>
Date:   Thu Feb 13 22:37:59 2025 -0800

    llm: attempt to evaluate symlinks, but do not fail (#9089)

    provides a better approach to #9088 that will attempt to
    evaluate symlinks (important for macOS where 'ollama' is
    often a symlink), but use the result of os.Executable()
    as a fallback in scenarios where filepath.EvalSymlinks
    fails due to permission erorrs or other issues
(base) jays@m50fcp-1:~/data/ollama_build/ollama$
```
```
time=2025-02-14T21:48:31.757-05:00 level=INFO source=runner.go:936 msg="starting go runner"
time=2025-02-14T21:48:31.757-05:00 level=INFO source=runner.go:937 msg=system info="CPU : LLAMAFILE = 1 | CPU : LLAMAFILE = 1 | cgo(gcc)" threads=72
time=2025-02-14T21:48:31.757-05:00 level=DEBUG source=ggml.go:89 msg="ggml backend load all from path" path=/mnt/data/ollama_build/ollama
```
It didn't find any dynamic libraries to load. It looks like you are running the command from /home/jays/data/ollama_build/ollama yet the build path was /mnt/data/ollama_build/ollama. Do you have fancy symlinks/mount points that might be confusing ollama about where to find the libraries?
/home/jays/data is a symlink to /mnt/data/
```
(base) jays@m50fcp-1:~/data/ollama_build/ollama$ md5sum /mnt/data/ollama_build/ollama/ollama
6dd3ce2d15ed0b70c68c99b066f12299  /mnt/data/ollama_build/ollama/ollama
(base) jays@m50fcp-1:~/data/ollama_build/ollama$ md5sum /home/jays/data/ollama_build/ollama/ollama
6dd3ce2d15ed0b70c68c99b066f12299  /home/jays/data/ollama_build/ollama/ollama
(base) jays@m50fcp-1:~/data/ollama_build/ollama$
```
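The matching md5sums show both spellings point at the same binary. For completeness, `readlink -f` resolves a symlinked path much as Go's filepath.EvalSymlinks does, which makes it easy to see when two spellings of a path name the same directory; a throwaway sketch (all names are made up, not the poster's real paths):

```shell
# Create a real directory and a symlink to it, mirroring the
# /home/jays/data -> /mnt/data situation with throwaway names.
real=$(readlink -f "$(mktemp -d)")   # stand-in for /mnt/data
link=/tmp/linkdemo.$$                # stand-in for /home/jays/data
ln -s "$real" "$link"

# Resolving the symlinked spelling yields the real directory.
resolved=$(readlink -f "$link")
echo "$resolved"

rm "$link"
```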
Some progress to report. I had cloned the repo and run go build ., but that is not sufficient. Based on @rick-github's hint about the missing libraries, I built them with:

```
mkdir build
cd build
cmake ../
make
```

Then, running `OLLAMA_DEBUG=1 ./ollama serve`, I see AMX being utilized.
A few observations/questions:
- Inference now appears faster than before but gradually slows down. This behavior wasn't observed when using llama.cpp directly. It might be a quantization issue, since some layers of the model are not using AMX.
- Is there a way to print inference statistics, as llama.cpp does when the app is shut down?
https://github.com/ollama/ollama/pull/9203
- Can you quantify "slow down?"
`ollama --verbose` is the best that ollama can offer.
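To put an actual number on the slowdown, ollama's non-streaming /api/generate response includes eval_count and eval_duration (the latter in nanoseconds), from which a tokens-per-second figure can be derived; a small sketch with made-up values standing in for fields pulled from a real response:

```shell
# Made-up sample values; in practice these would come from the
# eval_count and eval_duration fields of an /api/generate response.
eval_count=128
eval_duration_ns=2000000000

# tokens/sec = eval_count / (eval_duration in seconds)
awk -v c="$eval_count" -v d="$eval_duration_ns" \
    'BEGIN { printf "%.1f tok/s\n", c / (d / 1e9) }'
# prints: 64.0 tok/s
```

Comparing this figure across successive requests would show whether the degradation is steady or levels off.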