A770 can't run deepseek R1 Q4 with flashmoe
Describe the bug
An A770 can't run DeepSeek R1 Q4 with flashmoe.
How to reproduce
Steps to reproduce the error:
- Install the GPU driver following the instructions (https://dgpu-docs.intel.com/driver/client/overview.html)
- Download the GGUF model [DeepSeek-R1-Q4_K_M.gguf], which includes 9 files.
- Run ./flash-moe -m /PATH/TO/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf --prompt "What's AI?" -no-cnv
- The following error message appears:
Screenshots
./flash-moe -m /home/deepseek/文档/deepseek/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf --prompt "What's AI?" -no-cnv
terminate called after throwing an instance of 'std::filesystem::__cxx11::filesystem_error'
what(): filesystem error: Cannot convert character sequence: Invalid or incomplete multibyte or wide character
./flash-moe: 第 25 行: 8026 已中止 (核心已转储) LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(cd "$(dirname "$0")";pwd) $(cd "$(dirname "$0")";pwd)/llama-cli-bin -t $CORES -e -ngl 999 --color --no-context-shift -ot exps=CPU "$@"
Environment information
Ubuntu 22.04.05, 384 GB DDR5, 2 A770 GPUs
Hi @luningxie, this looks like a path-related error. Could you please move your DeepSeek model to a path that contains only English (ASCII) characters, such as /home/deepseek/deepseek/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf, and try flashmoe again?
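A quick way to see whether a model path is affected is to check it for non-ASCII bytes before launching. The sketch below is only an illustration: `has_non_ascii` is a hypothetical helper, not part of flashmoe, and it assumes a `grep` that treats input as raw bytes under `LC_ALL=C`.

```shell
# Hypothetical helper (not part of flashmoe): succeeds if the given path
# contains any byte outside printable ASCII (space through tilde).
has_non_ascii() {
    printf '%s' "$1" | LC_ALL=C grep -q '[^ -~]'
}

# Path taken from the failing command in the report.
MODEL="/home/deepseek/文档/deepseek/DeepSeek-R1-Q4_K_M-00001-of-00009.gguf"

if has_non_ascii "$MODEL"; then
    echo "non-ASCII path: move the model to an ASCII-only directory first"
else
    echo "path is ASCII-only"
fi
```

For the path in the report, the check takes the first branch because of the 文档 directory component.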
Thank you for the help. That problem is solved, but there is a new one:
OMP: Warning #65: KMP_AFFINITY: syntax error, not using affinity.
OMP: Warning #62: KMP_AFFINITY: proclist not specified with explicit affinity type, using "none".
main: llama threadpool init, n_threads = 96
system_info: n_threads = 96 (n_threads_batch = 96) / 96 | CPU : SSE3 = 1 | SSSE3 = 1 | AVX = 1 | AVX_VNNI = 1 | AVX2 = 1 | F16C = 1 | FMA = 1 | AVX512 = 1 | AVX512_VBMI = 1 | AVX512_VNNI = 1 | AVX512_BF16 = 1 | AMX_INT8 = 1 | LLAMAFILE = 1 | OPENMP = 1 | AARCH64_REPACK = 1 |
sampler seed: 2710037957 sampler params: repeat_last_n = 64, repeat_penalty = 1.000, frequency_penalty = 0.000, presence_penalty = 0.000 dry_multiplier = 0.000, dry_base = 1.750, dry_allowed_length = 2, dry_penalty_last_n = 4096 top_k = 40, top_p = 0.950, min_p = 0.050, xtc_probability = 0.000, xtc_threshold = 0.100, typical_p = 1.000, top_n_sigma = -1.000, temp = 0.800 mirostat = 0, mirostat_lr = 0.100, mirostat_ent = 5.000 sampler chain: logits -> logit-bias -> penalties -> dry -> top-k -> typical -> top-p -> min-p -> xtc -> temp-ext -> dist generate: n_ctx = 4096, n_batch = 4096, n_predict = -1, n_keep = 1
What's AI? Artificial
The program was built for 1 devices
Build program log for 'Intel(R) Arc(TM) A770 Graphics':
Exception caught at file:/home/intel/yina/llama-cpp-bigdl/ggml/src/ggml-sycl/ggml-sycl.cpp, line:3234, func:operator()
SYCL error: CHECK_TRY_ERROR(op(ctx, src0, src1, dst, src0_dd_i, src1_ddf_i, src1_ddq_i, dst_dd_i, dev[i].row_low, dev[i].row_high, src1_ncols, src1_padded_col_size, stream)): Meet error in this line code!
in function ggml_sycl_op_mul_mat at /home/intel/yina/llama-cpp-bigdl/ggml/src/ggml-sycl/ggml-sycl.cpp:3234
/home/intel/yina/llama-cpp-bigdl/ggml/src/ggml-sycl/../ggml-sycl/common.hpp:117: SYCL error
OMP: Warning #65: KMP_AFFINITY: syntax error, not using affinity.
OMP: Warning #62: KMP_AFFINITY: proclist not specified with explicit affinity type, using "none".
Could not attach to process. If your uid matches the uid of the target process, check the setting of /proc/sys/kernel/yama/ptrace_scope, or try again as the root user. For more details, see /etc/sysctl.d/10-ptrace.conf
ptrace: 不允许的操作.
No stack.
The program is not being run.
./flash-moe: 第 25 行: 5071 已中止 (核心已转储) LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$(cd "$(dirname "$0")";pwd) $(cd "$(dirname "$0")";pwd)/llama-cli-bin -t $CORES -e -ngl 999 --color --no-context-shift -ot exps=CPU "$@"
Hi @luningxie, based on your error message, there appear to be two separate issues.
1. OMP: Warning #65: KMP_AFFINITY: syntax error, not using affinity.
This is most likely a failure to detect the value of CORES.
The flashmoe script uses CORES=$(lscpu | grep "Core(s) per socket:" | awk '{print $4}') to find the number of cores. If your system language is not English, lscpu may print translated labels, so grep matches nothing and CORES ends up empty.
To work around this, you can manually set the value of CORES in the flashmoe script.
2. The program was built for 1 devices Build program log for 'Intel(R) Arc(TM) A770 Graphics':
I am not sure what caused this error. In my experience, it is most likely caused by the wrong oneAPI libraries being picked up. Could you please make sure that you have not manually sourced oneAPI and are not inside a conda environment when running flashmoe?
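For the first issue, one possible locale-independent variant of the core detection is sketched below. This is not the actual flashmoe script: `cores_per_socket` is a hypothetical helper, `LC_ALL=C` is used on the assumption that it makes lscpu print untranslated labels, and the `getconf` fallback is an illustrative choice that returns the number of online logical CPUs rather than cores per socket.

```shell
# Hypothetical helper: extract the "Core(s) per socket" value from
# lscpu-style text on stdin. Prints nothing if the label is absent.
cores_per_socket() {
    awk -F: '/^Core\(s\) per socket:/ { gsub(/[ \t]/, "", $2); print $2; exit }'
}

CORES=""
if command -v lscpu >/dev/null 2>&1; then
    # LC_ALL=C requests untranslated labels regardless of the system language.
    CORES=$(LC_ALL=C lscpu | cores_per_socket)
fi

# Assumption: if detection failed, fall back to the online logical CPU count.
case "$CORES" in
    ''|*[!0-9]*) CORES=$(getconf _NPROCESSORS_ONLN) ;;
esac

echo "CORES=$CORES"
```

Setting the value by hand in the script, as suggested above, remains the simplest fix; the sketch only shows why the original pipeline can come back empty on a non-English system.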
Thank you very much!
I have formatted the PC and reinstalled Ubuntu 22.04.02. The "KMP_AFFINITY: syntax error, not using affinity" warning is now gone,
but the "The program was built for 1 devices Build program log for 'Intel(R) Arc(TM) A770 Graphics'" problem remains.
I am not in any conda environment and have not manually sourced oneAPI. I only installed the GPU driver following the instructions (https://dgpu-docs.intel.com/driver/client/overview.html).
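One way to double-check this from the shell that launches flashmoe is to look for the environment variables these tools usually leave behind. The sketch below is a hypothetical pre-flight check, not part of flashmoe; it assumes that oneAPI's setvars.sh sets `ONEAPI_ROOT` and `SETVARS_COMPLETED`, and that an active conda environment sets `CONDA_PREFIX`.

```shell
# Hypothetical pre-flight check (not part of flashmoe).
# Assumed markers: ONEAPI_ROOT / SETVARS_COMPLETED from oneAPI's setvars.sh,
# CONDA_PREFIX from an active conda environment.
env_report() {
    dirty=0
    for var in ONEAPI_ROOT SETVARS_COMPLETED CONDA_PREFIX; do
        eval "val=\${$var:-}"
        if [ -n "$val" ]; then
            echo "warning: $var is set ($val)"
            dirty=1
        fi
    done
    if [ "$dirty" -eq 0 ]; then
        echo "environment looks clean"
    fi
    return "$dirty"
}

env_report || echo "open a fresh shell without oneAPI or conda before running flashmoe"
```

A clean report does not rule out other library conflicts; it only covers the two cases mentioned in this thread.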
Hi @luningxie, unfortunately we have never seen this error on our machines when running the flashmoe CLI with DeepSeek Q4K. Are you using the latest version of flashmoe (such as llama-cpp-ipex-llm-2.3.0b20250430-ubuntu-core.tgz or llama-cpp-ipex-llm-2.3.0b20250430-ubuntu-xeon.tgz)? Could you also please provide more detailed machine information using this script, along with the full run log?
This error is caused by version 1.3.30049.10-950~22.04 of intel-level-zero-gpu. Please downgrade the affected libraries with the following commands:
wget -qO - https://repositories.intel.com/gpu/intel-graphics.key | \
sudo gpg --yes --dearmor --output /usr/share/keyrings/intel-graphics.gpg
echo "deb [arch=amd64 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy/lts/2350 unified" | \
sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
sudo apt update
sudo apt install -y intel-i915-dkms=1.23.10.92.231129.101+i141-1
echo "deb [arch=amd64,i386 signed-by=/usr/share/keyrings/intel-graphics.gpg] https://repositories.intel.com/gpu/ubuntu jammy unified" | \
sudo tee /etc/apt/sources.list.d/intel-gpu-jammy.list
sudo apt update
sudo apt install -y intel-level-zero-gpu=1.3.29735.27-914~22.04
sudo apt install -y level-zero=1.14.0-744~22.04 level-zero-dev=1.14.0-744~22.04
sudo apt install -y xpu-smi=1.2.33-52~22.04
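After running the downgrade, it can help to confirm that the installed versions actually match the ones requested. The following is a minimal sketch, assuming a Debian/Ubuntu system where `dpkg-query` is available; `check_pins` and `PINNED` are hypothetical names, and the version strings are copied from the commands above.

```shell
# Package versions copied from the downgrade commands above.
PINNED="intel-i915-dkms=1.23.10.92.231129.101+i141-1
intel-level-zero-gpu=1.3.29735.27-914~22.04
level-zero=1.14.0-744~22.04
level-zero-dev=1.14.0-744~22.04
xpu-smi=1.2.33-52~22.04"

# Hypothetical helper: print one status line per pinned package.
check_pins() {
    for spec in $PINNED; do
        pkg=${spec%%=*}    # text before the first '='
        want=${spec#*=}    # text after the first '='
        # Empty if the package is not installed or dpkg-query is unavailable.
        have=$(dpkg-query -W -f='${Version}' "$pkg" 2>/dev/null || true)
        if [ "$have" = "$want" ]; then
            echo "OK       $pkg $have"
        else
            echo "MISMATCH $pkg: installed '${have:-none}', expected '$want'"
        fi
    done
}

check_pins
```

Any MISMATCH line for intel-level-zero-gpu suggests the downgrade did not take effect or was undone by a later upgrade.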
Hi @luningxie, you can try our latest portable package from the following link: https://github.com/ipex-llm/ipex-llm/releases/download/v2.3.0-nightly/llama-cpp-ipex-llm-2.3.0b20250611-ubuntu-xeon.tgz. The "built for 1 devices" error should be fixed in this version.
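It was indeed caused by the Chinese system language. After switching the operating system to English, the error no longer occurs, and I can also specify the CPUs to run on.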