chatglm.cpp chatglm-ggml_q4_0.bin GGML_ASSERT ggml-metal.m:1453: false

./build/bin/main -m ../GGUF_Models/chatglm-ggml_q4_0.bin -l 256 -p "你好"
GGML_ASSERT: /Users/apple/PycharmProjects/NLPProject/chatglm.cpp/third_party/ggml/src/ggml-metal.m:1453: false

Nov 10 '23 01:11 zwqjoy

Is the environment that produces this error mac + metal?

Dec 18 '23 03:12 Weaxs

I ran into the same problem. The log output is: ggml_metal_graph_compute: command buffer 0 failed with status 5

M1 Pro, 16 GB. Running chatglm3-f16 with the mps backend triggered this.

The OS is Sonoma 14.2.
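
For context, the number in "failed with status 5" is the raw value of Metal's MTLCommandBufferStatus enum, where 5 is MTLCommandBufferStatusError. The C sketch below mirrors those raw values so the number can be read off without the Metal headers. The names `metal_status`, `metal_status_name`, and `metal_status_is_error` are made up for this illustration and are not part of ggml or Metal.

```c
#include <assert.h>

/* Raw values of Metal's MTLCommandBufferStatus, mirrored here so the
 * sketch compiles without the Metal headers. */
enum metal_status {
    METAL_STATUS_NOT_ENQUEUED = 0,
    METAL_STATUS_ENQUEUED     = 1,
    METAL_STATUS_COMMITTED    = 2,
    METAL_STATUS_SCHEDULED    = 3,
    METAL_STATUS_COMPLETED    = 4,
    METAL_STATUS_ERROR        = 5
};

/* The value printed in the log line above. */
static_assert(METAL_STATUS_ERROR == 5, "status 5 is the error state");

/* Hypothetical helper: readable name for a raw status value. */
const char *metal_status_name(unsigned long status) {
    switch (status) {
        case METAL_STATUS_NOT_ENQUEUED: return "NotEnqueued";
        case METAL_STATUS_ENQUEUED:     return "Enqueued";
        case METAL_STATUS_COMMITTED:    return "Committed";
        case METAL_STATUS_SCHEDULED:    return "Scheduled";
        case METAL_STATUS_COMPLETED:    return "Completed";
        case METAL_STATUS_ERROR:        return "Error";
        default:                        return "Unknown";
    }
}

/* Hypothetical helper: 1 if the command buffer ended in the error state. */
int metal_status_is_error(unsigned long status) {
    return status == METAL_STATUS_ERROR;
}
```

The status alone only says that the command buffer failed; the reason has to come from the command buffer's error object.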

Dec 19 '23 18:12 XuYicong

Could you run uname -spm and post the result? Also, would you mind pasting the cmake log?

Dec 20 '23 02:12 Weaxs

Output of uname -spm:

Darwin arm64 arm

cmake output in the terminal:

-- The CXX compiler identification is AppleClang 15.0.0.15000100
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
CMake Deprecation Warning at third_party/ggml/CMakeLists.txt:1 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- The C compiler identification is AppleClang 15.0.0.15000100
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Check for working C compiler: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/cc - skipped
-- Detecting C compile features
-- Detecting C compile features - done
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Success
-- Found Threads: TRUE
-- CMAKE_SYSTEM_PROCESSOR: arm64
-- ARM detected
-- Accelerate framework found
CMake Warning (dev) at third_party/ggml/src/CMakeLists.txt:322 (install):
  Target ggml has RESOURCE files but no RESOURCE DESTINATION.
This warning is for project developers.  Use -Wno-dev to suppress it.

CMake Deprecation Warning at third_party/sentencepiece/CMakeLists.txt:15 (cmake_minimum_required):
  Compatibility with CMake < 3.5 will be removed from a future version of
  CMake.

  Update the VERSION argument <min> value or use a ...<max> suffix to tell
  CMake that the project does not need compatibility with older versions.


-- VERSION: 0.2.00
-- Configuring done (1.1s)
-- Generating done (0.0s)
-- Build files have been written to: /Users/xyct/llm/chatglm.cpp/build

Also, the build emits this warning; I'm not sure whether it is related:

/Users/xyct/llm/chatglm.cpp/third_party/ggml/src/ggml.c:11895:17: warning: 'cblas_sgemm' is deprecated: first deprecated in macOS 13.3 - An updated CBLAS interface supporting ILP64 is available.  Please compile with -DACCELERATE_NEW_LAPACK to access the new headers and -DACCELERATE_LAPACK_ILP64 for ILP64 support. [-Wdeprecated-declarations]
                cblas_sgemm(CblasRowMajor, CblasNoTrans, CblasTrans,
                ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX14.2.sdk/System/Library/Frameworks/vecLib.framework/Headers/cblas.h:610:6: note: 'cblas_sgemm' has been explicitly marked deprecated here
void cblas_sgemm(const enum CBLAS_ORDER __Order,
     ^

Dec 20 '23 02:12 XuYicong

That warning should be unrelated. Could you also post the command you ran and its log?

Dec 20 '23 14:12 Weaxs

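What I mainly wanted to see in the cmake build log was the flags used when compiling ggml, and the output you posted doesn't include them.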

Dec 20 '23 14:12 Weaxs

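@Weaxs I deleted the build directory and ran cmake again, then updated my post above with the more complete output, but it still doesn't seem to contain any ggml-related flags.

The command and log are the same as the original poster's, but I have already deleted the model, so I can't easily re-run it... disk space is tight.

That said, I think the cause is simply running out of memory. When I run qwen-14B-Q4_K with llama.cpp, everything works with a short context length, but setting it to around 1000 produces exactly the same error. When I ran chatglm3-f16 earlier, there was a wait of ten-odd seconds between pressing Enter and the GGML_ASSERT output, during which memory pressure climbed to the maximum and then dropped instantly, so it almost certainly ran out of memory.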
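
A back-of-the-envelope estimate is consistent with the out-of-memory theory. The C sketch below computes the size of the weights alone from a parameter count and a bits-per-weight figure. The inputs are assumptions for illustration and do not come from this thread: roughly 6.2 billion parameters for ChatGLM3-6B, 16 bits per weight for f16, and 4.5 bits per weight for q4_0 (32 weights plus one 16-bit scale packed into 18 bytes). `estimate_weight_gib` is a hypothetical helper, and the result ignores the KV cache and compute buffers, which come on top.

```c
#include <assert.h>

/* Hypothetical helper: size of the model weights alone, in GiB.
 * n_params        - number of parameters (assumed, e.g. 6.2e9)
 * bits_per_weight - average storage per weight (16 for f16, 4.5 for q4_0) */
double estimate_weight_gib(double n_params, double bits_per_weight) {
    assert(n_params >= 0.0 && bits_per_weight > 0.0);
    const double bytes = n_params * bits_per_weight / 8.0;
    return bytes / (1024.0 * 1024.0 * 1024.0);
}
```

With these assumed figures, `estimate_weight_gib(6.2e9, 16.0)` comes to about 11.5 GiB and `estimate_weight_gib(6.2e9, 4.5)` to about 3.2 GiB. On a 16 GB machine the f16 weights leave little headroom once the KV cache, compute buffers, and the rest of the system are counted, and Metal additionally limits how much unified memory the GPU should use (see `recommendedMaxWorkingSetSize` on MTLDevice).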

Update:

By modifying the code to print [ctx->command_buffers[i] error], I got the following error message: Insufficient Memory (00000008:kIOGPUCommandBufferCallbackErrorOutOfMemory). So the cause is indeed insufficient memory.

How to modify: at the start of void ggml_metal_graph_compute, change ctx->command_buffers[i] = [ctx->queue commandBuffer]; to

        MTLCommandBufferDescriptor* descriptor = [[MTLCommandBufferDescriptor alloc] init];
        descriptor.errorOptions = MTLCommandBufferErrorOptionEncoderExecutionStatus;
        ctx->command_buffers[i] = [ctx->queue commandBufferWithDescriptor:descriptor];
        [descriptor release];

and insert the following just before the line that raises the error:

            NSError*error = [ctx->command_buffers[i] error];
            if(error && ([ctx->command_buffers[i] errorOptions] &
                         MTLCommandBufferErrorOptionEncoderExecutionStatus)) {
                GGML_METAL_LOG_INFO("%s", error.localizedDescription.UTF8String);
            }

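The error message will then be visible. Note: chatglm.cpp does not set an output callback for GGML_METAL_LOG_INFO, so you may need to change it to printf to get any output.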
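
The note about the missing callback describes a common logging pattern: messages are forwarded to a registered callback, and when none is registered they are dropped. The sketch below is a generic C illustration of that pattern with a stderr fallback added. It is not ggml's actual code, and `log_callback_t`, `g_log_callback`, `g_log_user_data`, and `log_info` are hypothetical names.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical callback type: receives one formatted message. */
typedef void (*log_callback_t)(const char *msg, void *user_data);

/* No callback registered, like the situation described in the note above. */
static log_callback_t g_log_callback = NULL;
static void *g_log_user_data = NULL;

/* Format the message, hand it to the callback if one is registered,
 * and otherwise fall back to stderr so it is not silently dropped.
 * Returns the formatted length as reported by vsnprintf; messages
 * longer than the buffer are truncated. */
int log_info(const char *fmt, ...) {
    assert(fmt != NULL);
    char buf[512];
    va_list args;
    va_start(args, fmt);
    int n = vsnprintf(buf, sizeof buf, fmt, args);
    va_end(args);
    if (n < 0) {
        return n; /* formatting failed */
    }
    if (g_log_callback != NULL) {
        g_log_callback(buf, g_log_user_data);
    } else {
        fputs(buf, stderr);
    }
    return n;
}
```

Replacing the log call with a plain printf, as suggested above, is the quicker fix for a one-off debugging session; a fallback like this only matters if the logging should keep working without a registered callback.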

Dec 20 '23 17:12 XuYicong
