fastllm
A pure-C++, cross-platform LLM acceleration library with Python bindings. Reaches 10000+ tokens/s on a single GPU for ChatGLM-6B-class models; supports GLM, LLaMA, and MOSS base models and runs smoothly on mobile.
I built for Android following the README and copied benchmark into Termux, but running chatglm2-int4.flm gives only 1.3 token/s. I also tried compiling directly inside Termux with cmake; the speed is the same. My phone is a Snapdragon 8 Gen 2, t=4, and I tried both NDK r24 and r25c.
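Two things worth checking on the 1.3 token/s result above (these are assumptions, not confirmed causes): the build may not enable the ARMv8.2 dot-product and half-precision instructions that the Snapdragon 8 Gen 2 supports, and on a big.LITTLE SoC the four threads may land on efficiency cores. A minimal sketch of a rebuild inside Termux with those ISA extensions turned on:

```shell
# Hedged sketch: configure with ARMv8.2 dotprod/fp16 extensions enabled.
# Whether fastllm's int4 kernels actually exploit these on this build path
# is an assumption to verify against the resulting tokens/s.
cmake .. -DCMAKE_CXX_FLAGS="-O2 -march=armv8.2-a+dotprod+fp16"
make -j4
```

If the speed is unchanged, comparing throughput with t=4 versus t=8 can hint at whether threads are being scheduled onto the little cores.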
substract -> subtract
cmake 3.24.3, gcc 9.5.0, cuda 10.2. cmake .. -DUSE_CUDA=ON succeeds, but with a warning: CUDA_ARCHITECTURES is empty for target benchmark/fastllm-tool/... make -j then fails with:
fastllm-master/src/fastllm.cpp: In member function ‘void fastllm::Data::CalcWeightSum()’:
fastllm-master/src/fastllm.cpp:495:31: error: ‘_mm256_set_m128i’ was not declared...
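The ‘_mm256_set_m128i was not declared’ error means the immintrin.h seen during compilation does not declare that intrinsic; GCC only added it in GCC 8, and with CUDA 10.2 the compiler nvcc drives for host code may be older than the gcc 9.5.0 on the PATH (an assumption worth verifying). A commonly used workaround is to polyfill it from older AVX intrinsics near the top of src/fastllm.cpp:

```cpp
// Hedged workaround: define _mm256_set_m128i for toolchains that lack it.
// _mm256_set_m128i(hi, lo) packs two 128-bit halves into one 256-bit vector;
// the same result is obtained by widening `lo` and inserting `hi` into the
// upper lane. Guarded to GCC < 8, where the intrinsic is missing.
#if defined(__GNUC__) && !defined(__clang__) && (__GNUC__ < 8)
#define _mm256_set_m128i(hi, lo) \
    _mm256_insertf128_si256(_mm256_castsi128_si256(lo), (hi), 1)
#endif
```

Alternatively, upgrading the host compiler (or pointing nvcc at gcc 9.5.0 via -ccbin) avoids the patch entirely.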
FastLLM Error: Linear's weight's shape error — what causes this? The fp16 model runs fine.
cuda 11.7, cudnn 8. Running llm.from_hf to convert the chatglm2-6B model fails:
convert ( 200 / 200 )
Warmup...
status = 15 1 16 128
Error: cublas error.
Aborted
-- USE_CUDA: ON
-- PYTHON_API: OFF
-- CMAKE_CXX_FLAGS -pthread --std=c++17 -O2 -march=native
CMake Error: Could not find cmake module file: CMakeDetermineCUDACompiler.cmake
CMake Error: Error required internal CMake variable not set,...
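The missing CMakeDetermineCUDACompiler.cmake error usually means CMake cannot locate a CUDA compiler when it tries to enable the CUDA language. A sketch of the usual fix, assuming CUDA is installed under /usr/local/cuda (adjust the path to the actual install); setting an explicit architecture also silences the empty-CUDA_ARCHITECTURES warning that some setups report:

```shell
# Hedged sketch: tell CMake where nvcc lives before configuring.
export CUDACXX=/usr/local/cuda/bin/nvcc   # or: -DCMAKE_CUDA_COMPILER=...
cmake .. -DUSE_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=70   # pick your GPU's SM
make -j
```

If the error persists, the CMake installation itself may be incomplete (e.g. a stripped distro package missing the Modules directory), which is worth checking separately.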
The model exports successfully, but loading the exported model afterwards crashes:
(pytorch1.13) user@user:~/zx/ChatGLM2-6B$ CUDA_VISIBLE_DEVICES=3 python api_jiasu.py -p ./model.flm
llm model: chatglm
Load (200 / 200)
Warmup...
Segmentation fault (core dumped)