fastllm

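A cross-platform LLM acceleration library written in pure C++ and callable from Python. ChatGLM-6B-class models can reach 10,000+ tokens/s on a single GPU. Supports GLM, LLaMA, and MOSS base models, and runs smoothly on mobile devices.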

Results 170 fastllm issues

1. Building and installing inside a Docker container on Ubuntu 20.04. 2. I tried both GCC 9 and GCC 11 and got the same error. 3. The machine is an old Dell R720, so the CPU may be fairly old. Could you please take a look? ``` In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47, from /root/app/fastllm/include/utils/utils.h:21, from /root/app/fastllm/src/fastllm.cpp:5: /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h: In member function 'void fastllm::Data::CalcWeightSum()': /usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:119:1: error: inlining...

The current code lacks support for the top-p/top-k penalty mechanism. Could you please consider adding it? Thank you.
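For context, one common reading of this request is top-k sampling combined with a repetition penalty. The following is a minimal Python sketch of that general technique, not fastllm code: the function names `apply_repetition_penalty` and `top_k_sample` are hypothetical, and the penalty rule shown (divide positive logits, multiply negative ones) is the widely used CTRL-style variant, assumed here for illustration.

```python
import math
import random
from typing import List, Sequence


def apply_repetition_penalty(logits: Sequence[float],
                             previous_tokens: Sequence[int],
                             penalty: float) -> List[float]:
    """Lower the score of every token id that was already generated.

    Hypothetical helper: positive logits are divided by `penalty`,
    negative logits are multiplied by it, so both move downward
    when penalty > 1.
    """
    out = list(logits)
    for token_id in set(previous_tokens):
        if out[token_id] > 0:
            out[token_id] = out[token_id] / penalty
        else:
            out[token_id] = out[token_id] * penalty
    return out


def top_k_sample(logits: Sequence[float], k: int, rng: random.Random) -> int:
    """Sample one token id from the k highest-scoring candidates."""
    # Keep only the indices of the k largest logits.
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    # Softmax over the kept logits (shifted by the max for numerical stability).
    peak = max(logits[i] for i in top)
    weights = [math.exp(logits[i] - peak) for i in top]
    return rng.choices(top, weights=weights, k=1)[0]
```

A typical decoding step would call `apply_repetition_penalty` on the raw logits first and then pass the result to `top_k_sample`.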

Are there any plans to implement more flexible quantization methods like those in ggml, such as Q4_1 or q3_k_m?
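To illustrate what a Q4_1-style scheme involves, here is a minimal Python sketch of block-wise 4-bit quantization with a per-block scale and minimum, which is the general idea behind ggml's Q4_1 (blocks of 32 weights, each reconstructed as `scale * code + minimum`). It is a simplified illustration under stated assumptions, not ggml's or fastllm's implementation: the function names are hypothetical, and details such as fp16 storage of the scale and packing two codes per byte are omitted.

```python
from typing import List, Sequence, Tuple

BLOCK_SIZE = 32  # ggml's Q4_1 groups weights into blocks of 32


def quantize_block_q4_1(block: Sequence[float]) -> Tuple[float, float, List[int]]:
    """Map one block of floats to (scale, minimum, 4-bit codes in 0..15)."""
    lo, hi = min(block), max(block)
    scale = (hi - lo) / 15.0  # 16 levels span the block's value range
    if scale == 0.0:
        # Constant block: every value equals the minimum.
        return 0.0, lo, [0] * len(block)
    codes = [min(15, max(0, round((x - lo) / scale))) for x in block]
    return scale, lo, codes


def dequantize_block_q4_1(scale: float, lo: float, codes: Sequence[int]) -> List[float]:
    """Reconstruct approximate floats from a quantized block."""
    return [scale * q + lo for q in codes]
```

Because each block carries its own minimum as well as its scale, this style handles blocks whose values are not centered on zero better than a scale-only scheme, at the cost of one extra stored value per block.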

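As the title says: in my tests, after loading the model on GPU and on CPU and converting it to an flm model, the inference speed is almost the same in both cases.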

Running ./quant -p chatglm-6b-fp32.flm -o chatglm-6b-fp16.flm -b 16 produces the following error: FastLLM Error: Unkown model type: unknown terminate called after throwing an instance of 'std::string' Aborted (core dumped)