fastllm issues

18

1. Building and installing inside a Docker container running Ubuntu 20.04.
2. Tried both GCC 9 and GCC 11; the error is the same.
3. The machine is an old Dell R720, so the CPU may be fairly old.

Could you please take a look?

```
In file included from /usr/lib/gcc/x86_64-linux-gnu/11/include/immintrin.h:47,
                 from /root/app/fastllm/include/utils/utils.h:21,
                 from /root/app/fastllm/src/fastllm.cpp:5:
/usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h: In member function 'void fastllm::Data::CalcWeightSum()':
/usr/lib/gcc/x86_64-linux-gnu/11/include/avx2intrin.h:119:1: error: inlining...
```

wangyumu
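
The truncated log in the report above stops inside `avx2intrin.h`, and the poster suspects an old CPU. AVX2 first appeared in Intel's Haswell generation, so one thing worth checking is whether the host CPU advertises it at all. The sketch below is only an illustration and is not part of fastllm: it assumes an x86 Linux host where `/proc/cpuinfo` has a `flags` line, and the helper names `cpu_flags` and `has_avx2` are made up for this example.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of feature flags found in /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        # x86 Linux lists features on a line whose key is exactly "flags".
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def has_avx2(cpuinfo_text):
    """True if the text advertises the avx2 feature flag."""
    return "avx2" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or /proc is not mounted
    print("avx2 advertised:", has_avx2(text))
```

Inside a Docker container this reads the host's CPU flags, so the result reflects the physical machine rather than the image.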

Support top-p, top-k, and penalty logic

3

The current code lacks support for top-p and top-k sampling and for the penalty mechanism. Please consider adding them. Thank you.

junior-zsy
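
For readers unfamiliar with the request above, here is a minimal sketch of what top-k, top-p (nucleus) filtering and a repetition penalty usually do to a vector of logits. It is not fastllm code. It assumes "penalty" means the common repetition penalty that divides positive logits and multiplies negative ones for tokens already generated. All function names and default values are illustrative.

```python
import math
import random


def apply_repetition_penalty(logits, previous_tokens, penalty):
    """Lower the logits of tokens that were already generated (penalty > 1.0)."""
    out = list(logits)
    for t in set(previous_tokens):
        # Dividing a positive logit or multiplying a negative one reduces it.
        out[t] = out[t] / penalty if out[t] > 0 else out[t] * penalty
    return out


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def top_k_top_p_probs(logits, top_k, top_p):
    """Return {token_id: prob} over the tokens that survive both filters."""
    probs = softmax(logits)
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    ranked = ranked[:top_k]  # top-k: keep only the k most likely tokens
    kept, cumulative = [], 0.0
    for i in ranked:  # top-p: keep the smallest prefix with mass >= top_p
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalize the survivors


def sample(logits, previous_tokens, top_k=40, top_p=0.9, penalty=1.1, rng=random):
    """Draw one token id after applying the penalty and both filters."""
    penalized = apply_repetition_penalty(logits, previous_tokens, penalty)
    dist = top_k_top_p_probs(penalized, top_k, top_p)
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]
```

Setting `top_k=1` reduces this to greedy decoding, while `top_p=1.0` and a large `top_k` disable both filters.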

Has anyone done a performance comparison with llama.cpp / ggml?

12

As the title says.

JianbangZ

How do I set the context window size?

1

For example, 512, 2k, or 8k.

JianbangZ

Quantization methods

3

Are there any plans to implement more flexible quantization methods like those in ggml, such as Q4_1 or q3_k_m?

JianbangZ
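
As background for the question above, ggml's Q4_1 format quantizes weights in blocks of 32, storing a scale `d` and a minimum `m` per block so that each weight is approximated as `q * d + m` with `q` in 0..15. The sketch below illustrates only that arithmetic. It is not fastllm's or ggml's actual code: the function names are made up, and the packing of two 4-bit values per byte and the reduced-precision storage of `d` and `m` are left out.

```python
QK = 32  # number of weights per Q4_1 block in ggml


def quantize_q4_1_block(block):
    """Quantize one block of floats to (d, m, q) with q values in 0..15."""
    lo, hi = min(block), max(block)
    d = (hi - lo) / 15.0  # 4 bits give 16 levels, so 15 steps
    if d == 0.0:
        return d, lo, [0] * len(block)  # constant block: every q is 0
    # Round to the nearest level and clamp to the 4-bit range.
    q = [min(15, max(0, int((x - lo) / d + 0.5))) for x in block]
    return d, lo, q


def dequantize_q4_1_block(d, m, q):
    """Reconstruct approximate floats from a quantized block."""
    return [qi * d + m for qi in q]


if __name__ == "__main__":
    weights = [i / (QK - 1) for i in range(QK)]
    d, m, q = quantize_q4_1_block(weights)
    restored = dequantize_q4_1_block(d, m, q)
    print("max abs error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

Because the minimum is stored per block, this scheme can represent blocks whose values are not centered on zero, which is the flexibility a plain scale-only format lacks.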

How do I choose between GPU and CPU inference for a quantized model?

1

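As the title says: in my tests, after loading the model on the GPU and on the CPU and converting it to an flm model, the inference speed is almost the same in both cases.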

Alone749-i

Problem when converting a model with quant

1

Running ./quant -p chatglm-6b-fp32.flm -o chatglm-6b-fp16.flm -b 16 produces the following error: FastLLM Error: Unkown model type: unknown terminate called after throwing an instance of 'std::string' Aborted (core dumped)

White-Friday