fastllm issues

make -j fails to compile

4

nvcc fatal : Unsupported gpu architecture 'compute_89'
make[2]: *** [CMakeFiles/fastllm_tools.dir/build.make:273: CMakeFiles/fastllm_tools.dir/src/devices/cuda/fastllm-cuda.cu.o] Error 1
make[2]: *** Waiting for unfinished jobs....
nvcc fatal : Unsupported gpu architecture 'compute_89'
make[2]: *** [CMakeFiles/fastllm.dir/build.make:273: CMakeFiles/fastllm.dir/src/devices/cuda/fastllm-cuda.cu.o]...

DreamTeamWangbowen
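
Note: nvcc only accepts `compute_89` (Ada GPUs such as the 4090) from CUDA 11.8 onward, so this error usually points to an older toolkit. The sketch below is one way to check this from the text printed by `nvcc --version`. The function names and the one-entry table are illustrative and not part of fastllm.

```python
import re

# Minimum CUDA toolkit release whose nvcc accepts each target.
# Illustrative, deliberately partial table: only the architecture from this issue.
MIN_CUDA_FOR_ARCH = {"compute_89": (11, 8)}


def parse_nvcc_release(version_output):
    """Extract (major, minor) from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+)\.(\d+)", version_output)
    if match is None:
        raise ValueError("no 'release X.Y' found in nvcc output")
    return int(match.group(1)), int(match.group(2))


def nvcc_supports(arch, version_output):
    """Return True if the reported toolkit is new enough for `arch`."""
    return parse_nvcc_release(version_output) >= MIN_CUDA_FOR_ARCH[arch]
```

If the check fails, the usual options are to upgrade the CUDA toolkit or to build for an architecture the installed nvcc does support.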

Compilation problem

9

cmake .. -DUSE_CUDA=ON
-- The CUDA compiler identification is unknown
CMake Error at ..a/share/cmake-3.26/Modules/CMakeDetermineCUDACompiler.cmake:603 (message):
Compiler output:
Call Stack (most recent call first):
CMakeLists.txt:39 (enable_language)
-- Configuring incomplete, errors occurred!

chestnut111
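
Note: "The CUDA compiler identification is unknown" often means CMake could not find or run `nvcc`. A quick check, sketched below, is whether `nvcc` is reachable on `PATH`. The helper name `find_nvcc` is made up for illustration, and the underlying cause on this particular machine is an assumption.

```python
import shutil


def find_nvcc(search_path=None):
    """Return the full path of nvcc, or None if it is not found.

    search_path is a PATH-style string; None means use the current PATH.
    """
    return shutil.which("nvcc", path=search_path)


if __name__ == "__main__":
    location = find_nvcc()
    print(location if location else "nvcc not found on PATH")
```

If nvcc is installed but not on `PATH`, CMake can also be pointed at it through the `CUDACXX` environment variable or the `CMAKE_CUDA_COMPILER` cache variable.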

Could you explain in detail how to use pyfastllm?

cstk2715

/home/jwkj/miniconda3/bin/../lib/libstdc++.so.6: version `GLIBCXX_3.4.30' not found

2

Traceback (most recent call last):
File "/home/GC/Gits/Baichuan2/test_fastllm.py", line 5, in <module>
from fastllm_pytools import llm
File "/home/GC/Gits/Baichuan2/venv/lib/python3.11/site-packages/fastllm_pytools-0.0.1-py3.11.egg/fastllm_pytools/llm.py", line 11, in <module>
fastllm_lib = ctypes.cdll.LoadLibrary(os.path.join(os.path.split(os.path.realpath(__file__))[0], "libfastllm_tools.so"))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/jwkj/miniconda3/lib/python3.11/ctypes/__init__.py", line 454, in LoadLibrary...

zxzxde
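
Note: this error means the `libstdc++.so.6` that was loaded (here the copy under miniconda3) does not export the `GLIBCXX_3.4.30` symbol version that `libfastllm_tools.so` was linked against. The sketch below scans a library's raw bytes for the version tags it carries, much like `strings libstdc++.so.6 | grep GLIBCXX`. The function names are illustrative.

```python
import re


def glibcxx_versions(blob):
    """Return the sorted GLIBCXX_x.y[.z] versions mentioned in a library's bytes."""
    found = set(re.findall(rb"GLIBCXX_(\d+(?:\.\d+)+)", blob))
    return sorted(tuple(int(part) for part in v.decode().split(".")) for v in found)


def provides(blob, wanted):
    """True if the library bytes mention the wanted version, e.g. '3.4.30'."""
    target = tuple(int(part) for part in wanted.split("."))
    return target in glibcxx_versions(blob)
```

To use it, read the library named in the error message with `open(path, "rb").read()` and pass the result to `provides(blob, "3.4.30")`. If that returns False while the system's own libstdc++ returns True, the conda copy is likely shadowing a newer system library.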

baichuan2flm.py ..python3.10/site-packages/accelerate/big_modeling.py", line 415, in wrapper raise RuntimeError("You can't move a model that has some modules offloaded to cpu or disk.") RuntimeError: You can't move a model that has some modules offloaded to cpu or disk.

2

Running baichuan2flm.py to convert the Baichuan2-7B-Chat model raises an error. /fastllm/build/tools/baichuan2flm.py", line 12, in model.to("cpu") python3.10/site-packages/accelerate/big_modeling.py", line 415, in wrapper raise RuntimeError("You can't move a model that has some modules offloaded to cpu or disk.") RuntimeError: You can't move...

zxzxde
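
Note: accelerate raises this error when `.to()` is called on a model that was loaded with a device map that offloads some modules. One way to confirm this, sketched below, is to inspect the model's `hf_device_map` (set by transformers when a `device_map` is used) for entries placed on `"cpu"` or `"disk"`. The helper name is illustrative, and it takes a plain dict so it can be tried without loading a model.

```python
def offloaded_modules(device_map):
    """List module names that a device map places on 'cpu' or 'disk'.

    device_map is a dict like model.hf_device_map: module name -> device,
    where the device is a GPU index, 'cpu', or 'disk'.
    Assumption: entries on 'cpu' alongside GPU entries are offloaded modules.
    """
    return sorted(
        name for name, device in device_map.items()
        if str(device) in ("cpu", "disk")
    )
```

If the list is non-empty, loading the model for conversion without a `device_map` argument, so that all weights sit in ordinary CPU memory, is a plausible workaround. This is untested against the exact setup in this issue.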

Can fastllm be used with ChatGLM2 after P-Tuning?

4

Can fastllm be used with a ChatGLM2 model trained with P-Tuning? P-Tuning seems to change some parameters, so I am not sure whether this acceleration still works.

xiaoduozhou

Error when exporting a fine-tuned ChatGLM2-6B model to flm format

3

Model export:
model = AutoModel.from_pretrained(xxx)
model = llm.from_hf(model, tokenizer, dtype = "float16")
model.save(xxx)
Model loading:
llm.model(xxx)
Error: Segmentation fault (core dumped)
Is this caused by the fine-tuning?

Rorschaaaach

Error: cublas error.terminate called after throwing an instance of 'char const*'

4

I hit this problem when the submitted text is somewhat longer. The model is chatglm2-6b and the GPU is a 4090: Error: cublas error. terminate called after throwing an instance of 'char const*' Aborted (core dumped)

lxp521125

Qwen model replies are missing characters

1

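When running accelerated inference with the Qwen model, replies often have missing characters. For example, the correct reply should be: 1.异常处理 2.单元测试 (1. exception handling 2. unit testing), but the actual reply is: 1. 常处理 2. 元测试. In each word, the missing character is replaced by a space. What causes this? Is it a tokenization problem or some kind of encoding problem?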

hediyuan
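
Note: one possible cause, not confirmed for fastllm, is that a byte-level tokenizer can split a multi-byte UTF-8 character across two tokens. Decoding each token's bytes on its own then loses that character. The sketch below reproduces the effect with plain Python; the chunk boundaries are chosen by hand to cut through "异", and the function names are illustrative.

```python
import codecs


def decode_chunks_naively(chunks):
    """Decode every chunk independently; partial characters are dropped."""
    return "".join(chunk.decode("utf-8", errors="ignore") for chunk in chunks)


def decode_chunks_incrementally(chunks):
    """Carry incomplete byte sequences over to the next chunk."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    pieces = [decoder.decode(chunk) for chunk in chunks]
    pieces.append(decoder.decode(b"", final=True))
    return "".join(pieces)


text = "1.异常处理"
raw = text.encode("utf-8")
# "1." is 2 bytes and "异" is 3 bytes, so cutting at byte 3 splits "异".
chunks = [raw[:3], raw[3:]]

print(decode_chunks_naively(chunks))        # "异" is lost
print(decode_chunks_incrementally(chunks))  # full text is recovered
```

If this is the cause, buffering token bytes until they form a complete character before converting them to text avoids the gap.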

Why is chatglm2-6b so slow in the fastllm performance test on a P40 with CUDA 12.1, at only 8 tokens / s?

19

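Test results: with int4 quantization, batch size 1 runs at 8 tokens / s, only 1/20 of a 4090? 🤡🤡🤡 Also, fp16 at batch size 1 is actually faster than int4 at batch size 1; shouldn't it be slower? And batch size 16 is far slower than batch size 1. I can't make sense of these results. First, why is batch size 16 slower than batch size 1? Second, why is fp16 faster than int4? Third, why is the P40 only 1/20 as fast as the 4090? There is a performance gap between the two, but it shouldn't be this large. @ztxz16, where did things go wrong? Is the problem in the GPU, the model, or fastllm? Benchmark reference: Model | Data precision | Platform | Batch | Max inference speed (token / s) -- | --...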

heavenkiller2018
