fastllm issues

chinese-llama-alpaca model bug

3

As the title says, the model runs out of GPU memory and prints the following error: `status = 7 2049 1 128 Error: cublas error. terminate called after throwing an instance of 'char const*' Aborted (core dumped)`

levishen

Is it possible to support RWKV?

As you can see, the RWKV World transformer version is available at https://github.com/xiaol/Huggingface-RWKV-World/tree/main and some references are at https://osk6ppi3hr.feishu.cn/docx/ZlqkdTHD7owXpFxXvMyct0Jrn1d

xiaol

Also hit a make -j error

The build first failed with `nvcc fatal : Unsupported gpu architecture 'compute_89'`. I changed `set(CMAKE_CUDA_ARCHITECTURES "native")` in CMakeLists.txt to 80 and got the same error. After changing it to 75, the build failed with the following error instead: `[ 71%] Building CUDA object CMakeFiles/fastllm.dir/src/devices/cuda/fastllm-cuda.cu.o /usr/include/c++/11/bits/std_function.h:435:145: error: parameter packs not expanded with ‘...’: 435 | function(_Functor&& __f)...`

seniornut

Will NTK-Aware Scaled RoPE be supported to extend the context length to 8k?

1

NTK-Aware Scaled RoPE allows LLaMA models to have an extended (8k+) context size without any fine-tuning and with minimal perplexity degradation. References: https://www.reddit.com/user/bloc97 https://www.reddit.com/r/LocalLLaMA/comments/14lz7j5/ntkaware_scaled_rope_allows_llama_models_to_have/

pikaqqqqqq
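
For readers unfamiliar with the technique requested above, here is a minimal sketch of the base rescaling as I understand it from the linked Reddit post: the RoPE base is multiplied by `alpha ** (dim / (dim - 2))` and positions are left untouched. The function names and the values `dim = 128` and `alpha = 4.0` are illustrative assumptions, not fastllm code.

```python
def rope_inv_freq(dim, base=10000.0):
    """Standard RoPE inverse frequencies: base^(-2i/dim) for i in [0, dim/2)."""
    return [base ** (-(2 * i) / dim) for i in range(dim // 2)]


def ntk_scaled_inv_freq(dim, alpha, base=10000.0):
    """NTK-aware variant: enlarge the base instead of compressing positions.

    alpha is the scaling factor (an assumed, user-chosen value).
    """
    scaled_base = base * alpha ** (dim / (dim - 2))
    return rope_inv_freq(dim, scaled_base)


if __name__ == "__main__":
    dim, alpha = 128, 4.0  # illustrative values only
    plain = rope_inv_freq(dim)
    scaled = ntk_scaled_inv_freq(dim, alpha)
    # The highest frequency is left unchanged.
    print(plain[0], scaled[0])
    # The lowest frequency is slowed down by a factor of alpha.
    print(plain[-1] / scaled[-1])
```

The exponent `dim / (dim - 2)` is chosen so that the lowest-frequency component is stretched by exactly `alpha` while the highest-frequency component keeps its original value. How a given `alpha` maps to a usable context length is an empirical question that this sketch does not answer.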

add cuda for fp32

bjmsong

Are there plans to support the StarCoder model?

Porraio

Why is the converted model not callable?

I am running the vicuna13B model on Ubuntu with four 3090 GPUs, and the model cannot be called after conversion. The failing line is `out = model(input_ids=torch.as_tensor([[token] if not sent_interrupt else output_ids], device=device), use_cache=True, past_key_values=past_key_values if not sent_interrupt else None)`. After a question is submitted to the model, this line raises: TypeError: 'model' object is not callable

lzk9508
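
The `TypeError` in the issue above is what Python raises when an object whose class defines no `__call__` is invoked like a function. The sketch below reproduces that behaviour with a hypothetical stand-in class; `ConvertedModel`, its `response` method, and the `generate` helper are assumptions for illustration and are not taken from the fastllm source.

```python
class ConvertedModel:
    """Hypothetical stand-in for a converted model that defines no __call__."""

    def response(self, prompt):
        # Assumed text-level entry point; the real method name may differ.
        return "reply to: " + prompt


def generate(model, prompt):
    """Call the model directly if it is callable, else use its text-level method."""
    if callable(model):
        return model(prompt)
    return model.response(prompt)


if __name__ == "__main__":
    model = ConvertedModel()
    try:
        model("hello")  # same shape of failure as in the issue
    except TypeError as err:
        print("TypeError:", err)
    print(generate(model, "hello"))
```

If the converted object really lacks `__call__`, code written against a PyTorch-style forward call would need to be adapted to whatever generation methods the converted object provides.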

Can it run on a Jetson NX 8G?

Has anyone successfully run it on an NVIDIA Jetson device?

weibo021

Running multiple threads with gunicorn reports Error: cublas error.

2

I built a simple multithreaded inference service with gunicorn + flask: `gunicorn --threads 10 "client_glm26b:APP" -b "0.0.0.0:19002" -w 1 --preload`. Calling chat fails with `Error: cublas error.` Starting flask on its own works fine.

wqh17101
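
One thing worth trying for the report above, assuming the error comes from several gunicorn threads entering the model at the same time (the issue does not confirm this), is to serialize inference behind a lock. The sketch below uses a hypothetical `FakeModel` in place of the real model, and the `response` method name is likewise an assumption.

```python
import threading
import time


class FakeModel:
    """Hypothetical stand-in for a GPU-backed model; records overlapping calls."""

    def __init__(self):
        self.active = 0
        self.max_active = 0
        self._counter_lock = threading.Lock()

    def response(self, prompt):
        with self._counter_lock:
            self.active += 1
            self.max_active = max(self.max_active, self.active)
        time.sleep(0.01)  # pretend to run inference
        with self._counter_lock:
            self.active -= 1
        return "echo: " + prompt


class SerializedModel:
    """Wrap a model so that only one thread runs inference at a time."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def response(self, prompt):
        with self._lock:
            return self._model.response(prompt)


def run_concurrently(model, n_threads=10):
    """Fire n_threads simultaneous requests at the model and wait for them."""
    threads = [
        threading.Thread(target=model.response, args=("q%d" % i,))
        for i in range(n_threads)
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


if __name__ == "__main__":
    fake = FakeModel()
    run_concurrently(SerializedModel(fake))
    print("max concurrent calls:", fake.max_active)
```

A lock trades throughput for safety: requests queue up instead of running in parallel, which is acceptable for a single-GPU service but would not explain or fix an error with a different root cause.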

What does -DPY_API=ON do?

In the documentation I see: ![image](https://github.com/ztxz16/fastllm/assets/26429138/0b34f965-eacf-4d15-b596-0945bd648a0a) `-DPY_API=ON` is not needed here, and everything also works with the default of OFF. What does -DPY_API=ON do?

wqh17101
