俞航

Results 117 comments of 俞航

The plugin's open-source library isn't complete yet.

requirement.txt pins pytorch==1.13.1, but that release no longer seems to ship CUDA 11.3 builds; you should use pytorch-1.13.1+cu117 instead. Also, without quantization, fp16 needs about 32 GB of VRAM, so a 3080 is definitely not enough. Running the quantized version requires compiling GPTQ, which means installing the CUDA toolkit (matching PyTorch's CUDA version) to build it.
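A back-of-envelope check on the ~32 GB fp16 figure mentioned above. This is a rough sketch that only counts the weights themselves (activations and KV cache add more), and the ~16B parameter count for MOSS is an assumption for illustration:

```python
# Rough VRAM estimate: weights only, ignoring activations and KV cache.
PARAMS = 16_000_000_000  # assumed ~16B parameters, for illustration

def weight_vram_gb(params: int, bytes_per_param: float) -> float:
    """GiB needed just to hold the model weights at a given precision."""
    return params * bytes_per_param / 1024**3

fp16 = weight_vram_gb(PARAMS, 2)    # fp16: 2 bytes per parameter
int4 = weight_vram_gb(PARAMS, 0.5)  # int4: 0.5 bytes per parameter
print(f"fp16: ~{fp16:.0f} GiB, int4: ~{int4:.1f} GiB")
```

At fp16 this lands around 30 GiB, which explains why a 10–12 GB card like a 3080 cannot load the unquantized model, while the int4 build fits comfortably.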

@PanQiWei With auto-gptq installed, does that mean quantization no longer requires setting up a CUDA environment myself and building the wheel and PyTorch extension from the GPTQ source? Does auto-gptq require a matching PyTorch CUDA version, or a particular transformers version?

See the Dockerfile at https://github.com/linonetwo/MOSS-DockerFile/blob/master/moss-int4-cuda117.dockerfile

```
WORKDIR $CODEDIR
ENV GIT_LFS_SKIP_SMUDGE=1
RUN git clone https://huggingface.co/fnlp/moss-moon-003-sft-plugin-int4 --filter=blob:none --depth=1
# fix: name 'autotune' is not defined
RUN mkdir -p /root/.cache/huggingface/modules/transformers_modules/local/ && \
    cp $CODEDIR/moss-moon-003-sft-plugin-int4/custom_autotune.py /root/.cache/huggingface/modules/transformers_modules/local/
```

`conda install cudatoolkit` automatically installs the CUDA 11.3 runtime.

@glide-the Will some llama model eventually be merged into master? To save VRAM when running locally, you need either ggml or GPTQ, and neither is currently compatible with GLM or MOSS. Chinese-made LLMs feel a bit behind on compression techniques; only int4/int8 quantization is available.

GH200 datacenter rig which cost millions ;)

Do plugins get access to API responses like this: https://github.com/ysymyth/tree-of-thought-llm/blob/faa28c395e5b86bfcbf983355810d52f54fb7b51/models.py#L35, so that we can accurately count the number of tokens spent so far?
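The kind of running token accounting being asked about can be sketched as below. This is a minimal illustration, assuming OpenAI-style response dicts with a `usage` field; the plugin API in question may expose responses differently:

```python
# Accumulate running token totals from API responses.
# Assumes each response carries an OpenAI-style "usage" dict.
completion_tokens = 0
prompt_tokens = 0

def record_usage(response: dict) -> None:
    """Add one response's token usage to the running totals."""
    global completion_tokens, prompt_tokens
    usage = response.get("usage", {})
    completion_tokens += usage.get("completion_tokens", 0)
    prompt_tokens += usage.get("prompt_tokens", 0)

# Hypothetical responses, for illustration only:
record_usage({"usage": {"completion_tokens": 12, "prompt_tokens": 40}})
record_usage({"usage": {"completion_tokens": 8, "prompt_tokens": 55}})
print(completion_tokens, prompt_tokens)  # running totals so far
```

This mirrors the linked `models.py`, which keeps module-level counters and adds each response's usage to them after every API call.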

I don't think vLLM supports image/audio embeddings so far. Methods and abstractions for multi-modal embeddings need to be supported first.

I am not even sure where to start debugging this problem; any kind guidance on where to begin would be appreciated, thanks! Edit: OK, I added `--debug-output` to cmake as here...