俞航

Results 117 comments of 俞航

The plugin's open-source library isn't complete yet.

requirement.txt pins pytorch==1.13.1, but that release no longer seems to ship CUDA 11.3 builds; you should use pytorch-1.13.1+cu117 instead. Also, without quantization, fp16 needs about 32 GB of VRAM, so a 3080 is definitely not enough. Running the quantized version requires compiling GPTQ, which means installing the CUDA toolkit (matching PyTorch's CUDA version) to build it.
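A back-of-envelope check on the ~32 GB fp16 figure mentioned above. This is a rough sketch that only counts the weights themselves (activations and KV cache add more), and the ~16B parameter count for MOSS is an assumption for illustration:

```python
# Rough VRAM estimate: weights only, ignoring activations and KV cache.
PARAMS = 16_000_000_000  # assumed ~16B parameters, for illustration

def weight_vram_gb(params: int, bytes_per_param: float) -> float:
    """GiB needed just to hold the model weights at a given precision."""
    return params * bytes_per_param / 1024**3

fp16 = weight_vram_gb(PARAMS, 2)    # fp16: 2 bytes per parameter
int4 = weight_vram_gb(PARAMS, 0.5)  # int4: 0.5 bytes per parameter
print(f"fp16: ~{fp16:.0f} GiB, int4: ~{int4:.1f} GiB")
```

At fp16 this lands around 30 GiB, which explains why a 10–12 GB card like a 3080 cannot load the unquantized model, while the int4 build fits comfortably.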

@PanQiWei With auto-gptq installed, does that mean quantization no longer requires setting up a CUDA environment myself and building the wheel and PyTorch extension from the GPTQ source? Does auto-gptq require a matching PyTorch CUDA version, or a particular transformers version?

See the Dockerfile at https://github.com/linonetwo/MOSS-DockerFile/blob/master/moss-int4-cuda117.dockerfile

```
WORKDIR $CODEDIR
ENV GIT_LFS_SKIP_SMUDGE=1
RUN git clone https://huggingface.co/fnlp/moss-moon-003-sft-plugin-int4 --filter=blob:none --depth=1
# fix: name 'autotune' is not defined
RUN mkdir -p /root/.cache/huggingface/modules/transformers_modules/local/ && \
    cp $CODEDIR/moss-moon-003-sft-plugin-int4/custom_autotune.py /root/.cache/huggingface/modules/transformers_modules/local/
```

`conda install cudatoolkit` automatically installs the CUDA 11.3 runtime.

@glide-the Will some llama model eventually be merged into master? To save VRAM when running locally, you need either ggml or GPTQ, and neither is currently compatible with GLM or MOSS. Chinese-made LLMs feel a bit behind on compression techniques; only int4/int8 quantization is available.

GH200 datacenter rig which cost millions ;)

Do plugins get access to API responses like this: https://github.com/ysymyth/tree-of-thought-llm/blob/faa28c395e5b86bfcbf983355810d52f54fb7b51/models.py#L35, so that we can accurately count the number of tokens spent so far?
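The kind of running token accounting being asked about can be sketched as below. This is a minimal illustration, assuming OpenAI-style response dicts with a `usage` field; the plugin API in question may expose responses differently:

```python
# Accumulate running token totals from API responses.
# Assumes each response carries an OpenAI-style "usage" dict.
completion_tokens = 0
prompt_tokens = 0

def record_usage(response: dict) -> None:
    """Add one response's token usage to the running totals."""
    global completion_tokens, prompt_tokens
    usage = response.get("usage", {})
    completion_tokens += usage.get("completion_tokens", 0)
    prompt_tokens += usage.get("prompt_tokens", 0)

# Hypothetical responses, for illustration only:
record_usage({"usage": {"completion_tokens": 12, "prompt_tokens": 40}})
record_usage({"usage": {"completion_tokens": 8, "prompt_tokens": 55}})
print(completion_tokens, prompt_tokens)  # running totals so far
```

This mirrors the linked `models.py`, which keeps module-level counters and adds each response's usage to them after every API call.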

I don't think vLLM supports image/audio embeddings so far. Methods and abstractions for multi-modal embeddings need to be supported first.

I am not even sure where to start debugging this problem; any kind guidance on where to begin would be appreciated, thanks! Edit: OK, I added `--debug-output` to cmake as here...