bonuschild
Examples:
- https://huggingface.co/TheBloke/CodeLlama-7B-AWQ: 4GB on disk, but it uses about 20GB of VRAM
- https://huggingface.co/TheBloke/deepseek-coder-33B-instruct-AWQ: 17GB on disk, but it cannot run on a dual-A100 (40GB) server

> `--tensor-parallel-size 2` is...
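The gap between file size and observed VRAM usage is mostly the KV cache: vLLM pre-allocates a fixed fraction of GPU memory up front (`--gpu-memory-utilization`, default 0.9), so observed usage tracks total VRAM rather than checkpoint size. A rough back-of-envelope sketch, with model figures (32 layers, 32 KV heads, head dim 128) and a 16K-token context that are my assumptions for CodeLlama-7B, not values from the issue:

```python
# Rough VRAM estimate for serving an AWQ model: the checkpoint only
# covers the quantized weights; the fp16 KV cache usually dominates.

def kv_cache_bytes(num_layers, num_kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    # 2x for the separate key and value tensors, fp16 (2 bytes) by default
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * batch * dtype_bytes

weights_gb = 4.0  # AWQ checkpoint on disk (~4-bit weights), assumed
cache_gb = kv_cache_bytes(
    num_layers=32, num_kv_heads=32, head_dim=128,
    seq_len=16384, batch=1) / 1024**3  # hypothetical single 16K-token request

print(f"weights ~{weights_gb:.1f} GB, KV cache ~{cache_gb:.1f} GB")
```

Even one full-context request roughly triples the weight footprint here, and vLLM reserves cache space for many concurrent requests. `--tensor-parallel-size 2` shards the weights and cache across two GPUs, but each GPU still pre-allocates its share of memory.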
I think we need:
- a VS Code plugin
- a model loading method
- an API server to communicate between the plugin and the model backend

Is there a mature solution for this?
First, there are two repositories: https://github.com/ginuerzh/gost/ and https://github.com/go-gost/gost
- The latter's documentation describes configuration in YAML and JSON.
- From the former repository's code and issues, it appears only JSON is supported, and the two formats differ considerably, which makes learning confusing.

Could the author point the way: to start this repository's `gost` from a configuration file, how should it be written? Below is the error reported when using an incorrect configuration file:
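For reference, a minimal sketch of what I believe the ginuerzh/gost (v2) JSON configuration looks like; the `ServeNodes`/`ChainNodes` keys mirror the `-L`/`-F` command-line flags, and the listen/forward URLs below are placeholders, not values from this issue:

```json
{
    "Debug": true,
    "Retries": 0,
    "ServeNodes": [
        "socks5://:1080"
    ],
    "ChainNodes": [
        "http://192.168.1.1:8080"
    ]
}
```

If this is right, it would be started with `gost -C config.json`; the YAML format documented under go-gost/gost applies only to v3.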
### Class | 类型
Large language model
### Feature Request | 功能请求
Looking at the documentation, it seems the supported workflow is to pull a model yourself and run it locally. Could gpt-academic instead be given the HTTP address of an API running on another IP directly? It is also possible I haven't read the full documentation; please advise, thank you!
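If I recall correctly, gpt-academic reads its settings from `config.py` (overridable via `config_private.py`), and there is a redirect option for pointing chat requests at another endpoint. A sketch from memory; the option names and the remote address are assumptions and may differ by version:

```python
# config_private.py -- hypothetical sketch; option names are from my
# recollection of gpt-academic's config.py and may differ by version.
API_URL_REDIRECT = {
    "https://api.openai.com/v1/chat/completions":
        "http://192.168.0.10:8000/v1/chat/completions",  # placeholder remote server
}
LLM_MODEL = "gpt-3.5-turbo"
```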
### req1: svg storage location
- In Obsidian, there is an option for setting the location of newly created attachments such as pictures:  ### req2: svg...
In draw.io, a newly created diagram has "grid view" and "page view" turned on:  When I use draw.io to open the `.svg` file created by the plugin `drawio-obsidian`, it shows no...
I saw that you mock `llama.cpp`, but I still have GPU resources available, even though I also have enough CPU and RAM. I just want to figure out whether this is the right scenario for deploying it.
As the title says, this repository is missing an official `requirements.txt` to guide developers in installing dependencies. Will one be added later?
- Already posted at https://github.com/vllm-project/vllm/issues/1479
- My GPU is an RTX 3060 with 12GB VRAM
- My target model is [CodeLlama-7B-AWQ](https://huggingface.co/TheBloke/CodeLlama-7B-AWQ), whose size is
Here is my output after executing:

```bash
(autogptq) root@XXX:/mnt/e/Downloads/AutoGPTQ-API# python blocking_api.py
Traceback (most recent call last):
  File "/mnt/e/Downloads/AutoGPTQ-API/blocking_api.py", line 29, in <module>
    model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
  File "/root/miniconda3/envs/autogptq/lib/python3.10/site-packages/auto_gptq/modeling/auto.py", line 108, in from_quantized
...
```