wangyongpenga issues

Results 8 issues of


                                            wangyongpenga

CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at

### Describe the bug 2024-05-07 14:43:59,173 xinference.api.restful_api 836 ERROR [address=0.0.0.0:45545, pid=2324] CUDA call failed lazily at initialization with error: device >= 0 && device < num_gpus INTERNAL ASSERT FAILED at...

gpu

安装xinference卡住不动

卡在这1小时了

DeepSpeed-MII 能加载量化的int4或者int8的模型吗？

建议chatbox增加max_tokens的选项，一些自定义的模型需要此参数，不然长的回答会截断

调用chat接口时不指定max_tokens参数默认返回的tokens长度为1024，怎么发布模型时修改默认配置增加返回tokens长度

Perplexica’s supports Configurable private deployment model

Perplexica’s supports Configurable private deployment model by OpenAI-compatible interface

建议支持 DeepSeek-R1

### Feature request / 功能建议建议支持 DeepSeek-R1 ### Motivation / 动机 DeepSeek-R1 蒸馏小模型 ### Your contribution / 您的贡献 DeepSeek-R1 https://github.com/deepseek-ai/DeepSeek-R1

feature

connection timed out

File "/usr/local/lib/python3.10/site-packages/urllib3/connectionpool.py", line 534, in _make_request response = conn.getresponse() | -> -> File "/usr/local/lib/python3.10/site-packages/urllib3/connection.py", line 516, in getresponse httplib_response = super().getresponse() File "/usr/local/lib/python3.10/http/client.py", line 1375, in getresponse response.begin() | ->...