xbl916

Results: 16 comments by xbl916

I tried to run it on an Oracle ARM host and rebuilt the Docker image. I removed some packages that aren't supported on the ARM platform, such as Aspose.Slides and N-card. Therefore, some...
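One way to express that kind of platform-specific exclusion declaratively is a pip environment marker in requirements.txt, so the ARM build simply skips the incompatible wheels. A minimal sketch; the package name below is an assumption (the real PyPI name for Aspose.Slides may differ), not the project's actual dependency list:

```
# requirements.txt (sketch): skip x86-only packages when building on ARM.
# "aspose-slides" is an assumed PyPI name; substitute whatever package
# actually fails to install on aarch64.
aspose-slides; platform_machine != "aarch64"
```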

> > @ItzCrazyKns The problem can be resolved in 90% of cases by simply being compatible with the OpenAI API, exposing the base_url, and allowing a customizable model_name. This...
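For context, this is what OpenAI-API compatibility buys: any server that speaks the protocol can be reached with the stock `openai` client (v1+) just by overriding the base URL and model name. A minimal sketch; the endpoint URL, API key, and model name below are placeholder assumptions, not values from this thread:

```python
# Sketch: point the official openai client at any OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # customizable endpoint (assumed)
    api_key="not-needed",                 # many local servers ignore the key
)

resp = client.chat.completions.create(
    model="qwen2-fp8",  # customizable model_name (assumed)
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```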

> This seems expected, since the server uses a busy-wait loop to wait for new requests. Is this causing you serious trouble?

Given that I still use an i5-8400, which has only six cores, and I'm running the LLM with dual GPUs, two of those cores are often running at 100%,...

Adding time.sleep(0.001) around line 910 in /opt/conda/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py can significantly reduce CPU usage. Note that the sleep duration should not be too long, as it may affect the inference...
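The fix works because a busy-wait loop never yields the CPU, so one core spins at 100% even when idle; a sub-millisecond sleep lets the OS scheduler run other threads between polls at negligible latency cost. A minimal sketch of the pattern, not sglang's actual tp_worker.py code:

```python
# Illustrative busy-wait loop and fix; NOT the real tp_worker.py code,
# just the pattern it reportedly follows.
import queue
import time

def handle(req):
    ...  # hypothetical request handler

def worker_loop(requests: "queue.Queue"):
    while True:
        try:
            req = requests.get_nowait()
        except queue.Empty:
            # Without this sleep the loop spins at 100% CPU on one core.
            # ~1 ms yields to the scheduler while adding little latency;
            # much longer values may delay inference dispatch.
            time.sleep(0.001)
            continue
        handle(req)
```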

Indeed, we really need a model of around 30B.

Watching closely; I hope there will be one.

Upgraded vllm to 0.5.3 and it's still the same.

> @xbl916 We plan to support fp8. Which format did you select when loading it with xinference?

I registered it as a custom model in xinference:

{
  "version": 1,
  "context_length": 32768,
  "model_name": "qwen2-fp8",
  "model_lang": ["en", "zh"],
  "model_ability": ["generate", "chat"],
  "model_description": "This is a custom model description.",
  "model_family": "qwen2-instruct",
  "model_specs": ...
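For readers following along, a spec like the one above is registered with xinference before it can be launched. A sketch using xinference's Python client, assuming the JSON is saved as qwen2-fp8.json and the server runs on its default endpoint; the file name, endpoint, and exact register_model signature are assumptions from my reading of the xinference docs, so verify locally:

```python
# Sketch: register the custom model spec above with a running xinference
# server. Endpoint, file name, and call signature are assumptions.
from xinference.client import Client

with open("qwen2-fp8.json") as f:  # hypothetical file holding the JSON spec
    spec = f.read()

client = Client("http://localhost:9997")  # assumed default xinference endpoint
client.register_model(model_type="LLM", model=spec, persist=True)
```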