xbl916

Results: 16 comments by xbl916

I tried to run it on an Oracle ARM host and rebuilt the Docker image. I removed some packages that aren't supported on the ARM platform, such as Aspose.Slides and N-card. Therefore, some...
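One way to express that kind of platform-specific exclusion declaratively is a pip environment marker in requirements.txt, so the ARM build simply skips the incompatible wheels. A minimal sketch; the package name below is an assumption (the real PyPI name for Aspose.Slides may differ), not the project's actual dependency list:

```
# requirements.txt (sketch): skip x86-only packages when building on ARM.
# "aspose-slides" is an assumed PyPI name; substitute whatever package
# actually fails to install on aarch64.
aspose-slides; platform_machine != "aarch64"
```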

> > @ItzCrazyKns The problem can be resolved in 90% of cases by simply being compatible with the OpenAI API, exposing the base_url, and allowing a customizable model_name. This...
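For context, this is what OpenAI-API compatibility buys: any server that speaks the protocol can be reached with the stock `openai` client (v1+) just by overriding the base URL and model name. A minimal sketch; the endpoint URL, API key, and model name below are placeholder assumptions, not values from this thread:

```python
# Sketch: point the official openai client at any OpenAI-compatible server.
from openai import OpenAI

client = OpenAI(
    base_url="http://localhost:8000/v1",  # customizable endpoint (assumed)
    api_key="not-needed",                 # many local servers ignore the key
)

resp = client.chat.completions.create(
    model="qwen2-fp8",  # customizable model_name (assumed)
    messages=[{"role": "user", "content": "Hello"}],
)
print(resp.choices[0].message.content)
```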

> This seems expected, since the server uses a busy-wait loop to wait for new requests. Is this causing you serious trouble?

Given that I still use an i5-8400, which has only six cores, and I'm running the LLM with dual GPUs, two of those cores are often running at 100%,...

Adding time.sleep(0.001) around line 910 in /opt/conda/lib/python3.10/site-packages/sglang/srt/managers/tp_worker.py can significantly reduce CPU usage. Note that the sleep duration should not be too long, as it may affect the inference...
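The fix works because a busy-wait loop never yields the CPU, so one core spins at 100% even when idle; a sub-millisecond sleep lets the OS scheduler run other threads between polls at negligible latency cost. A minimal sketch of the pattern, not sglang's actual tp_worker.py code:

```python
# Illustrative busy-wait loop and fix; NOT the real tp_worker.py code,
# just the pattern it reportedly follows.
import queue
import time

def handle(req):
    ...  # hypothetical request handler

def worker_loop(requests: "queue.Queue"):
    while True:
        try:
            req = requests.get_nowait()
        except queue.Empty:
            # Without this sleep the loop spins at 100% CPU on one core.
            # ~1 ms yields to the scheduler while adding little latency;
            # much longer values may delay inference dispatch.
            time.sleep(0.001)
            continue
        handle(req)
```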

Indeed, we really need a model of around 30B.

Watching closely; I hope there will be one.

Upgraded vllm to 0.5.3 and it's still the same.

> @xbl916 We plan to support fp8. Which format did you select when loading it with xinference?

I registered it as a custom model in xinference:

{
  "version": 1,
  "context_length": 32768,
  "model_name": "qwen2-fp8",
  "model_lang": ["en", "zh"],
  "model_ability": ["generate", "chat"],
  "model_description": "This is a custom model description.",
  "model_family": "qwen2-instruct",
  "model_specs": ...
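For readers following along, a spec like the one above is registered with xinference before it can be launched. A sketch using xinference's Python client, assuming the JSON is saved as qwen2-fp8.json and the server runs on its default endpoint; the file name, endpoint, and exact register_model signature are assumptions from my reading of the xinference docs, so verify locally:

```python
# Sketch: register the custom model spec above with a running xinference
# server. Endpoint, file name, and call signature are assumptions.
from xinference.client import Client

with open("qwen2-fp8.json") as f:  # hypothetical file holding the JSON spec
    spec = f.read()

client = Client("http://localhost:9997")  # assumed default xinference endpoint
client.register_model(model_type="LLM", model=spec, persist=True)
```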