lightllm
LightLLM is a Python-based LLM (Large Language Model) inference and serving framework, notable for its lightweight design, easy scalability, and high-speed performance.
In the `HttpServerManager` class, when the service is configured for multi-node mode (`args.nnodes > 1`) and the node rank is greater than 0, the code binds a `zmq.PULL` socket to `tcp://*:{args.multinode_httpmanager_port}`. This means the port listens for connections on **all network interfaces**, potentially exposing it to untrusted networks. [lightllm/server/httpserver/manager.py](https://github.com/ModelTC/lightllm/blob/6234bd3bdf2c8876f953d833db71e4b0c7192a52/lightllm/server/httpserver/manager.py#L626)

```python
# In HttpServerManager.__init__:
if args.nnodes > 1:
    if args.node_rank == 0:...
```
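The distinction the report draws (wildcard vs. loopback bind) can be illustrated with a small sketch using Python's stdlib `socket` module. The helper name `bind_addr` is hypothetical; LightLLM itself uses pyzmq, but the underlying TCP bind semantics are the same.

```python
import socket

def bind_addr(host: str) -> str:
    """Bind a TCP socket to `host` on an ephemeral port and report the bound address."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.bind((host, 0))  # port 0 -> the OS picks a free port
    addr = s.getsockname()[0]
    s.close()
    return addr

# "0.0.0.0" (the wildcard, what zmq's "tcp://*" resolves to) accepts
# connections arriving on ANY network interface of the host.
print(bind_addr("0.0.0.0"))    # -> 0.0.0.0

# "127.0.0.1" restricts the listener to processes on the same machine,
# which is the usual mitigation when the peer nodes are co-located or
# reachable through a private interface.
print(bind_addr("127.0.0.1"))  # -> 127.0.0.1
```

With pyzmq the equivalent mitigation is to bind to a specific interface address (e.g. `socket.bind("tcp://127.0.0.1:port")`) instead of the wildcard `tcp://*:port`.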
Hello! We are SecurityReportBot, an automated security assistant. During our routine scan, we detected a potential vulnerability in your repository. However, we noticed that GitHub Security Reports are not enabled...
Something like GGUF 1-bit and 2-bit quantization?
I saw your code referring to PD disaggregation. Please tell me how to use it.
Some papers present the following perspective: " **_However, the following two issues can lead to low GPU utilization. First, the Decode stage of GPT requires frequent sequential computing...
I strictly followed the installation docs (https://lightllm-cn.readthedocs.io/en/latest/getting_started/installation.html#installation), and my GPU is an A800. Error:

```
python -m lightllm.server.api_server --model_dir ~/autodl-pub/models/llama-7b/
INFO 12-24 20:14:05 [cache_tensor_manager.py:17] USE_GPU_TENSOR_CACHE is On
ERROR 12-24 20:14:05 [_custom_ops.py:51] vllm...
```