Lzhang-hub
> I read metrics from ip:5678; through `container_gpu_memory_total` I can get all container names on each GPU, then summing `container_request_gpu_memory` for a single GPU gives the allocated value on...
> Kubelet doesn't allow overselling device resources, how was your calculation? I think "kubelet doesn't allow overselling device resources" applies to the k8s node, but in my case, overselling happens...
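A minimal sketch of that per-GPU aggregation, assuming the endpoint at `<node-ip>:5678` serves Prometheus-format metrics and that both metrics carry a label identifying the physical GPU (called `gpu` here; the real label name and port may differ in your exporter):

```python
# Sum container_request_gpu_memory per GPU and compare it to
# container_gpu_memory_total to see whether a GPU is oversold.
from collections import defaultdict

import requests
from prometheus_client.parser import text_string_to_metric_families

# NODE_IP is a placeholder for the node exposing the metrics endpoint.
text = requests.get("http://NODE_IP:5678/metrics").text

total_per_gpu = {}
requested_per_gpu = defaultdict(float)

for family in text_string_to_metric_families(text):
    for sample in family.samples:
        gpu = sample.labels.get("gpu")
        if sample.name == "container_gpu_memory_total":
            total_per_gpu[gpu] = sample.value
        elif sample.name == "container_request_gpu_memory":
            requested_per_gpu[gpu] += sample.value

for gpu, total in total_per_gpu.items():
    requested = requested_per_gpu[gpu]
    print(f"GPU {gpu}: requested={requested} total={total} "
          f"oversold={requested > total}")
```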
> This is a low-level feature; you don't need to call it explicitly. > > Just call stream_response or stream_chat from multiple threads, and batches are assembled automatically underneath. > > Currently it only helps for float16, and with a small number of threads the per-stream latency should be about the same as a single stream. @ztxz16 I tested this: sending 10 prompts concurrently from 10 threads, the returned results look like several results mixed together. Looking at the web_api code, it seems that for multi-threaded requests the outputs are stored in the same queue, and that is what causes this?
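For reference, a minimal sketch of the multi-threaded test described above, assuming the fastllm_pytools Python binding exposes a model object with a `stream_response(prompt)` generator as in the quoted reply (names may differ across versions); MODEL_PATH is a placeholder:

```python
# Send 10 prompts from 10 threads at once; batching is supposed to happen
# transparently inside the library, and each thread collects only its own stream.
from concurrent.futures import ThreadPoolExecutor

from fastllm_pytools import llm

model = llm.model("MODEL_PATH")  # placeholder path to a converted .flm model

def run_one(prompt):
    pieces = []
    for chunk in model.stream_response(prompt):
        pieces.append(chunk)
    return "".join(pieces)

prompts = [f"Question {i}: ..." for i in range(10)]
with ThreadPoolExecutor(max_workers=10) as pool:
    results = list(pool.map(run_one, prompts))
```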
> Yes, this does need improvement; the web_api "/api/chat_stream" endpoint indeed has this problem under multi-threaded requests. The "/api/batch_chat" endpoint is not streaming, so it does not have this problem. @kiranosora Is there any way to improve this? A suggestion would be appreciated.
> @Lzhang-hub This issue has been fixed in the new version. Great! 👍🏻
> The plan is to distinguish different sessions by handle_id. For non-streaming chat, you can use "api/batch_chat" for now. @q497629642 Hi, I recently have a need for batch inference. I tested the api/batch_chat endpoint and found that the request latency grows linearly with the length of the prompt list, so there seems to be no batching effect. Is there an extra parameter I need to set?
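As a rough illustration of the handle_id idea mentioned above (not the actual web_api implementation), keeping one queue per session prevents concurrently streamed results from being mixed; all names here are illustrative:

```python
# One queue per handle_id: the generation thread for a session pushes chunks
# into its own queue, and the HTTP handler for /api/chat_stream reads only
# from the queue belonging to its handle_id.
import queue
import threading
import uuid

result_queues = {}            # handle_id -> queue.Queue of generated chunks
queues_lock = threading.Lock()

def open_session():
    handle_id = uuid.uuid4().hex
    with queues_lock:
        result_queues[handle_id] = queue.Queue()
    return handle_id

def push_chunk(handle_id, chunk):
    # Called by the generation thread that owns this session.
    result_queues[handle_id].put(chunk)

def stream_chunks(handle_id):
    # Called by the HTTP handler serving the streaming response for this session.
    q = result_queues[handle_id]
    while True:
        chunk = q.get()
        if chunk is None:     # sentinel marking end of generation
            break
        yield chunk
```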
> Yes. @byshiue Hi, I deployed a bloom model with `decoupled=True` and ran end_to_end_test.py over grpc. I got this error: ``` [StatusCode.UNIMPLEMENTED] ModelInfer RPC doesn't support models with decoupled transaction...
Thank you very much for your reply. I tried `identity_test.py` in decoupled mode and printed the output in completion_callback; the streaming output is fine. ```Python # Callback function used for async_stream_infer() def...
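For anyone hitting the same UNIMPLEMENTED error: a decoupled model has to be queried through the streaming gRPC API with a callback rather than a plain blocking infer. A minimal sketch assuming the `tritonclient.grpc` client; the model name, input/output names, dtype, and shape are placeholders, not the bloom ensemble from this thread:

```python
# Stream responses from a decoupled Triton model over gRPC.
import numpy as np
import tritonclient.grpc as grpcclient

def callback(result, error):
    # Invoked once for every response the decoupled model sends back.
    if error is not None:
        print(f"error: {error}")
    else:
        print(f"result: {result.as_numpy('OUTPUT0')}")  # placeholder output name

client = grpcclient.InferenceServerClient("localhost:8001")

data = np.array([[1, 2, 3, 4]], dtype=np.int32)
inp = grpcclient.InferInput("INPUT0", list(data.shape), "INT32")  # placeholder input name
inp.set_data_from_numpy(data)

client.start_stream(callback=callback)
client.async_stream_infer(model_name="my_decoupled_model", inputs=[inp])
client.stop_stream()  # close the stream after all requests have been sent
```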
Thank you very much, I ran it successfully! I made a stupid mistake: I forgot to change the name of the output variable. I use `print(f"result: {result.as_numpy('output_ids')}")` in the ensemble model...