ms-swift streaming output stutters (NPU, Ascend 910B)

Describe the bug: When I call the deployed API service with Python requests.post and ask for a streaming response, the content comes back one large block at a time, followed by a pause of roughly 2 to 3 seconds before the next large block arrives.

Your hardware and system info: Ascend 910B + torch2.3.1 + Ascend8.0.rc1

Additional context:

for chunk in response.iter_content(decode_unicode=True):
    print(chunk)
    # After one block is printed, there is a pause of about two seconds here before output continues

Sep 24 '24 06:09 liujiachang

https://swift.readthedocs.io/zh-cn/latest/LLM/VLLM%E6%8E%A8%E7%90%86%E5%8A%A0%E9%80%9F%E4%B8%8E%E9%83%A8%E7%BD%B2.html

Sep 24 '24 06:09 Jintao-Huang

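I could not find a requests.post example in the documentation; it only has curl and openai samples. I also tried sending the request with the openai client, and the streaming output still stutters in the same way as with requests: it arrives block by block. Could some parameter be causing the output to be buffered?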

Sep 24 '24 07:09 liujiachang

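Figured it out through testing: this is not a code problem. The platform the service is deployed on has an extra proxy forwarding layer in front of it. This issue can be closed.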

Sep 24 '24 07:09 liujiachang
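
For anyone seeing the same symptom, a minimal sketch of one way to check whether something between the client and the server is batching the stream: record the gap before each chunk and count long stalls. The helper names `timed_chunks` and `summarize` and the 1-second threshold are hypothetical choices for illustration, not part of ms-swift or requests. Few chunks, large chunk sizes, and multi-second stalls would be consistent with an intermediate buffer, though they do not prove it.

```python
import time
from typing import Callable, Dict, Iterable, Iterator, Tuple


def timed_chunks(chunks: Iterable[bytes],
                 clock: Callable[[], float] = time.monotonic
                 ) -> Iterator[Tuple[float, bytes]]:
    """Yield (seconds since the previous chunk, chunk) for each chunk."""
    last = clock()
    for chunk in chunks:
        now = clock()
        yield now - last, chunk
        last = now


def summarize(timed: Iterable[Tuple[float, bytes]],
              stall_seconds: float = 1.0) -> Dict[str, float]:
    """Count chunks, count gaps of at least stall_seconds, and average chunk size."""
    items = list(timed)
    sizes = [len(chunk) for _, chunk in items]
    stalls = [gap for gap, _ in items if gap >= stall_seconds]
    return {
        "chunks": len(items),
        "stalls": len(stalls),
        "mean_bytes": sum(sizes) / len(sizes) if sizes else 0.0,
    }


# Usage against a live endpoint (url and payload are placeholders):
#   import requests
#   resp = requests.post(url, json=payload, stream=True)
#   print(summarize(timed_chunks(resp.iter_content(chunk_size=None))))
```

Running the same measurement once through the platform's public address and once directly against the service port (if it is reachable) makes the comparison concrete: if only the proxied path shows stalls, the proxy layer is the likely cause.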

Is there a way to change the delimiter used in streaming mode, for example from \n to \r\n?

Sep 24 '24 08:09 liujiachang
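
This thread does not establish whether ms-swift has a setting for the stream delimiter, so the following is only a client-side or relay-side sketch, not a server option. It assumes the stream uses server-sent-event style `data:` lines, as OpenAI-compatible endpoints typically do; the function names `iter_sse_data` and `to_crlf` are hypothetical. The first helper reads payloads regardless of whether lines end in LF or CRLF; the second rewrites bare LF to CRLF for a downstream consumer that insists on it.

```python
from typing import Iterable, Iterator


def iter_sse_data(chunks: Iterable[bytes]) -> Iterator[str]:
    """Yield the payload of each `data:` line, accepting LF or CRLF endings."""
    buffer = b""
    for chunk in chunks:
        buffer += chunk
        # Only complete lines are consumed; a partial line stays in the buffer.
        while b"\n" in buffer:
            line, buffer = buffer.split(b"\n", 1)
            line = line.rstrip(b"\r")
            if line.startswith(b"data:"):
                yield line[5:].strip().decode("utf-8")


def to_crlf(chunks: Iterable[bytes]) -> Iterator[bytes]:
    """Re-emit a byte stream with every bare LF rewritten to CRLF."""
    prev_cr = False  # tracks a CR that ended the previous chunk
    for chunk in chunks:
        if not chunk:
            continue
        out = bytearray()
        for byte in chunk:
            if byte == 0x0A and not prev_cr:
                out += b"\r\n"
            else:
                out.append(byte)
            prev_cr = byte == 0x0D
        yield bytes(out)
```

Because the server-sent events format accepts LF, CRLF, or CR as line endings, a tolerant parser on the client is usually simpler than changing what the server emits.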