Streaming output support?
Is streaming output supported? And are there any results on time to first token (TTFT) or time per output token (TPOT)? Thanks.
By copying the `def stream_chat(self, ...)` method from modeling_internlm2.py (https://huggingface.co/OpenGVLab/InternVL-Chat-V1-5/blob/main/modeling_internlm2.py) into modeling_internvl_chat.py and making a few small changes, I implemented it and verified that it works.
@NiYueLiuFeng Can you share your modeling_internvl_chat.py? Thank you very much
Hi, see this guide for streaming output: https://internvl.readthedocs.io/en/latest/internvl2.0/quick_start.html#streaming-output
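The linked guide streams by passing a `TextIteratorStreamer` from `transformers` into the generation call and running generation in a background thread, then iterating tokens in the main thread. Here is a minimal, self-contained sketch of that producer/consumer pattern; `TokenStreamer` and `fake_generate` are stand-ins I made up for the streamer and the model call, so only the threading/iteration structure reflects the real usage:

```python
import threading
import queue

SENTINEL = object()  # marks end-of-generation

class TokenStreamer:
    """Minimal stand-in for transformers' TextIteratorStreamer:
    the generation thread calls put()/end(); the caller iterates tokens."""

    def __init__(self):
        self._queue = queue.Queue()

    def put(self, token):
        self._queue.put(token)

    def end(self):
        self._queue.put(SENTINEL)

    def __iter__(self):
        return self

    def __next__(self):
        item = self._queue.get()  # blocks until the next token arrives
        if item is SENTINEL:
            raise StopIteration
        return item

def fake_generate(streamer, tokens):
    # Stand-in for the model's generate/chat call with streamer=streamer;
    # a real model would push decoded tokens here as they are produced.
    for token in tokens:
        streamer.put(token)
    streamer.end()

streamer = TokenStreamer()
thread = threading.Thread(
    target=fake_generate,
    args=(streamer, ["Hello", ", ", "world", "!"]),
)
thread.start()

generated = ""
for token in streamer:  # tokens arrive incrementally, not all at once
    generated += token
thread.join()
print(generated)
```

With the real model you would replace `fake_generate` with the actual generation call (run in the thread) and `TokenStreamer` with `TextIteratorStreamer(tokenizer, ...)`; the consuming loop stays the same.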