[Feature Request] OpenAI Realtime API

Open · lloydzhou opened this issue 1 year ago · 13 comments

🥰 Feature description

https://openai.com/index/introducing-the-realtime-api/

https://platform.openai.com/docs/api-reference/realtime

https://github.com/openai/openai-realtime-console/blob/main/readme/realtime-console-demo.png

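🧐 Proposed solution

Logic

  1. The Realtime API is accessed over a WebSocket connection.
  2. The API has built-in concepts such as sessions and conversations. A session can be configured with modalities, instructions, voice, input_audio_format, output_audio_format, turn_detection, input_audio_transcription, tools, etc., and function calling is supported.
  3. Audio is uploaded with input_audio_buffer.append and input_audio_buffer.commit, and response.create then starts generating the result (if turn_detection is enabled, these calls do not have to be made manually).
  4. The client can send conversation.item.create to add context directly to the current conversation; for history entries, status=completed must be set.
  5. conversation.item.truncate supports interruption (it truncates assistant audio that has already been sent to the client but not yet played).
  6. Listen for response.audio.delta events to get base64-encoded audio data, and for response.text.delta to get the text in parallel (when audio output is enabled, the transcript of the spoken reply arrives as response.audio_transcript.delta).
  7. Listen for response.output_item.added to detect whether an item is a function call, and for response.function_call_arguments.delta to get the function-call arguments. Or can the function-call information be read directly from response.done?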
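The event flow in points 3 to 7 can be sketched in TypeScript as below. This is a minimal, hypothetical sketch rather than NextChat code: the helper names (`audioTurnEvents`, `historyItemEvent`, `truncateEvent`, `handleServerEvent`) and the `RealtimeState` shape are made up for illustration, while the event types and field names follow the OpenAI Realtime API reference linked above (beta, October 2024) and should be checked against it. Sending is left out; each client event would go through `JSON.stringify` and `ws.send`.

```typescript
// Hypothetical helpers around the OpenAI Realtime API event protocol.
// Event types/fields follow the API reference; helper names are made up.
type ClientEvent = { type: string; [key: string]: unknown };

// Point 3: stream base64 audio chunks into the input buffer. With
// turn_detection (server VAD) enabled, the server commits the buffer and
// starts a response on its own, so the last two events are manual-mode only.
function audioTurnEvents(base64Chunks: string[], manualTurn: boolean): ClientEvent[] {
  const events: ClientEvent[] = base64Chunks.map((audio) => ({
    type: "input_audio_buffer.append",
    audio,
  }));
  if (manualTurn) {
    events.push({ type: "input_audio_buffer.commit" });
    events.push({ type: "response.create" });
  }
  return events;
}

// Point 4: replay a past chat message as a completed conversation item.
function historyItemEvent(role: "user" | "assistant", text: string): ClientEvent {
  return {
    type: "conversation.item.create",
    item: {
      type: "message",
      role,
      status: "completed",
      // User text parts use "input_text"; assistant text parts use "text".
      content: [{ type: role === "user" ? "input_text" : "text", text }],
    },
  };
}

// Point 5: when the user barges in, cut the assistant audio at the point
// actually played back. content_index 0 (first content part) is an assumption.
function truncateEvent(itemId: string, playedMs: number): ClientEvent {
  return {
    type: "conversation.item.truncate",
    item_id: itemId,
    content_index: 0,
    audio_end_ms: playedMs,
  };
}

// Points 6 and 7: accumulate streamed output from server events.
interface RealtimeState {
  audioChunks: string[]; // base64 audio from response.audio.delta
  text: string; // text or spoken-transcript deltas
  calls: Map<string, { name: string; args: string }>; // keyed by call_id
}

type ServerEvent = { type: string; [key: string]: any };

function handleServerEvent(state: RealtimeState, event: ServerEvent): void {
  switch (event.type) {
    case "response.audio.delta":
      state.audioChunks.push(event.delta);
      break;
    case "response.text.delta":
    case "response.audio_transcript.delta":
      state.text += event.delta;
      break;
    case "response.output_item.added":
      if (event.item?.type === "function_call") {
        state.calls.set(event.item.call_id, { name: event.item.name, args: "" });
      }
      break;
    case "response.function_call_arguments.delta": {
      const call = state.calls.get(event.call_id);
      if (call) call.args += event.delta;
      break;
    }
  }
}
```

Once a response has finished streaming, each entry in `calls` holds the full JSON argument string, ready for `JSON.parse` and the actual tool invocation.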

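Interaction

  1. Possibly add a voice-interaction page, like the one in the OpenAI client, that calls the Realtime API directly.
  2. The current voice interaction UI is full-screen by default and can be shrunk to the size of the input box (taking the input box's place). Both the voice input UI and the chat history page are kept (keeping the history view makes it possible to show intermediate results produced by plugins, e.g. an image generated by a plugin during the call, which voice cannot describe directly).
  3. The audio generated during a voice call (audio buffer), together with the text received alongside it, must be persisted to the session.
  4. Voice calls let the user choose the voice, format, detection mode, tools, etc. (these buttons need to be kept, or laid out again in the voice UI).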
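For point 3, one possible way to turn the streamed audio into something that can be stored with the session is to decode each `response.audio.delta` chunk and join the bytes. The sketch below assumes the default `pcm16` output format (16-bit little-endian, mono, 24 kHz); the `PersistedVoiceMessage` shape and the helper names are hypothetical, and NextChat's own message type may look different.

```typescript
// Sketch: turn streamed response.audio.delta chunks into bytes that can be
// stored with a chat session. Assumes the default pcm16 output format
// (16-bit little-endian, mono, 24 kHz).
const PCM16_SAMPLE_RATE = 24000;
const PCM16_BYTES_PER_SAMPLE = 2;

// Hypothetical persisted shape; NextChat's real message type may differ.
interface PersistedVoiceMessage {
  role: "user" | "assistant";
  text: string; // transcript / text received alongside the audio
  audio: Uint8Array; // raw pcm16 bytes
  durationMs: number;
}

// atob is available in browsers and in Node.js 16+.
function base64ToBytes(b64: string): Uint8Array {
  const bin = atob(b64);
  const out = new Uint8Array(bin.length);
  for (let i = 0; i < bin.length; i++) out[i] = bin.charCodeAt(i);
  return out;
}

function concatBytes(parts: Uint8Array[]): Uint8Array {
  const total = parts.reduce((n, p) => n + p.length, 0);
  const out = new Uint8Array(total);
  let offset = 0;
  for (const p of parts) {
    out.set(p, offset);
    offset += p.length;
  }
  return out;
}

function toPersistedMessage(
  role: "user" | "assistant",
  text: string,
  base64Chunks: string[],
): PersistedVoiceMessage {
  // Decode each chunk separately: a chunk may end with "=" padding, so
  // joining the base64 strings first would not give valid base64.
  const audio = concatBytes(base64Chunks.map(base64ToBytes));
  const samples = audio.length / PCM16_BYTES_PER_SAMPLE;
  return {
    role,
    text,
    audio,
    durationMs: Math.round((samples / PCM16_SAMPLE_RATE) * 1000),
  };
}
```

Storing raw bytes keeps playback simple; if the session store only accepts strings, the joined bytes could be re-encoded to base64 once, after the response is complete.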

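Discussion

  1. realtime is a new model, but it is clearly not on the same footing as the existing models. Where should it be placed?
  2. The Realtime API also accepts modalities containing only text, which disables audio output (only the voice is turned off; the model is still used through the full WebSocket protocol).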
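The session options from point 2 of the logic list, and the text-only mode described here, both end up in a single `session.update` event. A minimal sketch, assuming a hypothetical `VoiceSettings` object coming from the settings UI; the field names inside `session` follow the Realtime API reference, and `whisper-1` is the transcription model that reference listed at launch (newer options may exist).

```typescript
// Hypothetical mapping from UI settings to a Realtime API session.update
// event. Field names inside `session` follow the API reference.
type AudioFormat = "pcm16" | "g711_ulaw" | "g711_alaw";

interface FunctionTool {
  name: string;
  description: string;
  parameters: Record<string, unknown>; // JSON Schema for the arguments
}

interface VoiceSettings {
  textOnly: boolean; // modalities: ["text"] disables spoken output
  instructions: string;
  voice: string; // e.g. "alloy"
  audioFormat: AudioFormat;
  serverVad: boolean; // turn_detection on/off
  tools: FunctionTool[];
}

function buildSessionUpdate(s: VoiceSettings) {
  return {
    type: "session.update",
    session: {
      modalities: s.textOnly ? ["text"] : ["text", "audio"],
      instructions: s.instructions,
      voice: s.voice,
      input_audio_format: s.audioFormat,
      output_audio_format: s.audioFormat,
      // null turns server VAD off; the client then sends
      // input_audio_buffer.commit and response.create itself.
      turn_detection: s.serverVad ? { type: "server_vad" } : null,
      input_audio_transcription: { model: "whisper-1" },
      tools: s.tools.map((t) => ({ type: "function", ...t })),
    },
  };
}
```

The event can be sent again mid-call to change settings, although the reference at the time noted that the voice cannot be changed once the model has already answered with audio.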

📝 Additional information

Pricing: (screenshot)

lloydzhou · Oct 15 '24 06:10

#5786

Dogtiti · Nov 07 '24 13:11

Configuring the parameters in the settings panel: (screenshot)

Dogtiti · Nov 11 '24 03:11

Adding context and chat history is not supported for now; see https://community.openai.com/t/realtime-api-did-anybody-managed-to-provide-previous-conversation-transcript-history-while-keeping-audio-answers/968293

Dogtiti · Nov 11 '24 06:11

Is there a free model available for this? I was charged $0.10 before I had even chatted for a minute.

kitaev-chen · Nov 11 '24 18:11
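A cost like the one reported above is roughly what per-minute audio pricing predicts. Below is a rough estimator; the default prices are the approximate per-minute figures from OpenAI's October 2024 launch announcement (about $0.06 per minute of audio input and $0.24 per minute of audio output). They are assumptions here and may have changed, so check the current pricing page. `estimateAudioCostUsd` and `AudioPricing` are hypothetical names.

```typescript
// Rough per-minute cost estimate for realtime audio.
// Default prices are the approximate figures from OpenAI's October 2024
// launch announcement; treat them as assumptions and check the pricing page.
interface AudioPricing {
  inputUsdPerMin: number;
  outputUsdPerMin: number;
}

const LAUNCH_AUDIO_PRICING: AudioPricing = { inputUsdPerMin: 0.06, outputUsdPerMin: 0.24 };

// Ignores text tokens and the conversation context that (as commonly
// reported) is billed again as input on each turn, so real bills can be higher.
function estimateAudioCostUsd(
  inputMinutes: number,
  outputMinutes: number,
  pricing: AudioPricing = LAUNCH_AUDIO_PRICING,
): number {
  return inputMinutes * pricing.inputUsdPerMin + outputMinutes * pricing.outputUsdPerMin;
}
```

For example, 30 seconds of speech plus 20 seconds of spoken reply comes to about $0.03 + $0.08 = $0.11 under these assumptions.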

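I configured my own parameters using the method above.

Realtime won't start: the microphone stays disabled and cannot be enabled.

Update:

  It turned out the Azure deployment name had an extra space in front of it; after removing it, the test succeeded on a desktop computer.

  It still fails on mobile, though: a packet capture shows no wss:// requests to Azure or OpenAI.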

dustookk · Nov 29 '24 09:11
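The leading-space problem described in the update is easy to guard against by trimming the settings before the connection is built. A minimal sketch with a hypothetical `RealtimeConfig` shape and `normalizeRealtimeConfig` helper (not NextChat's actual settings code); it only covers the whitespace issue, not the separate mobile problem.

```typescript
// Hypothetical settings shape; NextChat's real config keys may differ.
interface RealtimeConfig {
  provider: "OpenAI" | "Azure";
  apiKey: string;
  endpoint: string;
  deployment: string; // Azure deployment name
}

// Trim user-entered values so stray whitespace (e.g. a leading space in the
// deployment name) cannot silently break the connection URL.
function normalizeRealtimeConfig(raw: RealtimeConfig): RealtimeConfig {
  const cfg: RealtimeConfig = {
    provider: raw.provider,
    apiKey: raw.apiKey.trim(),
    endpoint: raw.endpoint.trim().replace(/\/+$/, ""), // drop trailing slashes
    deployment: raw.deployment.trim(),
  };
  // Treats an empty name or one with inner whitespace as invalid
  // (an assumption about Azure naming rules) and fails loudly.
  if (cfg.provider === "Azure" && (cfg.deployment === "" || /\s/.test(cfg.deployment))) {
    throw new Error(`Invalid Azure deployment name: ${JSON.stringify(raw.deployment)}`);
  }
  return cfg;
}
```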

Bot detected the issue body's language is not English, translate it automatically.


Use the above configuration method to configure your own parameters.

Realtime cannot be started. The microphone is always disabled and cannot be enabled.

https://github.com/ChatGPTNextWeb/ChatGPT-Next-Web/issues/5825

Issues-translate-bot · Nov 29 '24 09:11

I'd like to be able to customize the endpoint URL for realtime chat.

qq1456680570 · Dec 27 '24 07:12

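I'd like to be able to customize the endpoint URL for realtime chat.

Yes, MiniMax has also opened up a realtime API. It would be great to customize the endpoint URL and choose between different realtime API services: https://platform.minimaxi.com/document/Realtime?key=640e0c9c5f918b4f6c4e2d58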

jayjayhust · Jan 27 '25 14:01
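Supporting a custom endpoint mostly means deriving the WebSocket URL from a configurable base URL instead of hard-coding it. Below is a sketch with a hypothetical `realtimeWsUrl` helper. It assumes the service follows OpenAI's `<base>/realtime?model=<model>` layout, which gives `wss://api.openai.com/v1/realtime` for OpenAI itself. A provider such as MiniMax may use a different path, authentication scheme, or event protocol, so its documentation should be checked first.

```typescript
// Hypothetical helper: derive a Realtime WebSocket URL from a user-supplied
// HTTP(S) base URL, so the endpoint can point at OpenAI, a proxy, or another
// service that follows the same URL layout (an assumption per provider).
function realtimeWsUrl(baseUrl: string, model: string): string {
  const url = new URL(baseUrl.trim());
  // WHATWG URL allows switching between special schemes (https -> wss).
  if (url.protocol === "https:") url.protocol = "wss:";
  else if (url.protocol === "http:") url.protocol = "ws:";
  url.pathname = url.pathname.replace(/\/+$/, "") + "/realtime";
  url.searchParams.set("model", model);
  return url.toString();
}
```

For example, `realtimeWsUrl("https://api.openai.com/v1", "gpt-4o-realtime-preview-2024-10-01")` yields `wss://api.openai.com/v1/realtime?model=gpt-4o-realtime-preview-2024-10-01`.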

Hello, I can only upload images. Where can I upload files?

s3777091 · Apr 10 '25 09:04