inference: Support more response formats for Whisper models

Is your feature request related to a problem? Please describe

Currently, once Whisper is deployed, the xinference API only supports response_format=json. This severely limits Whisper's use cases, such as subtitle extraction (which needs timestamps).

Describe the solution you'd like

Remove the restriction on response_format so that other response formats can be used.

Additional context

OpenAI's official API documentation

Apr 26 '24 09:04 jianchaozhuang

Is the model itself capable of returning other formats?

Apr 26 '24 09:04 qinxuye

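I checked the model documentation; at least https://huggingface.co/openai/whisper-large-v3 should be able to return timestamps. OpenAI's API definition for the transcriptions endpoint also allows timestamp information to be returned, although the format is fairly complex. I'll look into this.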

Apr 26 '24 10:04 codingl2k1

https://platform.openai.com/docs/api-reference/audio/verbose-json-object

Apr 26 '24 10:04 codingl2k1

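By default, only the json response format is supported, and it contains a single text field. Supporting timestamps requires at least the verbose_json format.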
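For the subtitle use case, here is a minimal sketch of what a client could do with a verbose_json response. It assumes the response follows the shape in OpenAI's verbose_json documentation (a segments list whose entries carry start, end, and text); the function names and the sample payload are hypothetical and are not part of xinference.

```python
from typing import Any, Dict


def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def verbose_json_to_srt(response: Dict[str, Any]) -> str:
    """Build SRT subtitle text from the `segments` of a verbose_json response."""
    blocks = []
    for index, segment in enumerate(response.get("segments", []), start=1):
        start = srt_timestamp(segment["start"])
        end = srt_timestamp(segment["end"])
        blocks.append(f"{index}\n{start} --> {end}\n{segment['text'].strip()}\n")
    # SRT entries are separated by a blank line.
    return "\n".join(blocks)


# Hypothetical sample payload, shaped like OpenAI's verbose_json object.
sample = {
    "text": "Hello world. Second line.",
    "segments": [
        {"id": 0, "start": 0.0, "end": 1.5, "text": " Hello world."},
        {"id": 1, "start": 1.5, "end": 3.25, "text": " Second line."},
    ],
}
print(verbose_json_to_srt(sample))
```

With the plain json format there is only the text field, so a conversion like this has nothing to work from, which is why verbose_json is the minimum needed here.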

Apr 26 '24 10:04 jianchaozhuang

When will this be supported? Thanks.

Apr 28 '24 09:04 jianchaozhuang

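I deployed from the latest GitHub main branch and it still did not work. I then tried upgrading whisper-large-v3 to the latest model revision (the revision currently in use is two commits behind), and after that it worked normally.

Error message: Whisper did not predict an ending timestamp, which can happen if audio is cut off in the middle of a word. Also make sure WhisperTimeStampLogitsProcessor was used during generation.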

May 08 '24 09:05 coswind

v0.11.0 is still too far away... How do I deploy from source? Also, how do I upgrade the model: is it enough to delete the old one and download it again?

May 09 '24 02:05 jianchaozhuang

Run pip install git+https://github.com/xorbitsai/inference.git to install the latest version.

For the model, check the revision. The current revision is 6cdf07a7e; see the model's commit history:

https://huggingface.co/openai/whisper-large-v3/commits/main

Look at the 2 newest commits. You only need to change the forced_decoder_ids section of the model's generation_config.json:


"forced_decoder_ids": [
  [
    1,
    null
  ],
  [
    2,
    50360
  ]
]

The model files are located at: ${XINFERENCE_HOME}/cache/whisper-large-v3
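If you would rather not edit the file by hand, the change can be sketched in Python as below. This is only an illustration: the helper name patch_forced_decoder_ids is made up, and it assumes generation_config.json sits directly inside the cache directory mentioned above. It keeps a .bak copy of the original file before overwriting it.

```python
import json
import shutil
from pathlib import Path

# Value taken from the generation_config.json fragment in this thread.
FORCED_DECODER_IDS = [[1, None], [2, 50360]]


def patch_forced_decoder_ids(config_path: Path) -> dict:
    """Overwrite forced_decoder_ids in generation_config.json, keeping a backup."""
    config = json.loads(config_path.read_text(encoding="utf-8"))
    # Keep a copy of the original file next to it (hypothetical .bak naming).
    shutil.copyfile(config_path, config_path.with_name(config_path.name + ".bak"))
    config["forced_decoder_ids"] = FORCED_DECODER_IDS
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config


# Usage (assumed layout; adjust the path to your installation):
#   import os
#   home = Path(os.environ["XINFERENCE_HOME"])
#   patch_forced_decoder_ids(home / "cache" / "whisper-large-v3" / "generation_config.json")
```

All other keys in the file are left untouched; only forced_decoder_ids is replaced.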

May 09 '24 06:05 coswind

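After upgrading with pip install git+https://github.com/xorbitsai/inference.git, it does work. Unfortunately, the xinference web UI no longer opens: it returns a 404, so I can only deploy models from the command line. Are you seeing the same thing? @coswind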

May 09 '24 16:05 jianchaozhuang

Yes. Installing with pip from the git URL does not build and package the web UI. If you don't mind the extra work, you can git clone the repository first, run npm install && npm run build in the xinference/web/ui directory, and then go back to the repository root and run python setup.py install. (I haven't tried this myself, but judging from the code it should work.)

May 10 '24 01:05 coswind