inference FEAT: Audio support verbose_json and timestamp

Fix temperature not set.
Fixes: https://github.com/xorbitsai/inference/issues/1387
Fixes: https://github.com/xorbitsai/inference/issues/1253

Apr 28 '24 19:04 codingl2k1

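It seems that translate and transcript both accept a prompt parameter and can both output timestamps. The only difference between them is the task parameter, and each ignores some parameters (for example, translate ignores language). In the current xinference implementation, translate still does not output timestamps and does not accept a prompt, and neither API supports the temperature parameter. Could these be handled together in this feature?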
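A minimal sketch of the point above, assuming a Hugging Face `transformers` Whisper pipeline as the backend: both endpoints could share one helper that builds `generate_kwargs`, differing only in `task`, and forward `temperature` instead of dropping it. The helper name `build_generate_kwargs` is hypothetical, not xinference's actual code.

```python
from typing import Any, Dict, Optional


def build_generate_kwargs(
    task: str,
    language: Optional[str] = None,
    temperature: float = 0.0,
) -> Dict[str, Any]:
    """Hypothetical helper shared by transcriptions and translations.

    The two endpoints build identical kwargs except for `task`.
    """
    if task not in ("transcribe", "translate"):
        raise ValueError(f"unsupported task: {task!r}")
    kwargs: Dict[str, Any] = {"task": task}
    if language is not None:
        # Passed through for both tasks; the model may simply not use it.
        kwargs["language"] = language
    if temperature > 0:
        # In generic HF generate(), temperature only affects decoding
        # when sampling is enabled; temperature 0 keeps greedy decoding.
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    return kwargs
```

Calling it with `"transcribe"` and `"translate"` and the same other arguments yields dictionaries that differ only in the `task` entry, which is the property the comment describes.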

Apr 29 '24 02:04 jianchaozhuang

I tried it: the translations API can be extended while remaining compatible with the OpenAI API. In the OpenAI API, translations has no language or timestamp_granularities parameters: https://platform.openai.com/docs/api-reference/audio/createTranscription

Apr 29 '24 08:04 codingl2k1

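According to https://huggingface.co/openai/whisper-large-v3, for the original Whisper model, transcriptions and translations differ only in the task parameter, and both support returning timestamps (at least nothing says they don't). How about accepting language and timestamp_granularities in both anyway? At worst the model ignores them; it won't raise an error.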
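Following this suggestion, the OpenAI-style `timestamp_granularities` list could be mapped onto the `return_timestamps` argument of the `transformers` ASR pipeline, which accepts `True` for segment-level and `"word"` for word-level timestamps, and the same mapping applied to both endpoints. This is a sketch; `to_return_timestamps` is a hypothetical name.

```python
from typing import List, Optional, Union


def to_return_timestamps(
    timestamp_granularities: Optional[List[str]],
) -> Union[bool, str]:
    """Hypothetical mapping from OpenAI timestamp_granularities to the
    `return_timestamps` argument of transformers' ASR pipeline."""
    if not timestamp_granularities:
        return False
    unknown = set(timestamp_granularities) - {"segment", "word"}
    if unknown:
        raise ValueError(f"unsupported granularities: {sorted(unknown)}")
    # One pipeline call yields a single granularity; prefer the finer one.
    if "word" in timestamp_granularities:
        return "word"
    return True
```

Because one pipeline call yields a single granularity, a request for both would need either two passes or grouping words into segments; this sketch simply prefers word-level timestamps.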

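Also, it seems you don't pass the prompt parameter through. The current code has two problems with prompt:

1. The warning message is wrong: in the translations method, for example, the warning says transcription.
2. It shouldn't emit a warning at all; the prompt parameter should be passed to the model. Ideally the parameter would also be validated (requiring it to be wrapped in []).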
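Both points can be sketched together, assuming a backend flag that says whether prompts are supported (`supports_prompt` and the `"prompt"` kwarg key are hypothetical; later in this thread it turns out the `transformers` pipeline does not accept a prompt yet). The warning names the endpoint that was actually called, and when the backend can take a prompt it is forwarded rather than dropped. The bracket validation is left out because the comment does not spell out its exact rule.

```python
import warnings
from typing import Any, Dict, Optional


def apply_prompt(
    endpoint: str,
    prompt: Optional[str],
    generate_kwargs: Dict[str, Any],
    supports_prompt: bool,
) -> Dict[str, Any]:
    """Hypothetical helper: forward `prompt` when the backend supports it,
    otherwise warn using the name of the endpoint that was called."""
    if prompt is None:
        return generate_kwargs
    if supports_prompt:
        # Assumption: the backend reads the prompt from this key.
        return {**generate_kwargs, "prompt": prompt}
    warnings.warn(
        f"Prompt for {endpoint} is not supported by this model and is ignored.",
        stacklevel=2,
    )
    return generate_kwargs
```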

Apr 29 '24 12:04 jianchaozhuang

That's already done. This PR is now compatible with the OpenAI Python client, and extra parameters can also be passed to the translations endpoint via a POST request or the xinference client.

Apr 29 '24 13:04 codingl2k1

Your PR doesn't pass the prompt parameter to the model.

Apr 29 '24 15:04 jianchaozhuang

I'm not sure yet how to pass it through. The model here is wrapped in a https://huggingface.co/docs/transformers/v4.40.1/en/main_classes/pipelines#transformers.AutomaticSpeechRecognitionPipeline, and neither the pipeline nor generate_kwargs has any prompt-related parameter.

Apr 29 '24 18:04 codingl2k1

There is an issue about passing a prompt to the Whisper pipeline: https://github.com/huggingface/transformers/issues/27317. The related PR has not been merged yet: https://github.com/huggingface/transformers/pull/28556

Apr 30 '24 19:04 codingl2k1

@jianchaozhuang That PR looks close to being merged. Once transformers supports passing a prompt, we'll add support right away. Let's leave this PR as it is for now.

Apr 30 '24 19:04 codingl2k1

@codingl2k1 Could other formats be supported too? At least the SRT subtitle format would be quite useful, and the model itself supports it.
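As a sketch of SRT output, assuming the pipeline was called with `return_timestamps=True` so its result carries a `chunks` list of `{"timestamp": (start, end), "text": ...}` entries in seconds, the chunks can be rendered as numbered SRT cues. `format_srt_time` and `chunks_to_srt` are hypothetical helpers, not existing xinference or transformers functions.

```python
from typing import Any, Dict, List


def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def chunks_to_srt(chunks: List[Dict[str, Any]]) -> str:
    """Render pipeline chunks as SRT cues separated by blank lines."""
    cues = []
    for index, chunk in enumerate(chunks, start=1):
        start, end = chunk["timestamp"]
        # The final chunk may have an open end (None); fall back to its start.
        if end is None:
            end = start
        cues.append(
            f"{index}\n"
            f"{format_srt_time(start)} --> {format_srt_time(end)}\n"
            f"{chunk['text'].strip()}\n"
        )
    return "\n".join(cues)
```

For example, chunks `(0.0, 2.5) " Hello world."` and `(2.5, 65.25) " Second line."` render as cue 1 from `00:00:00,000` to `00:00:02,500` and cue 2 from `00:00:02,500` to `00:01:05,250`.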

May 10 '24 06:05 jianchaozhuang

FEAT: Audio support verbose_json and timestamp
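For reference, a sketch of assembling a `verbose_json`-style body from pipeline output with segment-level timestamps, modeled on the OpenAI transcription response fields `task`, `language`, `duration`, `text` and `segments`. Only a subset of OpenAI's segment fields is filled in, `to_verbose_json` is a hypothetical name, and the shape xinference actually returns may differ.

```python
from typing import Any, Dict, List, Optional


def to_verbose_json(
    task: str,
    language: Optional[str],
    result: Dict[str, Any],
) -> Dict[str, Any]:
    """Hypothetical conversion of ASR pipeline output (text plus chunks)
    into an OpenAI-style verbose_json body."""
    chunks: List[Dict[str, Any]] = result.get("chunks", [])
    segments = []
    for idx, chunk in enumerate(chunks):
        start, end = chunk["timestamp"]
        segments.append(
            {
                "id": idx,
                "start": start,
                # An open-ended final chunk falls back to its start time.
                "end": end if end is not None else start,
                "text": chunk["text"],
            }
        )
    # Approximation: a real implementation would use the audio length.
    duration = segments[-1]["end"] if segments else 0.0
    return {
        "task": task,
        "language": language,
        "duration": duration,
        "text": result["text"],
        "segments": segments,
    }
```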