abelbabel
yeah, also saw this: https://github.com/openai/whisper/discussions/264 It seems they do it in two runs: one for the spoken text, one for the speakers, and then merge the results.
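For what it's worth, a rough sketch of how that two-run merge could look in Python, assuming the `openai-whisper` and `pyannote.audio` packages are available; the model names, the `gb0.wav` input and the overlap heuristic are just my own illustrative choices, not what the linked discussion actually uses:

```python
# Sketch: one run for text (whisper), one for speakers (pyannote), merged by timestamps.
# Assumes openai-whisper and pyannote.audio are installed; model names are illustrative.
import whisper
from pyannote.audio import Pipeline

audio_path = "gb0.wav"  # hypothetical input file

# Run 1: transcription (segments with start/end/text)
model = whisper.load_model("small")
result = model.transcribe(audio_path)

# Run 2: speaker diarization (speaker turns with start/end/label);
# may require a Hugging Face auth token for the pretrained pipeline
diarization = Pipeline.from_pretrained("pyannote/speaker-diarization")(audio_path)
turns = [(t.start, t.end, spk) for t, _, spk in diarization.itertracks(yield_label=True)]

# Merge: assign each transcript segment the speaker whose turn overlaps it the most
def best_speaker(seg):
    overlaps = [(min(seg["end"], e) - max(seg["start"], s), spk) for s, e, spk in turns]
    overlaps = [o for o in overlaps if o[0] > 0]
    return max(overlaps)[1] if overlaps else "unknown"

for seg in result["segments"]:
    print(f"[{seg['start']:7.2f} - {seg['end']:7.2f}] {best_speaker(seg)}: {seg['text'].strip()}")
```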
`-bo 7`, `-bo 10`, `-bo 15` and changing from `-O3` to `-O2` did not do the trick for me
> Is it possible to have the convert script support Hugging Face format like the one here https://huggingface.co/openai/whisper-medium/tree/main ? The use case is to run fine-tuned models with cpp. I don't...
> Personally, I'd be more than happy for whisper to just do speaker detection based on left & right channels on a stereo audio file. But I can achieve this...
> I've done some limited testing and was able to achieve a reasonable split via `pyannote`. Bolting it all together is a different story though. @savchenko Could you give a small...
Sorry, this does not work for me. For example, when piping `gb0.wav` (with the small model) I get ``` system_info: n_threads = 4 / 8 | AVX = 1 | AVX2...
Does this work with continuous data from a pipe for you too? On my end it seems to "wait" forever ... For example: `ffmpeg -loglevel -8 -i 'https://a.files.bbci.co.uk/media/live/manifesto/audio/simulcast/dash/nonuk/dash_low/cfs/bbc_world_service.mpd' -map_channel 0.0.0 -f...
Hi, I still want to emphasize the utility of a more general approach via pipe. Think of an inference machine (with proper hardware) that should be used remotely by other processes...
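To make the idea concrete, a minimal sketch of such a pipe consumer, assuming the upstream process writes 16 kHz mono s16le PCM to stdout and a small Python wrapper around `openai-whisper` reads it from stdin in fixed-size chunks (chunk length, model choice and the whole Python detour are illustrative assumptions, not a proposal for how whisper.cpp itself should do it):

```python
# Sketch: consume a continuous raw PCM stream from stdin and transcribe it chunk by chunk.
# Assumes the upstream process (e.g. ffmpeg) writes 16 kHz mono 16-bit little-endian PCM;
# chunk size and model choice are illustrative only.
import sys
import numpy as np
import whisper

SAMPLE_RATE = 16000
CHUNK_SECONDS = 30  # arbitrary window size
CHUNK_BYTES = SAMPLE_RATE * CHUNK_SECONDS * 2  # 2 bytes per s16le sample

model = whisper.load_model("small")

while True:
    raw = sys.stdin.buffer.read(CHUNK_BYTES)
    if not raw:
        break  # upstream closed the pipe
    # Convert s16le -> float32 in [-1, 1], which transcribe() accepts as a numpy array
    audio = np.frombuffer(raw, dtype=np.int16).astype(np.float32) / 32768.0
    result = model.transcribe(audio, fp16=False)
    print(result["text"].strip(), flush=True)
```

Something along the lines of `ffmpeg -i <stream> -f s16le -ar 16000 -ac 1 - | python stream_transcribe.py` (the script name is hypothetical) would feed it.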
In the end it may turn out to be what I was looking for, but I would consider it a workaround ... to me it seems to break how one expects unix programs...