whisper.cpp

Limit on audio file duration

Open adeelabbas opened this issue 1 year ago • 4 comments

Hi - awesome work! I am wondering if there is a limit on the size or duration of an audio file that can be processed using this library?

adeelabbas, Mar 24 '23 07:03

There's the --duration (or -d) switch. It doesn't eat bytes but milliseconds though, so if you need to use the file size it will be necessary to do some calculation first (with WAV it shouldn't be too hard). I guess the other option is to use some external tool to clip the file as you want it.
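
For the calculation mentioned above, here is a minimal Python sketch that turns a WAV file size into the milliseconds that --duration expects. It assumes uncompressed 16-bit mono PCM at 16 kHz with a plain 44-byte header; those defaults are assumptions, and the function names are made up for illustration. If the header has extra chunks, reading the frame count with the standard `wave` module is more reliable than going by file size.

```python
import wave


def bytes_to_ms(num_bytes, sample_rate=16000, channels=1,
                bytes_per_sample=2, header_bytes=44):
    """Estimate the duration in ms of a PCM WAV file from its size in bytes.

    All defaults are assumptions (16 kHz, mono, 16-bit, 44-byte header).
    """
    payload = max(num_bytes - header_bytes, 0)  # audio data without the header
    bytes_per_second = sample_rate * channels * bytes_per_sample
    return payload * 1000 // bytes_per_second


def wav_duration_ms(path):
    """Read the duration in ms from the WAV header instead of guessing."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
    return frames * 1000 // rate
```

Under those assumptions, `bytes_to_ms(3_520_044)` gives 110000, which is the value you would pass to -d to process the first 110 seconds.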

misutoneko, Mar 25 '23 23:03

Hi!

I'm also wondering this - I keep running into segfaults.

❯ ./main -m models/ggml-medium.en.bin -f /users/Jonathan/Desktop/biology_1.wav -pc -d 110000 --offset-t 0 -t 8 -of /users/Jonathan/Desktop/biology_1-1 -otxt
whisper_init_from_file_no_state: loading model from 'models/ggml-medium.en.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51864
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 1024
whisper_model_load: n_audio_head  = 16
whisper_model_load: n_audio_layer = 24
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 1024
whisper_model_load: n_text_head   = 16
whisper_model_load: n_text_layer  = 24
whisper_model_load: n_mels        = 80
whisper_model_load: f16           = 1
whisper_model_load: type          = 4
whisper_model_load: mem required  = 1720.00 MB (+   43.00 MB per decoder)
whisper_model_load: adding 1607 extra tokens
whisper_model_load: model ctx     = 1462.35 MB
whisper_model_load: model size    = 1462.12 MB
whisper_init_state: kv self size  =   42.00 MB
whisper_init_state: kv cross size =  140.62 MB

system_info: n_threads = 8 / 10 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 1 | WASM_SIMD = 0 | BLAS = 1 | SSE3 = 0 | VSX = 0 | 

main: processing '/users/Jonathan/Desktop/biology_1.wav' (65200124 samples, 4075.0 sec), 8 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:06.000]   measuring all of these methylation marks that sit on top of their DNA.
[00:00:06.000 --> 00:00:14.000]   That's cool, you can literally go out, send up the samples, get your biological information through that method.
[00:00:14.000 --> 00:00:19.000]   Cool. So I digress. And now this study of internal structure.
[00:00:19.000 --> 00:00:29.000]   Physiology, internal structure, pathology, and embryology become it, and virology taxonomy.
[00:00:29.000 --> 00:00:34.000]   Study of classification and its part of biopharmaceuticals.
[00:00:34.000 --> 00:00:38.000]   Panatology, study of ancient life.
[00:00:38.000 --> 00:00:42.000]   Long-term biology, study of biological molecules.
[00:00:42.000 --> 00:00:45.000]   Physiology, study of tissues.
[00:00:45.000 --> 00:00:47.000]   And there are many more.
[00:00:47.000 --> 00:00:54.000]   So when you say you want to study biology and beauty, it doesn't mean a whole lot, right?
[00:00:54.000 --> 00:00:59.000]   Which one of these fields do you want to study?
[00:00:59.000 --> 00:01:05.000]   So, what makes something alive?
[00:01:05.000 --> 00:01:12.000]   What makes this lion alive and what makes this monkey not alive?
[00:01:12.000 --> 00:01:14.000]   The cell structure.
[00:01:14.000 --> 00:01:17.000]   Ooh, a heart. Does something need a heart for a bit of life?
[00:01:17.000 --> 00:01:19.000]   No.
ggml_new_tensor_impl: not enough space in the scratch memory
[1]    12274 segmentation fault  ./main -m models/ggml-medium.en.bin -f /users/Jonathan/Desktop/biology_1.wav 

About the line "ggml_new_tensor_impl: not enough space in the scratch memory": I'm looking through the code and wondering if there's a hard limit on how much we can process at a time?

Note that -d 110000 limits processing to 110 seconds: ./main -m models/ggml-medium.en.bin -f /users/Jonathan/Desktop/biology_1.wav -pc -d 110000 --offset-t 0 -t 8 -of /users/Jonathan/Desktop/biology_1-1 -otxt
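
As a sanity check on the numbers in the log, here is a small Python sketch. The 16 kHz sample rate is an assumption inferred from the "65200124 samples, 4075.0 sec" line, and the helper names are hypothetical. It only confirms that the requested window lies inside the file; it does not explain the scratch memory error.

```python
SAMPLE_RATE_HZ = 16000  # assumption: inferred from 65200124 samples / 4075.0 sec


def samples_to_seconds(num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Convert a sample count to seconds."""
    return num_samples / sample_rate


def window_fits(offset_ms, duration_ms, num_samples, sample_rate=SAMPLE_RATE_HZ):
    """Return True if [offset, offset + duration] lies inside the audio."""
    total_ms = num_samples * 1000 / sample_rate
    return offset_ms + duration_ms <= total_ms


# Values taken from the command line and log above.
total_seconds = samples_to_seconds(65200124)   # about 4075 seconds
fits = window_fits(0, 110000, 65200124)        # --offset-t 0, -d 110000
```

So the 110 second window is well within the roughly 68 minute file, and the crash at about 01:19 happens before even that window is finished.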

razodactyl, Mar 27 '23 15:03

I started receiving the same error this week. I was trying to test some configurations before moving the options to the Node.js addon I'm using, but now I get the same error when I run the plain whisper.cpp main program:

ggml_new_tensor_impl: not enough space in the scratch memory
zsh: segmentation fault ./main -m models/ggml-medium.bin -p 1 -f samples/jfk2.wav

The interesting part is that I can process the exact same audio with the Node.js addon on my server, but when I run the whisper.cpp main program from the shell I get the error.

LucasZNK, Mar 27 '23 20:03

This commit fixed the issue: https://github.com/ggerganov/whisper.cpp/commit/0be9cd34979d9c989330eda80dfe9e7086b694d4

WaKeMaTTa, Mar 28 '23 20:03