whisper.cpp
What about this Czech WAV file?
This Czech WAV file seems to be hard for Whisper to transcribe. It starts with music, followed by speech, but the resulting SRT is full of garbage.
https://drive.google.com/file/d/11npsQDnMiAV_Y-OsgHtLwfqxQaZriKMI/view?usp=share_link
Is there any chance of getting a correct SRT from it? Do you think Whisper will handle this properly one day? I used the large model with Subtitle Edit.
The first output I received was "main: WAV file 'E:\documents\Arabela 2 fehlende Szene.wav' must be 16 kHz". I'm using version 1.1.0+ and the base model.
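For anyone hitting the same "must be 16 kHz" message, the sample rate stored in a WAV header can be checked before running main. This is only a minimal sketch using Python's standard `wave` module; the function names and the file name in the usage note are made up for illustration, and the module is limited to PCM WAV files.

```python
import wave


def wav_sample_rate(path):
    """Return the sample rate (in Hz) stored in a PCM WAV file's header."""
    with wave.open(path, "rb") as wav:
        return wav.getframerate()


def is_16khz(path):
    """True if the file already has the 16 kHz rate the error message asks for."""
    return wav_sample_rate(path) == 16000
```

For example, `is_16khz("Arabela.wav")` returning False would mean the file still needs to be resampled with an audio tool first.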
After converting the WAV to 16 kHz I ran the following (I'm guessing that Whisper auto-detected the language and translated to English):
main -m \documents\whisper-bin-x64\models\ggml-base.bin -f "E:\documents\ Arabela.wav" -t 1 -osrt
[00:00:00.000 --> 00:00:10.000] [MUSIC]
[00:00:10.000 --> 00:00:12.580] (gentle music)
[00:00:39.160 --> 00:00:46.720] When a person is happy, happy as a crying person, Peter was not allowed to be a child, and the first big animal was born.
[00:00:46.720 --> 00:00:53.920] When the baby was changed, the baby and the baby were in high school and the baby had a baby.
[00:00:53.920 --> 00:00:57.720] Peter was a successful child, but he was not allowed to be a child.
[00:00:57.720 --> 00:01:07.760] Imagine how when he was not a child, when the baby was afraid of the child and suddenly he was afraid of the child, that he would use it and the child would be afraid of the child.
[00:01:08.080 --> 00:01:10.880] You're a popular person, so you can go to TV.
[00:01:10.880 --> 00:01:15.480] The day has always been a fun day, so I'm sure you'll enjoy it.
[00:01:15.480 --> 00:01:18.080] - Bye! - Bye!
[00:01:18.080 --> 00:01:21.080] - Bye! - Bye!
[00:01:21.080 --> 00:01:25.080] - What's wrong with you? - I'm done with the work.
[00:01:25.080 --> 00:01:28.080] You're going to the hospital.
[00:01:28.080 --> 00:01:34.080] - Well, I'm going to the hospital. - You're not going to the hospital.
[00:01:34.080 --> 00:01:36.660] (soft music)
[00:01:36.660 --> 00:01:39.240] (soft music)
[00:01:39.240 --> 00:01:41.820] (gentle music)
[00:01:41.820 --> 00:01:44.400] (gentle music)
[00:01:44.400 --> 00:01:51.920] (speaking in foreign language)
[00:01:51.920 --> 00:01:52.760] (sighs)
[00:01:52.760 --> 00:01:56.100] (speaking in foreign language)
[00:01:56.100 --> 00:01:58.680] (gentle music)
[00:02:11.280 --> 00:02:13.280] I'm the one who's going to tell you everything.
[00:02:13.280 --> 00:02:19.280] You're the one who's not going to listen to me.
[00:02:19.280 --> 00:02:25.280] I'm going to take a shower.
[00:02:25.280 --> 00:02:27.280] I'm going to get a shower.
[00:02:27.280 --> 00:02:29.280] [Spanish]
The next run produced a transcription. Note the "-l cs" option.
main -m \documents\whisper-bin-x64\models\ggml-base.bin -f "E:\documents\Arabela.wav" -t 1 -l cs -osrt
00:00:00,000 --> 00:00:28,000 [MUSIK]
00:00:28,000 --> 00:00:38,000 [MUSIK]
00:00:38,000 --> 00:00:42,000 Když je člověk šťastní časleti jako splášený,
00:00:42,000 --> 00:00:47,000 Petr s Arabelaus se a ně ne nadály a první velké výročí je zde.
00:00:47,000 --> 00:00:52,000 Když se za tu dobů změnilo, zvon děká a maženky se stali v sokoškoláci
00:00:52,000 --> 00:00:54,000 a pání majerova oho dověla.
00:00:55,000 --> 00:00:57,600 všichni minále s sem ale ani a dáme na nezahali.
00:00:57,600 --> 00:01:01,600 Spomináte si, jak když si tajně s nědlákov se k masso zbíle hodá
00:01:01,600 --> 00:01:03,200 a porozumněla řeči zbířat?
00:01:03,200 --> 00:01:07,600 Teď toho vyžívá a překláda chováte u zbířat rozhovori s jejich mělačky.
00:01:07,600 --> 00:01:10,600 Stala se populární a bystupuje do konce v TV.
00:01:10,600 --> 00:01:15,400 Dnes si však udělala volno tak významně výročí se přece musího slavit.
00:01:15,400 --> 00:01:17,800 - Většen! - Ano!
00:01:17,800 --> 00:01:22,400 - Ahoj! - Cytak brzy?
00:01:23,000 --> 00:01:28,000 A nezkončerizom nezkada když ty grundujěš, zrovná nezkou.
00:01:28,000 --> 00:01:30,000 No, příc.
00:01:30,000 --> 00:01:32,000 No, co je s to?
00:01:32,000 --> 00:01:34,000 Inže nevíš pro zrovná nezkane.
00:01:34,000 --> 00:01:49,000 A ryberou.
00:01:52,000 --> 00:01:54,000 Pytři.
00:01:54,000 --> 00:01:57,000 Pro míň já byslý.
00:01:57,000 --> 00:02:14,000 To jsem teda moc zvědavá, kdo toho lepšichno svěl.
00:02:14,000 --> 00:02:20,000 Ty se zbažně byl sleče pích začeněla.
00:02:20,000 --> 00:02:22,000 [Kříčí]
00:02:22,000 --> 00:02:25,880 Svoje sem váno řek!
00:02:25,880 --> 00:02:27,800 Boč kterou si ál tomu!
00:02:27,800 --> 00:02:29,300 A co pojď porě si a nejpo...
The next run came out a bit different from the first "auto" translation, and I don't know why. Note the options "-l cs" and "-tr" (with the file given after "-tr" instead of "-f").
main -m \documents\whisper-bin-x64\models\ggml-base.bin -tr "E:\documents\Arabela.wav" -t 1 -l cs -osrt
[00:00:00.000 --> 00:00:29.800] [Music]
[00:00:29.800 --> 00:00:38.800] [Music]
[00:00:38.800 --> 00:00:42.800] When a person is happy, happy, happy as a burden,
[00:00:42.800 --> 00:00:46.800] Peter was not able to give up, and first big hands are everywhere.
[00:00:46.800 --> 00:00:50.800] When he changed his mind, he changed his mind and his wife,
[00:00:50.800 --> 00:00:53.800] and he had a major in high school.
[00:00:53.800 --> 00:00:57.800] Peter was successful, but he was not able to give up.
[00:00:57.800 --> 00:01:07.800] Remember when I was a kid and I was going to take care of him, you would take care of him and take care of him and talk to him.
[00:01:07.800 --> 00:01:10.800] It became popular and it would be a success in television.
[00:01:10.800 --> 00:01:15.800] I was always doing it right so I was not sure if I would have to do it.
[00:01:15.800 --> 00:01:18.800] - What's up? - Hello.
[00:01:18.800 --> 00:01:20.800] Hello.
[00:01:20.800 --> 00:01:24.800] - What's up, Brzy? - I'm here to talk to you.
[00:01:24.800 --> 00:01:26.800] You're going to the hospital?
[00:01:26.800 --> 00:01:28.800] -I'm going to the hospital. -I'm going to the hospital.
[00:01:28.800 --> 00:01:30.800] -I'm going to the hospital. -I'm going to the hospital.
[00:01:30.800 --> 00:01:32.800] -I'm going to the hospital. -I'm going to the hospital.
[00:01:32.800 --> 00:01:34.800] -I'm going to the hospital. -I'm going to the hospital.
[00:01:34.800 --> 00:01:49.800] -I'm going to the hospital.
[00:01:49.800 --> 00:01:53.800] -I'm going to the hospital.
[00:01:53.800 --> 00:01:55.800] I'll try to make it.
[00:01:55.800 --> 00:02:13.800] I'm very proud of this, but I like all the things.
[00:02:13.800 --> 00:02:18.800] You're the most important thing to me.
[00:02:18.800 --> 00:02:20.800] I'll be right back.
[00:02:20.800 --> 00:02:22.800] I'll take you to the hospital.
[00:02:22.800 --> 00:02:24.800] I'll take you to the hospital.
[00:02:24.800 --> 00:02:26.800] I'll take you to the hospital.
[00:02:26.800 --> 00:02:28.800] I'll take you to the hospital.
I hope this helps. An explanation for the difference between the two translations would be appreciated.
With the large model and version 1.2.0 I got this SRT:
1
00:00:00,000 --> 00:00:02,000
(hudba)
2
00:00:02,000 --> 00:00:04,000
(hudba)
3
00:00:04,000 --> 00:00:06,000
(hudba)
4
00:00:06,000 --> 00:00:08,000
(hudba)
5
00:00:08,000 --> 00:00:10,000
(hudba)
6
00:00:10,000 --> 00:00:12,000
(hudba)
7
00:00:39,000 --> 00:00:42,000
Když je člověk šťastný, čas letí jako splašený.
8
00:00:42,000 --> 00:00:46,000
Petr s Arabelou se ani nenadáli a první velké výročí je zde.
9
00:00:46,000 --> 00:00:49,000
Kolik se za tu dobu změnilo?
10
00:00:49,000 --> 00:00:53,000
Z Honzíka a Marěnky se staly vysokoškoláci a paní Majerová ohdověla.
11
00:00:53,000 --> 00:00:57,000
Petr je úspěšným vynálezcem, ale ani Arabela nezahálí.
12
00:00:57,000 --> 00:01:03,000
Vzpomínáte si, jak kdysi tajně snědla kousek masa z bílého hada a porozuměla řeči zvířat?
13
00:01:03,000 --> 00:01:07,000
Teď toho využívá a překládá chovatelům zvířat rozhovory s jejich miláčky.
14
00:01:07,000 --> 00:01:10,000
Stala se populární a vystupuje dokonce v televizi.
15
00:01:10,000 --> 00:01:15,000
Dnes si však udělala volno, tak významné výročí se přece musí oslavit.
16
00:01:15,000 --> 00:01:17,000
Petře!
17
00:01:17,000 --> 00:01:18,000
Ano?
18
00:01:18,000 --> 00:01:20,000
Ahoj.
19
00:01:20,000 --> 00:01:22,000
Co tak brzy?
20
00:01:22,000 --> 00:01:28,000
Ani skončili jsem dneska tak dřív, ty gruntuješ? Zrovna dneska?
21
00:01:28,000 --> 00:01:30,000
No, proč ne?
22
00:01:31,000 --> 00:01:34,000
No, to je smutné, že nevíš, proč zrovna dneska, ne?
23
00:01:34,000 --> 00:01:49,000
Arabelo.
24
00:01:49,000 --> 00:01:53,000
Petře.
25
00:01:53,000 --> 00:01:56,000
Promiň, já myslel.
26
00:01:57,000 --> 00:01:59,000
To jsem teda moc zvědavá, kdo tohle všechno sní.
27
00:01:59,000 --> 00:02:05,000
Ty jsi zvážně myslel, že bych zapomněla?
28
00:02:05,000 --> 00:02:11,000
Svoje jsem vám už řekl!
29
00:02:11,000 --> 00:02:13,000
Ať se mi to nezapomínáš!
30
00:02:13,000 --> 00:02:15,000
Ať se mi to nezapomínáš!
31
00:02:15,000 --> 00:02:17,000
Ať se mi to nezapomínáš!
32
00:02:17,000 --> 00:02:19,000
Ať se mi to nezapomínáš!
33
00:02:19,000 --> 00:02:21,000
Ať se mi to nezapomínáš!
34
00:02:21,000 --> 00:02:23,000
Ať se mi to nezapomínáš!
35
00:02:24,000 --> 00:02:25,000
Svoje jsem vám už řekl!
36
00:02:25,000 --> 00:02:27,000
Odstěhující a hotovou!
37
00:02:27,000 --> 00:02:29,000
Ale spojď, bude smrtný, pojď!
27
00:01:59,000 --> 00:02:05,000
"Ty jsi zvážně myslel, že bych zapomněla?"
has a very wrong timestamp,
"Ať se mi to nezapomínáš!" is repeated 6 times,
and so on.
It seems there is still a lot of work ahead for the Whisper developers.
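The repeated cue mentioned above is the kind of problem that can at least be flagged automatically before fixing the file by hand. Below is a rough sketch, not part of whisper.cpp or Subtitle Edit: `parse_srt` and `repeated_runs` are hypothetical helper names, and the `min_run=3` threshold is an arbitrary assumption. It only finds runs of identical consecutive cues; it cannot tell whether a timestamp is shifted.

```python
import re
from itertools import groupby

# Matches SRT-style (comma) and bracket-style (dot) timestamps.
TIME = re.compile(r"(\d+):(\d+):(\d+)[,.](\d+)")


def to_seconds(stamp):
    """Convert 'HH:MM:SS,mmm' to seconds as a float."""
    h, m, s, ms = map(int, TIME.match(stamp.strip()).groups())
    return h * 3600 + m * 60 + s + ms / 1000


def parse_srt(text):
    """Return a list of (start, end, text) tuples, one per cue."""
    lines = text.splitlines()
    cues = []
    i = 0
    while i < len(lines):
        if "-->" not in lines[i]:
            i += 1
            continue
        start, end = (to_seconds(part) for part in lines[i].split("-->"))
        i += 1
        body = []
        while i < len(lines) and lines[i].strip() and "-->" not in lines[i]:
            body.append(lines[i].strip())
            i += 1
        # Without blank separator lines, the last collected line is
        # actually the index number of the next cue, so drop it.
        if i < len(lines) and body and body[-1].isdigit():
            body.pop()
        cues.append((start, end, " ".join(body)))
    return cues


def repeated_runs(cues, min_run=3):
    """Return (text, count) for each run of identical consecutive cues."""
    runs = []
    for text, group in groupby(cue[2] for cue in cues):
        count = len(list(group))
        if count >= min_run:
            runs.append((text, count))
    return runs
```

Run on the SRT above, this should report the six "Ať se mi to nezapomínáš!" cues, and also the six "(hudba)" cues at the start, which are harmless, so the results still need a manual look.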