MFCreateSourceReaderFromURL failed
\Whisper\Examples\TranscribeCS\bin\x64\Release>TranscribeCS -l zh -ovtt L:\out.vtt -f "L:\1.mp3" -m L:\WhisperDesktop\models\ggml-tiny.bin
Using GPU "NVIDIA GeForce GTX 1070", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 62.8 kb RAM
Loaded vocabulary, 51865 strings, 771.3 kb RAM
Loaded 167 GPU tensors, 73.5388 MB VRAM
Computed CPU base frequency: 3.6 GHz
Loaded model from "L:\WhisperDesktop\models\ggml-tiny.bin" to VRAM
StreamFile False
MFCreateSourceReaderFromURL failed
MFCreateSourceReaderFromURL failed
Based on the log messages, something is wrong with the input *.mp3 file. The function that fails with an error status code, MFCreateSourceReaderFromURL, is part of Windows; see its documentation. Perhaps your file is incomplete or corrupt.
You could try playing the file in Windows Media Player, which ships with Windows. I believe it uses the same Media Foundation framework as this library.
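Before reaching for a media player, a cheap first check is to look at the file's leading bytes. The Python sketch below assumes the two common ways an MP3 file begins: an ID3v2 tag (the ASCII bytes `ID3`) or a raw MPEG audio frame, whose header starts with 11 set sync bits. The helpers `looks_like_mp3` and `check_file` are hypothetical and not part of this library; passing the check says nothing about whether the rest of the file is intact, so a truncated file can still fail to decode.

```python
# Hypothetical quick check for whether a file *looks like* an MP3 before
# handing it to a decoder. It only inspects the first few bytes; it cannot
# prove that the rest of the file is complete or uncorrupted.


def looks_like_mp3(header: bytes) -> bool:
    """Return True if header starts with an ID3v2 tag or an MPEG audio frame sync."""
    if header[:3] == b"ID3":
        return True  # ID3v2 tag, commonly placed at the very start of MP3 files
    # MPEG audio frame sync: 0xFF followed by a byte whose top 3 bits are set
    # (11 sync bits in total). Other MPEG-family streams, e.g. AAC ADTS, also match.
    return len(header) >= 2 and header[0] == 0xFF and (header[1] & 0xE0) == 0xE0


def check_file(path: str) -> bool:
    """Read the first bytes of the file at path and apply looks_like_mp3."""
    with open(path, "rb") as f:
        return looks_like_mp3(f.read(4))
```

For example, `check_file(r"L:\1.mp3")` returning `False` would suggest a bad input file, while `True` only rules out the most obvious problems.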
@Const-me I've run into the same issue. When I use "-otxt test.txt" or any other option to export the result to a file, this error occurs: MFCreateSourceReaderFromURL failed: ???????????. If I don't export the result to a file, the console prints the result successfully.

When I use "./main -m models/ggml-medium.bin -gpu "NVIDIA GeForce RTX 3060 Ti" -l zh -otxt test.txt test.mp3", the log is:
Using GPU "NVIDIA GeForce RTX 3060 Ti", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 62.8 kb RAM
Loaded vocabulary, 51865 strings, 771.3 kb RAM
Loaded 947 GPU tensors, 1462.12 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "models/ggml-medium.bin" to VRAM
MFCreateSourceReaderFromURL failed: ???????????

When I use "./main -m models/ggml-medium.bin -gpu "NVIDIA GeForce RTX 3060 Ti" -l zh test.mp3", the log is:
Using GPU "NVIDIA GeForce RTX 3060 Ti", feature level 12.1, effective flags Wave32 | NoReshapedMatMul
Loaded MEL filters, 62.8 kb RAM
Loaded vocabulary, 51865 strings, 771.3 kb RAM
Loaded 947 GPU tensors, 1462.12 MB VRAM
Computed CPU base frequency: 2.112 GHz
Loaded model from "models/ggml-medium.bin" to VRAM
Created source reader from the file "test.m4a"
[00:00:00.000 --> 00:00:03.360] ?????????????????
[00:00:03.360 --> 00:00:05.440] ???????????????
[00:00:05.440 --> 00:00:06.960] ??????????
[00:00:06.960 --> 00:00:08.560] ???????????
[00:00:08.560 --> 00:00:11.520] ?????????????????
[00:00:11.520 --> 00:00:14.320] ????????????????
[00:00:14.320 --> 00:00:16.480] ????????????
[00:00:16.480 --> 00:00:18.880] ?????????????
[00:00:18.880 --> 00:00:20.400] ?????????
[00:00:20.400 --> 00:00:23.520] ?????????????????
[00:00:23.520 --> 00:00:26.480] ???????????????????
[00:00:26.480 --> 00:00:28.400] ????????????
[00:00:28.400 --> 00:00:31.760] ??????????????????????
[00:00:31.760 --> 00:00:33.920] ??????????????
[00:00:33.920 --> 00:00:36.000] ????????????
[00:00:36.000 --> 00:00:37.840] ????????????
[00:00:37.840 --> 00:00:40.160] ?????????????

CPU Tasks
LoadModel 491.358 milliseconds
RunComplete 6.12289 seconds
Run 6.08943 seconds
Callbacks 3.8429 milliseconds, 20 calls, 192.145 microseconds average
Spectrogram 37.0978 milliseconds, 9 calls, 4.12198 milliseconds average
Sample 28.3867 milliseconds, 259 calls, 109.601 microseconds average
Encode 1.71456 seconds, 3 calls, 571.521 milliseconds average
Decode 4.37092 seconds, 3 calls, 1.45697 seconds average
DecodeStep 4.34239 seconds, 259 calls, 16.766 milliseconds average

GPU Tasks
LoadModel 375.613 milliseconds
Run 6.00028 seconds
Encode 1.63473 seconds, 3 calls, 544.911 milliseconds average
EncodeLayer 1.3926 seconds, 72 calls, 19.3417 milliseconds average
Decode 4.36555 seconds, 3 calls, 1.45518 seconds average
DecodeStep 4.36541 seconds, 259 calls, 16.8549 milliseconds average
DecodeLayer 4.11283 seconds, 6216 calls, 661.652 microseconds average

Compute Shaders
mulMatByRowTiled 2.3309 seconds, 73984 calls, 31.5055 microseconds average
mulMatTiled 1.67265 seconds, 1587 calls, 1.05397 milliseconds average
normFixed 254.363 milliseconds, 19054 calls, 13.3496 microseconds average
addRepeatEx 238.618 milliseconds, 18792 calls, 12.6978 microseconds average
fmaRepeat1 227.816 milliseconds, 19054 calls, 11.9563 microseconds average
copyConvert 183.427 milliseconds, 12720 calls, 14.4204 microseconds average
copyTranspose 170.825 milliseconds, 12576 calls, 13.5834 microseconds average
addRepeatScale 145.853 milliseconds, 12432 calls, 11.7321 microseconds average
softMaxFixed 143.98 milliseconds, 6288 calls, 22.8976 microseconds average
softMaxLong 111.799 milliseconds, 259 calls, 431.658 microseconds average
addRepeatGelu 103.309 milliseconds, 6294 calls, 16.414 microseconds average
scaleInPlace 94.2239 milliseconds, 6288 calls, 14.9847 microseconds average
softMax 80.372 milliseconds, 6216 calls, 12.9299 microseconds average
addRepeat 77.3216 milliseconds, 6432 calls, 12.0214 microseconds average
diagMaskInf 57.182 milliseconds, 6216 calls, 9.19916 microseconds average
convolutionMain2Fixed 33.365 milliseconds, 3 calls, 11.1217 milliseconds average
convolutionMain 21.9832 milliseconds, 3 calls, 7.32773 milliseconds average
convolutionPrep1 4.0643 milliseconds, 6 calls, 677.383 microseconds average
addRows 3.5103 milliseconds, 259 calls, 13.5533 microseconds average
convolutionPrep2 1.0445 milliseconds, 6 calls, 174.083 microseconds average
add 413.7 microseconds, 3 calls, 137.9 microseconds average

Memory Usage
Model 877.966 KB RAM, 1.42785 GB VRAM
Context 101.426 MB RAM, 785.219 MB VRAM
Total 102.284 MB RAM, 2.19467 GB VRAM
The -otxt parameter of the command-line app is a switch; it does not accept the path of an output file. The output text file is always placed in the same folder as the input audio file.
When you specify -otxt file.txt file.mp3, the application interprets these arguments as the -otxt switch followed by two input audio files, and then fails trying to decode file.txt with Media Foundation.
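The argument handling described above can be illustrated with a minimal Python sketch of a switch-style parser. The option sets `SWITCHES` and `VALUED` and the function `parse_args` are hypothetical, chosen only to mirror the commands in this thread; the real application's parser may differ in its option list and error handling.

```python
# Hypothetical sketch of a switch-style command-line parser. The flag sets
# below are assumptions chosen to match the commands in this thread, not the
# application's actual option table.

# Flags that are pure on/off switches: they never consume the next argument.
SWITCHES = {"-otxt", "-ovtt", "-osrt"}
# Options that consume exactly one following argument as their value.
VALUED = {"-m", "-l", "-gpu"}


def parse_args(argv):
    """Split argv into a set of switches, a dict of option values and a list of inputs."""
    switches = set()
    options = {}
    inputs = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg in SWITCHES:
            switches.add(arg)  # the following token is NOT consumed
        elif arg in VALUED:
            if i + 1 >= len(argv):
                raise ValueError(f"{arg} requires a value")
            options[arg] = argv[i + 1]
            i += 1  # skip the value we just consumed
        elif arg.startswith("-"):
            raise ValueError(f"unknown argument: {arg}")
        else:
            inputs.append(arg)  # any other token is treated as an input audio file
        i += 1
    return switches, options, inputs


if __name__ == "__main__":
    argv = ["-m", "models/ggml-medium.bin", "-l", "zh", "-otxt", "test.txt", "test.mp3"]
    switches, options, inputs = parse_args(argv)
    print(sorted(switches))  # ['-otxt']
    print(inputs)            # ['test.txt', 'test.mp3']
```

Under this reading, dropping the file name after the switch, e.g. `./main -m models/ggml-medium.bin -gpu "NVIDIA GeForce RTX 3060 Ti" -l zh -otxt test.mp3`, leaves a single input, and the text file ends up next to test.mp3.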
I’ve just published a new version which includes a new PowerShell module. It might be a better fit for batch transcription use cases.
The UX issue is fixed in version 1.10.1: the error message now includes the path to the input file it’s trying to decode.
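The kind of message described above can be sketched as follows. This is a hypothetical illustration in Python, not the wording actually used by version 1.10.1: `format_decode_error` simply attaches the input path to the failing API name and its HRESULT status code.

```python
# Hypothetical illustration of an error message that names the file being
# decoded. The exact wording used by the application is not reproduced here.


def format_decode_error(api: str, path: str, hresult: int) -> str:
    """Build a message naming the failing API, the input path and the HRESULT."""
    # HRESULTs are 32-bit values; masking makes negative ints print as 0x8xxxxxxx.
    return f'{api} failed for "{path}": HRESULT 0x{hresult & 0xFFFFFFFF:08X}'
```

With a failing HRESULT such as E_FAIL (0x80004005), a stray argument like file.txt would then appear by name in the message instead of being hidden behind a generic failure.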