WhisperKit
On-device Speech Recognition for Apple Silicon
The Eager streaming mode implies that we predict the same token at least twice. This is a great opportunity to design a speculative decoding technique that can leverage a fast...
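One way to picture "predicting the same token at least twice" is an agreement check between two consecutive decoding passes over a growing audio buffer: only the tokens both passes agree on are treated as confirmed. The sketch below is a minimal illustration under that assumption. The function name `confirmedPrefix` and the token IDs are hypothetical and are not taken from WhisperKit.

```swift
/// Returns the tokens that two consecutive streaming hypotheses agree on,
/// i.e. their longest common prefix. (Hypothetical helper, not WhisperKit API.)
func confirmedPrefix(_ previous: [Int], _ current: [Int]) -> [Int] {
    var agreed: [Int] = []
    for (a, b) in zip(previous, current) {
        // Stop at the first position where the two passes disagree.
        guard a == b else { break }
        agreed.append(a)
    }
    return agreed
}

// Illustrative token IDs only.
let firstPass = [50258, 1012, 318, 257]
let secondPass = [50258, 1012, 318, 1332, 13]
let confirmed = confirmedPrefix(firstPass, secondPass)
print(confirmed) // [50258, 1012, 318]
```

In a speculative setup, the first pass could plausibly come from a faster draft model and the second from the main model, so the agreed prefix would not need to be decoded again. That pairing is an assumption of this sketch, not something the issue text confirms.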
Similar to the example app, we should bring over the options for early stopping and add CLI arguments for setting them. https://github.com/argmaxinc/WhisperKit/blob/0f19f7ed484f56889bb383c6bd687018021764d8/Examples/WhisperAX/WhisperAX/Views/ContentView.swift#L1040-L1069
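As a rough sketch of what such CLI options could look like, the snippet below parses two thresholds from an argument list and applies them as an early-stopping check. It assumes that early stopping is driven by a compression-ratio threshold and an average log-probability threshold. The flag names, the `EarlyStoppingOptions` type, and the `shouldStop` helper are all hypothetical and are not the actual WhisperKit CLI.

```swift
/// Hypothetical container for early-stopping thresholds (names are illustrative).
struct EarlyStoppingOptions {
    var compressionRatioThreshold: Double? = nil
    var logProbThreshold: Double? = nil

    /// Parses flags of the form `--name value` from an argument list.
    init(arguments: [String]) {
        var index = 0
        while index + 1 < arguments.count {
            // A non-numeric value simply leaves the threshold unset (nil).
            let value = Double(arguments[index + 1])
            switch arguments[index] {
            case "--compression-ratio-threshold":
                compressionRatioThreshold = value
                index += 2
            case "--logprob-threshold":
                logProbThreshold = value
                index += 2
            default:
                index += 1
            }
        }
    }

    /// Returns true when decoding should stop early for the current window.
    func shouldStop(compressionRatio: Double, avgLogProb: Double) -> Bool {
        if let limit = compressionRatioThreshold, compressionRatio > limit {
            return true // Output looks repetitive.
        }
        if let limit = logProbThreshold, avgLogProb < limit {
            return true // Model confidence is too low.
        }
        return false
    }
}

let options = EarlyStoppingOptions(arguments: [
    "--compression-ratio-threshold", "2.4",
    "--logprob-threshold", "-1.0",
])
print(options.shouldStop(compressionRatio: 3.1, avgLogProb: -0.2)) // true
```

Leaving a threshold unset disables that check, which keeps the default behavior unchanged for users who do not pass the flags.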
[Screenshots: 20240315-161729, 20240315-161736] I found that the sumOfbestIndicesResult value is NaN.
Hi, this is a great project, thank you for creating it. I see that you have a nice, simple example for transcribing a prerecorded audio file. Looking through the sample...
**version:** `0.7.2`

**snippet to reproduce:**

```swift
import Foundation
import WhisperKit

Task {
    guard let desktopURL = FileManager.default.urls(
        for: .desktopDirectory,
        in: .userDomainMask
    ).first else { return }
    print("desktopURL: \(desktopURL)")
    let model...
```
Here's what I get:

```
Test Case '-[WhisperKitTests.FunctionalTests testRealTimeFactorLarge]' started.
/Users/jrp/Documents/AI/whisperkit/Sources/WhisperKit/Core/Models.swift:34: error: -[WhisperKitTests.FunctionalTests testRealTimeFactorLarge] : failed: caught error: "Error Domain=com.apple.CoreML Code=0 "Unable to load model: file:///Users/jrp/Documents/AI/whisperkit/.build/arm64-apple-macosx/debug/whisperkit_WhisperKitTests.bundle/whisperkit-coreml/openai_whisper-large-v3/MelSpectrogram.mlmodelc/. Compile the model with...
```
Hey! Thank you for this great repo. I'd love some help with the following issues:
- I can't get keyword-level timestamp from WhisperKit. It gives me full sentences instead of...
Hello, I’ve encountered an issue where WhisperKit, which works perfectly on visionOS 1, no longer functions correctly on visionOS 2 beta 3. (native code) Here’s a snippet of the code...
PR for the issue https://github.com/argmaxinc/WhisperKit/issues/27.
- Added SilenceDetectionFilter similar to LanguageDetectionFilter.
- Implemented detectSilence similar to detectLanguage in TextDecoder that does a single forward pass from to detect silence periods....
I have the following function defined in my Swift iOS app, copied largely from the example app in the repo:

```
func loadModel(isRedownloadAttempt: Bool) {
    // First check what's already...
```