Support Apple's new faster Transcription APIs

Open reneleonhardt opened this issue 6 months ago • 13 comments

Are there plans to support the new transcription APIs available in the latest betas? https://www.macrumors.com/2025/06/18/apple-transcription-api-faster-than-whisper/

Example code to transcribe videos or generate song lyrics: https://github.com/finnvoor/yap/blob/main/Sources/yap/Transcribe.swift

reneleonhardt avatar Jun 18 '25 12:06 reneleonhardt

@reneleonhardt Available now in v1.35

Also wrote a short blog on this here: https://prakashjoshipax.com/apple-new-transcription-api-accuracy/

Please share your findings regarding accuracy.

Beingpax avatar Jun 21 '25 16:06 Beingpax

Thank you for supporting it so quickly 🚀

I'm afraid I'll have to wait until macOS 26 final 😅

In the meantime, some modern chatbots use a multi-stage engine to improve the result of the first LLM.

Could it help in this case to let a text-only LLM improve the transcription before it is shown to the user?

Could an audio model be trained by the user? No LLM was able to understand your last sentence for example, maybe some initial training like the user has to do for setting up Siri could help.
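To make the text-only post-processing idea above concrete, here is a minimal sketch of a two-stage pipeline: a speech-to-text step followed by a refinement step that sees only text. The `transcribe` and `refine` callables are hypothetical placeholders for whatever engine and LLM would be plugged in; nothing here reflects how VoiceInk actually wires this up.

```python
from typing import Callable


def transcribe_with_cleanup(
    audio_path: str,
    transcribe: Callable[[str], str],  # hypothetical: audio file -> raw transcript
    refine: Callable[[str], str],      # hypothetical: raw transcript -> cleaned text
) -> str:
    """Run speech-to-text, then let a text-only model polish the result."""
    raw = transcribe(audio_path)
    if not raw.strip():
        # Nothing was recognized, so there is nothing for the second stage to fix.
        return raw
    try:
        cleaned = refine(raw)
    except Exception:
        # If the second stage fails, show the raw transcript rather than nothing.
        return raw
    # Guard against a refinement step that returns only whitespace.
    return cleaned.strip() or raw


if __name__ == "__main__":
    # Stand-ins for a real engine and a real LLM, for illustration only.
    def fake_transcribe(path: str) -> str:
        return "i will search about it tomorrow"

    def fake_refine(text: str) -> str:
        return text.capitalize() + "."

    print(transcribe_with_cleanup("memo.wav", fake_transcribe, fake_refine))
```

Falling back to the raw transcript keeps the first stage usable on its own, so a slow or unavailable LLM would only cost polish, not the dictation itself.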

reneleonhardt avatar Jun 21 '25 19:06 reneleonhardt

Also wrote a short blog on this here: https://prakashjoshipax.com/apple-new-transcription-api-accuracy/

Interesting blog post. It is great that you are exploring this already. A couple thoughts on the example transcriptions at the end of the blog post:

  • It seems your "Accurate transcription" in this example is incomplete/incorrect, as it would be weird for all three models to hallucinate the same "the original transcript, the accurate transcript" if the second part of that was not in the audio.
  • I think you not being a native English speaker (I am not either) likely plays into the transcription accuracy as you suspected. Because I think "Search about it" is not something a native speaker would say, and thus it is hard for the models to arrive at.

pocketpixels avatar Jun 21 '25 21:06 pocketpixels

@Beingpax thanks for implementing this! I have a few small bugs to report.

  1. It only seems to work with the default mode. For example, here's a screenshot of editing one of my existing Power Modes; the Apple Transcription option doesn't show up at all:

Image

  2. When selecting it (see screenshot below), it says "auto detect" is an option, but when you click on the dropdown there isn't such an option:

Image

  3. From what I could gather online, it is supposed to support whatever locales are installed on the machine. I have Brazilian Portuguese installed, but the only options I see in the dropdown above are English, Spanish, German, and French (the last three I don't even have on my computer), so I'm not sure what's going on.

lfilho avatar Jun 23 '25 21:06 lfilho

Auto-detection is not available with Apple Speech; the UI needs to be updated to reflect that.

Regarding your issue about not having the Brazilian Portuguese language, this is something that needs to be fixed.

The language selection UI for both local models and Apple native models needs to be updated, because the languages are handled differently.
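As a rough illustration of the two points above (no auto-detect for some engines, and languages coming from different sources), here is a sketch of building the dropdown from whatever locales an engine reports. The function names, the `"auto"` sentinel, and the use of tags such as `pt-BR` are assumptions for illustration, not VoiceInk's actual code or Apple's API.

```python
from typing import List, Optional, Sequence

AUTO = "auto"  # hypothetical sentinel for an "auto detect" menu entry


def language_options(available: Sequence[str], supports_auto_detect: bool) -> List[str]:
    """Build the dropdown entries for one engine."""
    options = sorted(set(available))
    # Only offer auto-detect when the engine can actually do it.
    return [AUTO] + options if supports_auto_detect else options


def match_locale(requested: str, available: Sequence[str]) -> Optional[str]:
    """Pick the engine locale that best matches the user's choice."""
    wanted = requested.replace("_", "-").lower()
    normalized = {tag.replace("_", "-").lower(): tag for tag in available}
    if wanted in normalized:
        return normalized[wanted]  # exact match, e.g. pt-BR
    language = wanted.split("-")[0]
    for key, tag in normalized.items():
        if key.split("-")[0] == language:
            return tag  # same language, different region
    return None  # caller decides how to surface "not supported"


if __name__ == "__main__":
    reported = ["en-US", "pt-BR"]  # example values only
    print(language_options(reported, supports_auto_detect=False))
    print(match_locale("pt_BR", reported))
```

Driving the menu from the engine's own list, instead of a shared hard-coded one, is one way to avoid showing languages that are not available on the machine.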

Beingpax avatar Jun 24 '25 12:06 Beingpax

@Beingpax thanks, I see Portuguese is there now, but I can't get it to actually work. On the Power Mode screen it doesn't even show up:

Image

If I go to the "AI Models" screen, set it as the default there, and then invoke the keyboard shortcut, it reverts to Large v3 Turbo before my eyes:

Image

lfilho avatar Jun 27 '25 22:06 lfilho

Is this resolved?

Beingpax avatar Jun 30 '25 17:06 Beingpax

Still happening. Also, I've just checked for updates and there were none. I'm running version 1.36 (136).

lfilho avatar Jun 30 '25 17:06 lfilho

You can add the Parakeet v2 model instead if you want speed. It's fairly accurate, maybe not quite as accurate as Whisper, but probably close and probably better than Apple's dictation.

I managed to incorporate it into a personal fork of VoiceInk and it's working very well.

slumdev88 avatar Jul 03 '25 16:07 slumdev88

How did you add Parakeet v2? Can you open a pull request? @slumdev88

Beingpax avatar Jul 03 '25 17:07 Beingpax

@Beingpax Pull request added. I am a little new to this but enjoying building experimental features. I'm very passionate about dictation apps.

slumdev88 avatar Jul 04 '25 07:07 slumdev88

It would be nice to also add the Speed and Accuracy rating for Apple Speech in the App.

Especially now that Parakeet v3 is there, so that people can better choose.

Jiehong avatar Sep 15 '25 08:09 Jiehong

It would be nice to also add the Speed and Accuracy rating for Apple Speech in the App.

Especially now that Parakeet v3 is there, so that people can better choose.

I'm also super interested in seeing how Apple Speech stacks up against Parakeet when it comes to Speed / Accuracy!

BitPhoenix avatar Sep 21 '25 10:09 BitPhoenix

Great that Apple Speech has been included 🎉

Would it be possible to determine speed and accuracy? Now it's the only model missing those ratings.
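One common way to put numbers on this is word error rate (WER) for accuracy and real-time factor (processing time divided by audio length) for speed. The sketch below assumes a trusted reference transcript is available; it is not necessarily how the existing ratings in the app were derived.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # prev[j] = edits needed to turn the reference words seen so far into hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


def real_time_factor(processing_seconds: float, audio_seconds: float) -> float:
    """Values below 1.0 mean transcription ran faster than the audio plays."""
    if audio_seconds <= 0:
        raise ValueError("audio length must be positive")
    return processing_seconds / audio_seconds


if __name__ == "__main__":
    # One substituted word out of four reference words.
    print(word_error_rate("search for it online", "search about it online"))
    print(real_time_factor(2.0, 10.0))
```

Running the same recordings through each model and comparing these two numbers would give a like-for-like basis for a rating, keeping in mind that WER is sensitive to punctuation and casing conventions (this sketch only lowercases).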

reneleonhardt avatar Oct 31 '25 18:10 reneleonhardt