[EN] Match articles "a" and "an" for <the>
Whisper (in my case) seems to always insert the article "a" after the vacuum commands "start" and "return".
Someone might also actually say it this way, so let's handle that.
First of all, I don't think it's a good idea to prefix <name> (which can contain <the>) with an indefinite article. You would be able to say "start a the roborock", which doesn't make grammatical sense.
Second of all, if we're doing this, I don't see why the "an" form wouldn't be in there as well.
Third, I don't see why this would apply strictly to vacuums and not every other entity.
Fourth, to counter the issues above (and create new ones), why not add a[n] to <the>?
Finally, like I said numerous times before, I don't think it's wise to add incorrect sentences just to please Whisper or any other STT. The proper solution here would be to fix Whisper.
I'd like to hear the other language leaders' comments on this.
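To make the first and fourth points concrete, here is a rough sketch that expands a simplified, hassil-like template syntax into every sentence it would match. This is not hassil's actual parser; the `expand` function and the `rules` dictionaries are hypothetical, and the rule definitions (a plain "the" rule, and a <name> rule of the form "[<the>] roborock") are assumptions for illustration only. It only supports `(a|b)`, `[optional]` and `<rule>` references.

```python
import itertools


def expand(template: str, rules: dict[str, str]) -> set[str]:
    """Expand a simplified hassil-style template into every sentence it matches.

    Supported syntax (a subset, assumed for illustration):
      (a|b)   alternatives
      [x]     optional part, same as (x|)
      <rule>  reference to another template in `rules`
    """
    pos = 0

    def parse_seq(stop: str) -> list[str]:
        # Concatenate parts until one of the stop characters is reached.
        nonlocal pos
        current = [""]
        while pos < len(template) and template[pos] not in stop:
            ch = template[pos]
            if ch == "(":
                pos += 1
                part = parse_alts(")")
                pos += 1  # skip ")"
            elif ch == "[":
                pos += 1
                part = parse_alts("]") + [""]  # optional: may be left out
                pos += 1  # skip "]"
            elif ch == "<":
                end = template.index(">", pos)
                part = sorted(expand(rules[template[pos + 1:end]], rules))
                pos = end + 1
            else:
                part = [ch]
                pos += 1
            current = [a + b for a, b in itertools.product(current, part)]
        return current

    def parse_alts(close: str) -> list[str]:
        # Collect "|"-separated alternatives until the closing delimiter.
        nonlocal pos
        results = parse_seq("|" + close)
        while pos < len(template) and template[pos] == "|":
            pos += 1
            results += parse_seq("|" + close)
        return results

    # Collapse the extra spaces left behind by omitted optional parts.
    return {" ".join(s.split()) for s in parse_alts("")}


# Hypothetical rules: <the> is just "the", and <name> may start with <the>.
rules = {"the": "the", "name": "[<the>] roborock"}
pr_version = expand("(start|return) a[n] <name>", rules)
print("start a the roborock" in pr_version)  # True: the ungrammatical case

# The alternative from point four: fold a[n] into <the> instead.
rules_alt = {"the": "(the|a[n])", "name": "[<the>] roborock"}
alt_version = expand("(start|return) <name>", rules_alt)
print("start a the roborock" in alt_version)  # False
print("start a roborock" in alt_version)  # True
```

Under these assumptions, prefixing <name> with a[n] admits "start a the roborock", while moving a[n] into <the> avoids the double article but still accepts "start an roborock" and, in the real grammar, would affect every sentence that uses <the>, not just vacuums.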
Oh, I agree with most of this, and I can change the PR to use a[n].
Also, as much as I would love to fix Whisper to follow grammar, there is only so much you can do to influence its output. OpenAI probably trained it on extremely powerful datacenters with all the speech data they scraped from the internet; it's not realistic to fix every edge case like this without the expertise and resources they had.
And yes, there is speech-to-phrase as an alternative, but in my testing, bigger Whisper models are far, far better at understanding the noisy, imperfect audio you typically get from Assist satellites, as well as at telling apart "turn off" and "turn on", so I'm afraid it is here to stay for those who need local STT.
I went ahead and added "a[n]" directly to