Long pause between paragraphs using Microsoft voices in Edge

Open mouhanede opened this issue 5 months ago • 1 comments

Read Aloud in Microsoft Edge suddenly started pausing 2–5 seconds between every paragraph when using Microsoft online voices (e.g., Aria). It worked fine before. Other voices (like Google) don’t have this issue, and it works perfectly on another PC.

mouhanede • Jul 15 '25 14:07

Can the maintainers really solve this issue please?

I have been struggling with this issue for a while using Piper voices in Brave on an M1 Mac, and it has ruined my experience with this extension. It started around the Manifest V3 (MV3) changes, which hit Chrome first and Brave more recently.

It's more severe in my case in that it takes a very long time to synthesize any sentence. I strongly suspect background throttling, because as soon as I open the extension's tab, it synthesizes immediately. The timing points the same way: when it stopped working on Chrome, I checked, and it was right at the time MV3 was enforced. The same thing happened later on Brave.

Support suggested using a speed other than 1x, which forces the audio to play back in a specialized background page instead of inside the tab. However, this did not work. They also said I was the first and only one to report this issue, but I suspect it is quietly becoming more widespread.

Please diagnose this problem as I'm stuck with robotic voices. My whole productivity is a function of my experience with this extension.

MartinB47 • Jul 24 '25 03:07

Long pauses between sentences would indeed make the extension unusable.

There are several possible causes, depending on the voice you're using. Piper voices are locally synthesized AI voices, and they require a lot of CPU power. We synthesize the next sentence while the current one is being spoken; if that takes too long, there will be a silence gap. Piper models should run well on most devices, even those with weak CPUs, but make sure you're using the low- or medium-quality Piper voices. Don't use the high-quality versions; those are most definitely not real-time.
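To make the timing concrete, here is a rough sketch of that one-sentence lookahead. It is not the extension's actual code: the function name `silence_gaps` and all the timings are made up for illustration, and it assumes synthesis of a sentence starts at the moment the previous sentence starts playing.

```python
from typing import List


def silence_gaps(synth_secs: List[float], play_secs: List[float]) -> List[float]:
    """Silence heard before each sentence with one-sentence lookahead.

    synth_secs[i] is how long sentence i takes to synthesize,
    play_secs[i] is how long its audio lasts. Both are hypothetical inputs.
    """
    if len(synth_secs) != len(play_secs):
        raise ValueError("need one synthesis time and one play time per sentence")

    gaps: List[float] = []
    play_start = 0.0  # when the previous sentence started playing
    play_end = 0.0    # when the previous sentence finished playing
    for i, (synth, play) in enumerate(zip(synth_secs, play_secs)):
        if i == 0:
            ready = synth               # nothing to overlap with yet
        else:
            ready = play_start + synth  # synthesized while the previous one played
        start = max(ready, play_end)    # cannot start before the previous one ends
        gaps.append(start - play_end)
        play_start, play_end = start, start + play
    return gaps


# Synthesis faster than playback: only the initial wait is audible.
print(silence_gaps([0.5, 0.5, 0.5], [3.0, 3.0, 3.0]))  # [0.5, 0.0, 0.0]
# Synthesis slower than playback (non-realtime): a gap before every sentence.
print(silence_gaps([4.0, 4.0, 4.0], [3.0, 3.0, 3.0]))  # [4.0, 1.0, 1.0]
```

Under this model the gap before a sentence is simply how much its synthesis time exceeds the previous sentence's playback time, which is why a non-realtime model produces a pause before every sentence.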

If you're using the MS Edge "Microsoft Natural" voices, they are synthesized on Microsoft's cloud servers. As such, they may be subject to network connectivity issues, or quite possibly the servers could be temporarily overwhelmed. No prefetching is possible with these voices, because they appear to extensions as browser-native voices: you can't fetch the audio ahead of time, you can only tell the browser to speak some text immediately.
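As a rough illustration of why the lack of prefetching matters, the sketch below models each paragraph being requested only after the previous one has finished, so the whole round trip is heard as silence. This is an assumed model, not a measurement of Edge or Microsoft's servers; `gaps_without_prefetch` and the latency values are hypothetical.

```python
from typing import List, Tuple


def gaps_without_prefetch(
    latency_secs: List[float], play_secs: List[float]
) -> Tuple[List[float], float]:
    """Return (silence before each paragraph, total elapsed time).

    latency_secs[i] is the assumed delay between asking the browser to speak
    paragraph i and audio actually starting (network plus server time).
    """
    if len(latency_secs) != len(play_secs):
        raise ValueError("need one latency and one play time per paragraph")

    gaps: List[float] = []
    clock = 0.0  # time at which the previous paragraph finished
    for latency, play in zip(latency_secs, play_secs):
        start = clock + latency     # nothing was fetched in advance
        gaps.append(start - clock)  # the full latency is audible
        clock = start + play
    return gaps, clock


gaps, total = gaps_without_prefetch([2.0, 0.5], [10.0, 10.0])
print(gaps, total)  # [2.0, 0.5] 22.5
```

In this model any slowdown on the network or server side shows up directly as a pause between paragraphs, since there is no playback time to hide it behind.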

ken107 • Dec 11 '25 15:12