Speech and sounds are cut off at the end when using WASAPI
This issue at first looked similar to #14386, but on closer inspection it is definitely different. The other issue seems to be that audio at the start of output is missed; this issue is that audio at the end of output is missing.
Steps to reproduce:
In NVDA advanced settings, enable WASAPI audio output, press OK and restart NVDA. Listen carefully to speech output, and to audio beeps when there is no speech output (e.g. progress bar beeps or mouse cursor movement beeps).
Actual behavior:
The very last part of the speech is cut off (maybe about the last letter), and audio beeps are barely there (maybe just a click). Some of the audio output seems to go missing and, judging by the speech, it is the last part of the audio.
Expected behavior:
The whole speech output will be heard. Audio beeps will be complete. It should be identical output to when not using WASAPI (I don't get the issue when WASAPI is disabled).
NVDA logs, crash dumps and other attachments:
System configuration
NVDA installed/portable/running from source:
NVDA installed
NVDA version:
2023.3beta2
Windows version:
11
Name and version of other software in use when reproducing the issue:
Other information about your system:
This seems to be hardware specific: I observe it on an MSI Prestige 14 laptop from late 2019, but I do not get the issue on my desktop computer.
Other questions
Does the issue still occur after restarting your computer?
Yes
Have you tried any other versions of NVDA? If so, please report their behaviors.
I think I have observed this issue since the WASAPI stuff was added.
If NVDA add-ons are disabled, is your problem still occurring?
Yes
Does the issue still occur after you run the COM Registration Fixing Tool in NVDA's tools menu?
Yes
Could you try to disable audio enhancements on your sound card of the machine you have this issue? Does that make any difference? You can disable the audio enhancements from the properties dialog of your soundcard in Windows settings. I see people having this problem on MSI laptops not only with NVDA, but also in other scenarios. It seems disabling the audio enhancements fixed those.
Also make sure you updated your sound driver to the last version, you can update it in the device manager in Windows.
I can reproduce such speech cuts also in virtual environments, with and without WASAPI. So I don't think this is really a WASAPI issue.
Also make sure you updated your sound driver to the last version, you can update it in the device manager in Windows.
Windows finds no updates to the driver.
I can reproduce such speech cuts also in virtual environments, with and without WASAPI. So I don't think this is really a WASAPI issue.
Virtual machines add a layer of complexity, so I would read nothing into that. In the VM case you would have to determine whether it's an issue between the host VM software and the physical sound card access, or between NVDA and the guest VM driver. In short, when using a VM the bug may not have anything to do with NVDA. For me, running directly on physical hardware, the issue is only present for WASAPI, so we can definitely tell that the WASAPI code in NVDA has some problems with some sound card drivers.
Could you try to disable audio enhancements on your sound card of the machine you have this issue? Does that make any difference? You can disable the audio enhancements from the properties dialog of your soundcard in Windows settings. I see people having this problem on MSI laptops not only with NVDA, but also in other scenarios. It seems disabling the audio enhancements fixed those.
When encountering other issues with NVDA audio output I had previously tried disabling audio enhancements. In those specific cases it did not fix the issue. As this computer's audio is so bad without enhancements (tinny, weak, etc.), I need to use enhancements to get it to sound even mediocre for a laptop. For this issue, turning off audio enhancements at first glance seems to have fixed the clipping of the end of sound. However, as I said, the sound is so bad without enhancements that it's not really a place I want to be, and I would like to see NVDA work better when audio enhancements are on. Could NVDA pad a little bit of silence on the end of audio it outputs (we are probably talking around 0.1 seconds at most)? My feeling is that this would not really hit performance, as NVDA can currently interrupt speaking and the like, so if it needed to output something whilst the silence is playing it would simply be interrupting the silence (i.e. it would not need to wait for the silence to finish). Maybe the padded silence could even be a user setting for those who really don't want it or find they need a different length.
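To make the padding proposal above concrete, here is a minimal sketch of appending trailing silence to a block of PCM audio. It assumes signed 16-bit PCM, where silence is all-zero bytes; `pad_with_silence` and its parameters are hypothetical names for illustration and are not part of NVDA.

```python
def pad_with_silence(
    pcm: bytes,
    sample_rate: int,
    channels: int = 1,
    bytes_per_sample: int = 2,
    pad_ms: int = 100,
) -> bytes:
    """Return pcm followed by pad_ms milliseconds of silence.

    Assumes signed PCM, where a zero sample is silent. Unsigned 8-bit
    PCM would need 0x80 bytes instead.
    """
    # Whole frames of silence that fit in pad_ms (rounded down).
    pad_frames = sample_rate * pad_ms // 1000
    return pcm + bytes(pad_frames * channels * bytes_per_sample)


# Hypothetical figures: 22050 Hz mono 16-bit audio padded by 0.1 s.
audio = b"\x01\x02" * 10
padded = pad_with_silence(audio, sample_rate=22050)
print(len(padded) - len(audio))  # 4410 bytes of added silence
```

At these figures the 0.1 s of padding costs 4410 bytes per padded block, so the memory cost is negligible; the open question in the thread is the added delay, not the size.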
In this case the VM issue does indeed have to do with NVDA: the Bluetooth Audio add-on fixes this issue in virtual environments by adding silence, as far as I understand. It works even when you don't have a Bluetooth headset connected; the name of the add-on is actually confusing.
The Bluetooth Audio add-on is not an acceptable solution to me. I don't know the author of that add-on and so cannot attribute any trust to it. After all, NVDA quite correctly notes how much access NVDA and add-ons have to what I do on my computer, and warns users to be careful with regard to add-ons. Having correct output from NVDA feels sufficiently core that I feel it should be fixed in NVDA itself rather than being farmed out to third parties.
@mwhapples wrote:
BlueTooth audio addon is not an acceptable solution to me. I don't know the author of that addon and so cannot attribute any trust to it. After all NVDA,
For what it's worth, that add-on has been around and used by many many people, for several years. And its author is well known.
Even though there are other add-ons which do this kind of thing for slightly different use cases, that one is one of the default recommendations of most people.
But of course, you must do what you think best. If you would rather wait until (if) a solution is accepted in core, that is your option.
What is the reason this functionality has not been added to NVDA core yet? It is suggested there are add-ons doing what is needed, so someone has done the technical work. Is it that they haven't been interested in going through the effort of getting it approved as an NVDA contribution, was the work not of sufficient technical quality, were there objections to adding such a fix, or was it some other reason? I ask because maybe I could put time into creating a fix for this if there is a chance it would be accepted. However, if the gatekeepers of pull requests simply will never accept such a fix, then it's not worth even trying.
Hi,
I think part of the issue of getting add-ons (or parts thereof) accepted into NVDA Core has to do with coding style and keeping up with NVDA changes from the add-on. From its inception, I designed the Windows App Essentials add-on to be compatible and comply with NVDA's coding standards and API, knowing that parts of it would end up becoming part of NVDA (achieved as of NVDA 2023.2). While the add-ons community is responsible for coordinating add-on releases and offering best practices, not all add-on authors follow NV Access's coding style, for various reasons: Python training, language and culture (after all, the add-ons community is a community of international users and authors), opting to work on add-ons even when there are reasons to get code and ideas from add-ons sent to NV Access for review (via the GitHub pull request process), and the fact that many authors are volunteers and may not have time to sit down and think about potential talking points and issues to be seen when reworking parts of their code to meet NVDA Core's coding style and assumptions.
The author of the Bluetooth Audio add-on (@mltony) has indicated (months ago) that he does not have time to maintain some of this add-on continuously, but has time during the holidays to do so, according to a message he sent to an NVDA mailing list. Since then, people have volunteered to update his add-ons. While someone may say that volunteer maintainers can submit Tony's code as an NVDA pull request, I think it would be best to leave that decision to the add-on author, since he is the expert on that code and can offer advice on how to proceed. Since we did have several requests to bring code and ideas from the Bluetooth Audio add-on to NVDA Core, I think that should provide a way to proceed if Tony gives his blessing (the task then becomes understanding the code and adapting it to fit the existing NVDA coding style and API/code structure, which can take time).
As for the actual issue at hand: does changing speech synthesizers help? If yes, it could be something going on between the synth and WASAPI implementation (if I understood the situation correctly, that is).
Thanks.
I'd be happy if someone can add Bluetooth Audio as a feature in NVDA core. With all my other commitments in life I don't have time to do it myself in the foreseeable future. Preparing a PR to NVDA core should be quite straightforward: some simple cleanup should be enough, plus finding a code pointer during NVDA startup where the background noise thread can be initialized. As for whether it's worth including Bluetooth Audio in the core, my opinion is yes. Given the number of issues we've seen recently for many kinds of audio problems that can be fixed with a background noise/silence thread, I'd think it's justified given the demand. But of course the final decision is up to the NVDA devs.
@josephsl thanks for the detailed answer. By the sound of it it is a case of finding someone to do the work to the standard rather than there having been objections to it being done. This is certainly something I could take time to look into fixing then.
Thanks for clarifying where you are with this one. I appreciate the time constraints; I have enough personal projects I want to get done as well. However, as part of my work for APH I do have some hours available to make contributions to NVDA. Reading the description of your add-on, I am not sure it is the approach I would have taken for this specific issue. It is as if my sound card prematurely terminates sound playback, so I would have taken the simpler approach of having NVDA pad additional silence onto the end of the audio it produces. However, I do see how my approach would not deal with the issue of Bluetooth devices entering sleep mode, so maybe I should follow your approach of constantly playing silence to create a more general fix for audio issues.
As for the actual issue at hand: does changing speech synthesizers help? If yes, it could be something going on between the synth and WASAPI implementation (if I understood the situation correctly, that is).
Thanks.
@josephsl As for whether this is synth specific, I don't think so, as it also happens to beeps from NVDA such as progress bar update beeps, mouse cursor movement beeps, speech mode beeps, etc. The beeps are truncated (virtually never played; at best just a click is heard) even when no speech is being produced. I don't know the ins and outs of NVDA and how it produces sound with WASAPI, but based on what I know of other audio APIs it feels like maybe the last buffer write gets lost. That is just a guess, though, and something I would investigate if I were to examine the issue itself rather than try to adapt the Bluetooth Audio add-on code into a pull request for inclusion in NVDA.
cc @jcsteh for awareness since you have authored the WASAPI support.
Could NVDA pad a little bit of silence on the end of audio it outputs (we are probably talking around 0.1 seconds at most).
The question is when we could do this. Audio is fed to nvwave in chunks. In some cases, there's currently no indication that we've fed the last chunk. Many speech synths call WavePlayer.idle(), so we could use that. However, tones doesn't, for example, and fixing that would probably require moving tones to its own thread.
This does raise the question of why WinMM (which is what NVDA uses when WASAPI is disabled) isn't affected here. As I understand it, WinMM is just a layer on top of WASAPI these days, but there must be something in its implementation that prevents this problem. I can't fathom what, since we don't provide any more info to WinMM than we do to WASAPI.
Thanks @jcsteh for that information. It gives me a little background on this stuff and helps explain why add-ons like the Bluetooth Audio add-on use a separate thread and intercept the speak and tones calls to start their playback of silence. Something in my gut tells me this bug is a correctness bug, and that to use the Bluetooth Audio solution (either making a core NVDA contribution with that functionality or just using the add-on) is to use a sledgehammer to crack a nut and is just papering over the cracks. My initial reading into WASAPI found this Microsoft sample: https://learn.microsoft.com/en-us/windows/win32/coreaudio/rendering-a-stream. I cannot help noticing that, after the loop feeding the data into the buffer, the sample has the following lines:
// Wait for last data in buffer to play before stopping.
Sleep((DWORD)(hnsActualDuration/REFTIMES_PER_MILLISEC/2));
hr = pAudioClient->Stop(); // Stop playing.
As I said in another comment, one line of investigation I would possibly follow is ensuring buffers are flushed or, based on that example, that all audio is played before stopping. In the NVDA WASAPI code I saw no similar wait after the loop feeding the data to the buffer. My attempt to add such a wait did not solve the issue; maybe that wasn't the cause, or maybe my C++ is not good enough to have understood or fixed it correctly (see my change at https://github.com/aphtech/nvda/tree/wasapiCompleteAudio ).
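For reference, the length of the wait in the quoted sample lines can be worked out by hand. The sketch below mirrors that arithmetic in Python; the two constants and the `hnsActualDuration` formula are assumed to match the linked Microsoft sample, and the buffer size and sample rate are hypothetical figures.

```python
# Constants assumed to match the Microsoft "Rendering a Stream" sample.
REFTIMES_PER_SEC = 10_000_000   # REFERENCE_TIME is in 100 ns units
REFTIMES_PER_MILLISEC = 10_000


def final_sleep_ms(buffer_frame_count: int, samples_per_sec: int) -> int:
    """Milliseconds the sample sleeps before it calls Stop()."""
    # Duration of the whole endpoint buffer, in 100 ns units.
    hns_actual_duration = (
        REFTIMES_PER_SEC * buffer_frame_count // samples_per_sec
    )
    # The sample waits for half of the buffer duration.
    return hns_actual_duration // REFTIMES_PER_MILLISEC // 2


# Hypothetical figures: a one-second buffer at 48 kHz.
print(final_sleep_ms(48_000, 48_000))  # 500
```

In other words, the sample waits half a buffer length so that the final write can drain before `Stop()` is called.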
Something in my gut tells me this bug is a correctness bug and to use the bluetooth audio solution (either making a core NVDA contribution with that functionality or to just use the addon) is to use a sledgehammer to crack a nut and is just papering over the cracks.
There's some sense in that argument. On the other hand, the fact that your audio driver truncates samples only when enhancements are enabled and most other audio drivers do not could reasonably suggest a correctness bug in the driver. In that case, a sledgehammer might be the most reasonable workaround to address the broadest number of issues, since Bluetooth audio users suffer from similar problems even without WASAPI.
As I said in another comment, one line of investigation I would possibly follow is ensuring buffers are flushed or based on that example that all audio is played before stopping.
That's just the thing. We don't explicitly stop audio immediately after data is sent, neither with WASAPI nor without. We do now stop audio after a short timeout to prevent interference with system sleep, but that is at least 10 seconds.
In the NVDA WASAPI code I saw no similar wait after the loop feeding the data to the buffer.
There isn't one, and if you added one, you would introduce gaps after each chunk of audio fed by the synthesiser. To allow for best possible responsiveness, most synths don't feed an entire utterance in one chunk, with OneCore being a notably annoying exception. Instead, they feed the audio in small chunks as it is generated.
The reason the wait is needed in that sample code is that it calls stop as soon as audio is done. If you called stop too early, you would certainly truncate audio. In contrast, NVDA doesn't call stop until much later, so the stop call cannot be the cause of truncated audio.
I was going to suggest that you could experiment with feeding a small chunk of silence inside WasapiWavePlayer.idle(). That way, synths that call idle() would feed a small padding of silence at the end of each utterance, though this wouldn't help tones. However, I then realised that this will cause slight pauses between utterances when there are multiple consecutive utterances, which is definitely not what anyone wants.
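The trade-off described above can be illustrated with a small model. `FakePlayer` below is a hypothetical stand-in that only records what it is fed; it is not NVDA's `WasapiWavePlayer`, and the 22050 Hz mono 16-bit format and 10 ms pad are assumptions for illustration.

```python
class FakePlayer:
    """Hypothetical wave player that records the bytes it is fed."""

    def __init__(self, sample_rate=22050, bytes_per_frame=2, pad_ms=10):
        self.sample_rate = sample_rate
        self.bytes_per_frame = bytes_per_frame
        self.pad_ms = pad_ms
        self.fed = bytearray()

    def feed(self, data: bytes) -> None:
        self.fed += data

    def idle(self) -> None:
        # Experimental behaviour: pad the end of each utterance.
        frames = self.sample_rate * self.pad_ms // 1000
        self.feed(bytes(frames * self.bytes_per_frame))


def speak(player: FakePlayer, utterances) -> None:
    """Feed each utterance chunk by chunk, then call idle() once."""
    for chunks in utterances:
        for chunk in chunks:
            player.feed(chunk)
        player.idle()


player = FakePlayer()
# Three consecutive utterances of two 100-byte chunks each.
utterances = [[b"\x01" * 100, b"\x01" * 100] for _ in range(3)]
speak(player, utterances)
audio_bytes = 3 * 200
print(len(player.fed) - audio_bytes)  # 1320 bytes, i.e. three pads
```

Every utterance gets its own pad, including the first two, so the silence lands between consecutive utterances as well as after the last one.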
As a curiosity, can you reproduce this with the NVDA OneCore driver? If so, how about Narrator with a OneCore voice?
Something in my gut tells me this bug is a correctness bug and to use the bluetooth audio solution (either making a core NVDA contribution with that functionality or to just use the addon) is to use a sledgehammer to crack a nut and is just papering over the cracks.
There's some sense in that argument. On the other hand, the fact that your audio driver truncates samples only when enhancements are enabled and most other audio drivers do not could reasonably suggest a correctness bug in the driver. In that case, a sledgehammer might be the most reasonable workaround to address the broadest number of issues, since Bluetooth audio users suffer from similar problems even without WASAPI.
Maybe it is a correctness bug in the driver, or maybe it is a very strict driver that requires things to be done perfectly according to spec, while others let you get away with certain things. Either way, I do get the feeling that the most effective use of time to benefit the user base would be to take the approach of the Bluetooth Audio add-on.
As a curiosity, can you reproduce this with the NVDA OneCore driver? If so, how about Narrator with a OneCore voice?
Will give that a go. Yes using onecore in both NVDA and narrator and comparing results may be interesting. Will let you know when I have tried it.
As a curiosity, can you reproduce this with the NVDA OneCore driver? If so, how about Narrator with a OneCore voice?
Interestingly, I cannot reproduce it with the OneCore voices. Narrator also does not show it with either OneCore or the new natural voices. IMPORTANT: In NVDA more than speech is affected; this includes beeps, so it is not specific to the eSpeak synth support. And yes, beeps are affected when using OneCore voices. I guess I could see whether Narrator has problems with any of its sounds.
Given that, my guess is that OneCore itself pads the end of speech with a little bit of silence, but eSpeak does not. The same is true of our beeps: we don't pad them with any silence.
Any padding is going to result in a slightly longer delay between utterances, which is problematic unless we can determine for sure that there isn't another consecutive utterance already waiting. There might be a way to make that determination, but it will probably require additional cooperation between NVDA's core speech manager and synth drivers which doesn't currently exist.
It might be interesting for you to experiment as I suggested above - feed a short chunk of silence in idle() - to at least see how much silence you would need in order for this to be useful. If it's only 10 ms or so, that might be acceptable between utterances. If it's significantly longer, probably not so much.
Given that, my guess is that OneCore itself pads the end of speech with a little bit of silence, but eSpeak does not. The same is true of our beeps: we don't pad them with any silence.
Any padding is going to result in a slightly longer delay between utterances, which is problematic unless we can determine for sure that there isn't another consecutive utterance already waiting. There might be a way to make that determination, but it will probably require additional cooperation between NVDA's core speech manager and synth drivers which doesn't currently exist.
Well, I have done the testing with Narrator and its sounds: no cut-off there. However, I don't know whether it does any padding.
Thinking about this from a different direction, it might be possible for us to detect that there is less than some amount of audio (e.g. 30 ms) in the buffer without a new chunk having been fed and pad with silence at that point. However, that would require us to be able to wait or schedule checks for that case. We can't currently do that because audio is fed on the same thread as it is received from the synth, beeps, etc. That kind of fix would probably require a complete refactor of the audio code so that we manage all audio in its own dedicated thread and shuttle things between the calling threads and the audio thread.
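The decision logic of that idea can be sketched separately from the threading question. `BufferMonitor` below is a hypothetical model, not NVDA code: it tracks how much audio remains buffered and reports when silence should be appended. The 30 ms threshold comes from the example figure above; the 30 ms pad length is an assumption.

```python
from dataclasses import dataclass


@dataclass
class BufferMonitor:
    """Hypothetical model: pad once when the buffer runs low."""

    threshold_ms: float = 30.0  # example figure from the discussion
    pad_ms: float = 30.0        # assumed pad length
    buffered_ms: float = 0.0
    padded: bool = True         # nothing to pad until audio is fed

    def feed(self, chunk_ms: float) -> None:
        """Real audio arrived: extend the buffer and re-arm padding."""
        self.buffered_ms += chunk_ms
        self.padded = False

    def tick(self, elapsed_ms: float) -> float:
        """Advance playback; return ms of silence to append, or 0."""
        self.buffered_ms = max(0.0, self.buffered_ms - elapsed_ms)
        if self.buffered_ms < self.threshold_ms and not self.padded:
            self.padded = True
            self.buffered_ms += self.pad_ms
            return self.pad_ms
        return 0.0


monitor = BufferMonitor()
monitor.feed(100)
print(monitor.tick(50))  # 0.0, 50 ms still buffered
print(monitor.tick(30))  # 30.0, only 20 ms left so silence is added
print(monitor.tick(30))  # 0.0, this tail has already been padded
```

Something would still have to call `tick()` on a schedule, which is the part that needs a dedicated audio thread; if a new chunk arrives before the threshold is reached, `feed()` simply re-arms the check and no silence is inserted.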
It might be interesting for you to experiment as I suggested above - feed a short chunk of silence in idle() - to at least see how much silence you would need in order for this to be useful. If it's only 10 ms or so, that might be acceptable between utterances. If it's significantly longer, probably not so much.
To give you an idea of how much gets cut off: with a mouse position beep or progress bar update beep, I get a little click, maybe with a slight hint of the pitch, but it's hard to tell the pitch. For "Speech mode beeps", those beeps never happen.
Now I think about it, this is possibly how WinMM isn't affected on your system. It probably does something similar to what I've just outlined, at the cost of higher latency.