
Add voice to text command processing

Open arcticfly opened this issue 2 years ago • 4 comments

Currently users have to type in a command after activating the Taxy extension in order to start execution on their command. It would be really useful to have a voice-to-command feature that minimized typing.

arcticfly avatar Mar 27 '23 18:03 arcticfly

This one might be tricky. As far as I can tell, Chrome doesn't let you request or use the mic from inside a DevTools panel.

So the only way I can think of is injecting a script into the web page you're viewing to request mic permission and use the Speech Recognition API within the web page, then send the text result to the extension background script to update the devtools UI. Unless this can be done with some type of extension pop-up window.
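A minimal sketch of that page-side flow, under some assumptions: the `VOICE_COMMAND`, `VOICE_ERROR`, and `START_VOICE` message types and the `startVoiceCapture` helper are hypothetical names, not anything from the Taxy codebase, and I have not verified that the mic prompt behaves the same on every site. The recognizer constructor and the send function are passed in so the logic does not depend on a particular browser global.

```javascript
// Content-script sketch (hypothetical helper and message names).
// RecognitionCtor: window.SpeechRecognition || window.webkitSpeechRecognition
// sendMessage: e.g. (msg) => chrome.runtime.sendMessage(msg)
function startVoiceCapture(RecognitionCtor, sendMessage) {
  const recognition = new RecognitionCtor();
  recognition.lang = 'en-US';
  recognition.interimResults = false; // only deliver final results

  recognition.onresult = (event) => {
    // Take the most recent result and forward it once it is final.
    const last = event.results[event.results.length - 1];
    if (last.isFinal) {
      sendMessage({ type: 'VOICE_COMMAND', transcript: last[0].transcript.trim() });
    }
  };

  recognition.onerror = (event) => {
    // event.error is a short code such as "not-allowed" or "no-speech".
    sendMessage({ type: 'VOICE_ERROR', error: event.error });
  };

  recognition.start(); // this is the call that triggers the mic permission prompt
  return recognition;
}

// Browser-only wiring: start listening when the extension asks for it.
if (typeof window !== 'undefined' && typeof chrome !== 'undefined' &&
    chrome.runtime && chrome.runtime.onMessage) {
  const Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
  chrome.runtime.onMessage.addListener((message) => {
    if (Ctor && message && message.type === 'START_VOICE') {
      startVoiceCapture(Ctor, (msg) => chrome.runtime.sendMessage(msg));
    }
  });
}
```

The background script would then listen for `VOICE_COMMAND` with `chrome.runtime.onMessage.addListener` and relay the transcript to the devtools panel.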

Christopher-Hayes avatar Mar 31 '23 23:03 Christopher-Hayes

Got as far as adding the record button and trying to get recording from the current page working. Didn't quite get the messaging events working, but my WIP code is here if anyone wants something to kickstart the effort.

The downside of requesting the mic inside the page is that the user may find it undesirable, and the request would have to be repeated on every domain. So, if an extension popup can work, that might be the better option.

I did have the Speech Recognition API working when previewing the extension in a separate tab while debugging, so it can work, just not from inside devtools.

Christopher-Hayes avatar Apr 01 '23 01:04 Christopher-Hayes

Have you tried using an extension popup window? This allows you to avoid injecting a script into the web page and requesting mic permission on every different domain.

1. Create a new popup.html file: design the popup window's structure, including a button to start voice command recognition and a container to display the recognized command.

2. Write the CSS styling for popup.html: add your preferred styles to the popup.html file to make the interface visually appealing.

3. Create a new popup.js file: this file will handle the voice-to-text processing using the Web Speech API when the user clicks the "Start Voice Command" button.

4. Add the event listener for the "Start Voice Command" button: in the popup.js file, add an event listener that triggers the voice command recognition when the button is clicked.

5. Implement the Web Speech API for voice recognition: in the popup.js file, use the webkitSpeechRecognition API for voice recognition, and handle the onresult event to obtain the final transcript.

6. Send the final transcript to the background or content script: once the final transcript is available, send it to the background or content script using chrome.runtime.sendMessage.

7. Update the manifest.json file: add the browser_action property (named action in Manifest V3) with the default_popup and default_icon attributes to specify the popup window and the extension icon.

8. Add a listener to receive the voice command in the background or content script: in the background or content script, use chrome.runtime.onMessage.addListener to listen for messages with the voice command.

9. Process the voice command and update the devtools UI: when the voice command is received in the background or content script, process it as needed and update the devtools UI accordingly.
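The popup and background steps above could be sketched roughly as follows. This is an untested outline, not a confirmed solution: whether the mic permission can actually be granted from a popup is still an open question. The element ids (`start-voice`, `transcript`), the `VOICE_COMMAND` message type, and the helper names `wireVoiceButton` and `makeVoiceCommandListener` are all assumptions made for illustration.

```javascript
// popup.js side (hypothetical helper): start recognition on click,
// show the transcript, and forward it to the rest of the extension.
function wireVoiceButton(button, output, RecognitionCtor, sendMessage) {
  button.addEventListener('click', () => {
    const recognition = new RecognitionCtor();
    recognition.interimResults = false; // single final result per utterance

    recognition.onresult = (event) => {
      const transcript = event.results[0][0].transcript.trim();
      output.textContent = transcript;
      sendMessage({ type: 'VOICE_COMMAND', transcript });
    };

    recognition.onerror = (event) => {
      output.textContent = `Speech recognition error: ${event.error}`;
    };

    recognition.start();
  });
}

// Background side (hypothetical helper): builds a listener that can be
// passed to chrome.runtime.onMessage.addListener.
function makeVoiceCommandListener(onCommand) {
  return (message, sender, sendResponse) => {
    if (!message || message.type !== 'VOICE_COMMAND') return;
    onCommand(message.transcript);
    sendResponse({ ok: true });
  };
}

// Browser-only wiring for popup.js; skipped outside an extension page.
if (typeof document !== 'undefined' && typeof chrome !== 'undefined') {
  const Ctor = window.SpeechRecognition || window.webkitSpeechRecognition;
  const button = document.getElementById('start-voice'); // assumed id
  const output = document.getElementById('transcript');  // assumed id
  if (Ctor && button && output) {
    wireVoiceButton(button, output, Ctor, (msg) => chrome.runtime.sendMessage(msg));
  }
}

// In the background script one would register the listener, for example:
// chrome.runtime.onMessage.addListener(makeVoiceCommandListener(handleCommand));
// where handleCommand is whatever updates the devtools UI.
```

Passing the recognizer constructor and the send function in as arguments keeps the click handling separate from the browser globals, which makes it easier to check the message flow before the permission question is settled.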

CryptoMitch avatar Apr 09 '23 10:04 CryptoMitch

@CryptoMitch Yeah, extension popups would be the way to go if mic permissions work there. I wasn't sure whether you could request that permission from the popup UI. But, if you'd like to take that on, go ahead. Btw, was that written by ChatGPT? I'm a big fan of ChatGPT, but I'm not sure dumping the entire output into the discussion is helpful for contributors, especially if you're not familiar enough with the output to say definitively whether it will solve the problem we're trying to solve.

Christopher-Hayes avatar Apr 09 '23 21:04 Christopher-Hayes