[FEATURE]: Triggering picture/other actions by voice
Is your feature request related to a problem?
No
Description
Today, actions can be triggered by tapping virtual (touchscreen) buttons, pressing physical buttons, or making API calls. How about using voice?
Describe the solution you'd like
Offline, on-device wake-word detection that triggers actions through API calls.
Describe alternatives you've considered
An independent device or installation to provide the service. Maybe some new actions could be added to the Photobooth API.
Additional context
I've made a script to use Porcupine, but so far nothing is directly tied to PhotoboothProject:
main.py
import pvporcupine
import pyaudio
import struct
import subprocess
import configparser
import os

# --- CONFIGURATION FROM config.ini ---
config = configparser.ConfigParser()
config_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.ini')
config.read(config_path)

ACCESS_KEY = config['PORCUPINE']['ACCESS_KEY']
KEYWORDS = [k.strip() for k in config['PORCUPINE']['KEYWORDS'].split(',')]
KEYWORD_PATHS = [k.strip() for k in config['PORCUPINE']['KEYWORD_PATHS'].split(',')]
MODEL_PATH = config['PORCUPINE']['MODEL_PATH']
COMMAND_TO_EXECUTE = config['PORCUPINE']['COMMAND_TO_EXECUTE']
# --- END OF CONFIGURATION ---

try:
    # Initialize Porcupine (the access key is required since Porcupine v2)
    porcupine = pvporcupine.create(
        access_key=ACCESS_KEY,
        keyword_paths=KEYWORD_PATHS,
        model_path=MODEL_PATH
    )

    # Initialize audio stream with PyAudio
    pa = pyaudio.PyAudio()
    audio_stream = pa.open(
        rate=porcupine.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=porcupine.frame_length
    )

    print(f"Ready. Listening for keywords: {', '.join(KEYWORDS)}")
    print("Press Ctrl+C to exit.")

    while True:
        # Read one frame; don't crash on occasional input overflows
        pcm = audio_stream.read(porcupine.frame_length, exception_on_overflow=False)
        pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)

        # Process audio with Porcupine
        keyword_index = porcupine.process(pcm)

        # If a keyword is detected (keyword_index >= 0)
        if keyword_index >= 0:
            detected = KEYWORDS[keyword_index]
            print(f"Keyword '{detected}' detected!")

            # --- TRIGGER YOUR ACTION HERE ---
            print(f"Executing command: '{COMMAND_TO_EXECUTE}'")
            subprocess.Popen(COMMAND_TO_EXECUTE.split())
            print("Waiting for the next detection...")

except KeyboardInterrupt:
    print("Script stopped.")
finally:
    if 'porcupine' in locals() and porcupine is not None:
        porcupine.delete()
    if 'audio_stream' in locals() and audio_stream is not None:
        audio_stream.close()
    if 'pa' in locals() and pa is not None:
        pa.terminate()
config.ini
[PORCUPINE]
ACCESS_KEY =
KEYWORDS = sorbet citron
KEYWORD_PATHS = sorbet-citron_fr_linux_v3_0_0.ppn
MODEL_PATH = porcupine_params_fr.pv
COMMAND_TO_EXECUTE = curl --request GET http://localhost:14711/commands/start-picture
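As a possible refinement (not part of the script above), the external curl process could be replaced with Python's built-in urllib, keeping everything in one process. This is only a sketch: the helper names are hypothetical, and the base URL and command path are simply taken from the COMMAND_TO_EXECUTE entry above.

```python
import urllib.request

def photobooth_command_url(base, command):
    # Build the command URL, e.g. http://localhost:14711/commands/start-picture
    return f"{base.rstrip('/')}/commands/{command}"

def trigger_photobooth(command, base="http://localhost:14711", timeout=5):
    # Fire the GET request; network failures raise urllib.error.URLError
    with urllib.request.urlopen(photobooth_command_url(base, command),
                                timeout=timeout) as resp:
        return resp.status

print(photobooth_command_url("http://localhost:14711", "start-picture"))
```

In the detection loop, the subprocess.Popen call would then become trigger_photobooth("start-picture"), which also makes the HTTP status available for logging.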
Hey and thanks for the suggestion! You should use the Remotebuzzer server feature of Photobooth and trigger it via a simple web request from within your voice detection script.
Best regards
Andi
Hi Andi,
Thanks!
That is indeed what my script does through the config file, as I wanted an extensible solution 🙂
Right, I've missed the COMMAND_TO_EXECUTE part scrolling through some open issues and PRs here 😅