[FEATURE]: Triggering picture/other actions by voice
Is your feature request related to a problem?
No
Description
Today, actions can be triggered by tapping virtual (touchscreen) buttons, pressing physical buttons, or making API calls. How about using voice?
Describe the solution you'd like
Offline, on-device wake-word detection that triggers actions through API calls.
Describe alternatives you've considered
An independent device or installation to provide the service. Maybe some new actions could be added to the Photobooth API.
Additional context
I've made a script to use Porcupine, but so far nothing is directly tied to PhotoboothProject:
main.py
import pvporcupine
import pyaudio
import struct
import subprocess
import configparser
import os

# --- CONFIGURATION FROM config.ini ---
config = configparser.ConfigParser()
config_path = os.path.join(os.path.dirname(os.path.abspath(__file__)), 'config.ini')
config.read(config_path)

ACCESS_KEY = config['PORCUPINE']['ACCESS_KEY']
KEYWORDS = [k.strip() for k in config['PORCUPINE']['KEYWORDS'].split(',')]
KEYWORD_PATHS = [k.strip() for k in config['PORCUPINE']['KEYWORD_PATHS'].split(',')]
MODEL_PATH = config['PORCUPINE']['MODEL_PATH']
COMMAND_TO_EXECUTE = config['PORCUPINE']['COMMAND_TO_EXECUTE']
# --- END OF CONFIGURATION ---

try:
    # Initialize Porcupine (the access key is required since Porcupine v2)
    porcupine = pvporcupine.create(
        access_key=ACCESS_KEY,
        keyword_paths=KEYWORD_PATHS,
        model_path=MODEL_PATH
    )

    # Initialize audio stream with PyAudio
    pa = pyaudio.PyAudio()
    audio_stream = pa.open(
        rate=porcupine.sample_rate,
        channels=1,
        format=pyaudio.paInt16,
        input=True,
        frames_per_buffer=porcupine.frame_length
    )

    print(f"Ready. Listening for keywords: {', '.join(KEYWORDS)}")
    print("Press Ctrl+C to exit.")

    while True:
        # Read one frame; don't crash on occasional input overflows
        pcm = audio_stream.read(porcupine.frame_length, exception_on_overflow=False)
        pcm = struct.unpack_from("h" * porcupine.frame_length, pcm)

        # Process audio with Porcupine
        keyword_index = porcupine.process(pcm)

        # If a keyword is detected (keyword_index >= 0)
        if keyword_index >= 0:
            detected = KEYWORDS[keyword_index]
            print(f"Keyword '{detected}' detected!")

            # --- TRIGGER YOUR ACTION HERE ---
            print(f"Executing command: '{COMMAND_TO_EXECUTE}'")
            subprocess.Popen(COMMAND_TO_EXECUTE.split())
            print("Waiting for the next detection...")

except KeyboardInterrupt:
    print("Script stopped.")
finally:
    if 'porcupine' in locals() and porcupine is not None:
        porcupine.delete()
    if 'audio_stream' in locals() and audio_stream is not None:
        audio_stream.close()
    if 'pa' in locals() and pa is not None:
        pa.terminate()
config.ini
[PORCUPINE]
ACCESS_KEY =
KEYWORDS = sorbet citron
KEYWORD_PATHS = sorbet-citron_fr_linux_v3_0_0.ppn
MODEL_PATH = porcupine_params_fr.pv
COMMAND_TO_EXECUTE = curl --request GET http://localhost:14711/commands/start-picture
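As a possible refinement (not part of the script above), the external curl process could be replaced with Python's built-in urllib, keeping everything in one process. This is only a sketch: the helper names are hypothetical, and the base URL and command path are simply taken from the COMMAND_TO_EXECUTE entry above.

```python
import urllib.request

def photobooth_command_url(base, command):
    # Build the command URL, e.g. http://localhost:14711/commands/start-picture
    return f"{base.rstrip('/')}/commands/{command}"

def trigger_photobooth(command, base="http://localhost:14711", timeout=5):
    # Fire the GET request; network failures raise urllib.error.URLError
    with urllib.request.urlopen(photobooth_command_url(base, command),
                                timeout=timeout) as resp:
        return resp.status

print(photobooth_command_url("http://localhost:14711", "start-picture"))
```

In the detection loop, the subprocess.Popen call would then become trigger_photobooth("start-picture"), which also makes the HTTP status available for logging.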
Hey and thanks for the suggestion! You should use the Remotebuzzer server feature of Photobooth and trigger it via a simple web request from within your voice detection script.
Best regards
Andi
Hi Andi,
Thanks!
That is indeed what my script does through the config file, as I wanted an extensible solution 🙂
Right, I've missed the COMMAND_TO_EXECUTE part scrolling through some open issues and PRs here 😅