kanata Feature request: Accessible output messages

Is your feature request related to a problem? Please describe.

As a blind user, I find Kanata invaluable for my workflow. However, there's one area that could be improved. When I attempt to switch key layers or perform actions that change the keyboard layout and the attempt fails, it significantly slows me down, since no change occurs and nothing tells me so.

Describe the solution you'd like.

I suggest implementing a feature where Kanata outputs messages to screen readers, such as "Layer switched to media" or similar notifications. This would greatly enhance accessibility and efficiency for blind users, as there would be no question of what action they performed.

Describe alternatives you've considered.

I have considered using multi actions to play different audio files based on layers, but this would be highly complex to get right and I would have to find a sound for every single layer I make.

Additional context

No response

Aug 04 '24 23:08 dragonwolfsp

On live reload (multi (cmd reload.cmd) lrld-next) I use a cmd file that "speaks".

It verifies the config is valid or reads the error message (not that useful for me). I prefer start "kanata error" cmd /k kanata --check to get a window displaying the error.

kanata --check 2> test.log
IF ERRORLEVEL 1 GOTO Fail
@PowerShell Add-Type -AssemblyName System.Speech; (New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak('ai')
GOTO :EOF
:Fail
grep help test.log > test.say
@PowerShell Add-Type -AssemblyName System.Speech; (New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak((gc "test.say"))

maybe all you need is a symbols.cmd that says 'symbols'...

@PowerShell Add-Type -AssemblyName System.Speech; (New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak('symbols')

P.S. I forgot why I put the powershell calls inside a cmd file, maybe to avoid window popups

Aug 05 '24 03:08 gerhard-h

P.S. I forgot why I put the powershell calls inside a cmd file

It works fine without cmd files e.g. (cmd powershell.exe -NoProfile -NoLogo -NonInteractive -Command r#"Add-Type -AssemblyName System.Speech; (New-Object System.Speech.Synthesis.SpeechSynthesizer).Speak('symbols')"#)

Aug 05 '24 04:08 gerhard-h

@gerhard-h The first problem here is that if I want to switch layers using the layer switch command, I then have to use a multi action. The second problem is that this speaks using the system speech synth, not the screen reader. The third problem is cross compatibility, as I would have to deal with different commands on each platform.
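
On the cross-compatibility point, one way to hide the per-platform commands would be a small wrapper that picks the speech command itself. The sketch below is only an illustration: `tts_command` and `speak` are hypothetical helpers, and it assumes `say` is available on macOS and `spd-say` (speech-dispatcher) on Linux. It still speaks through the system speech synth rather than the screen reader, so it does not address the second problem.

```python
import subprocess
import sys


def tts_command(text: str, platform: str = sys.platform) -> list[str]:
    """Return the argv of a system text-to-speech command (hypothetical helper)."""
    if platform.startswith("win"):
        # Same System.Speech call as in the cmd examples earlier in the thread.
        # Single quotes are doubled to escape them inside a PowerShell string.
        escaped = text.replace("'", "''")
        script = (
            "Add-Type -AssemblyName System.Speech; "
            "(New-Object System.Speech.Synthesis.SpeechSynthesizer)"
            f".Speak('{escaped}')"
        )
        return ["powershell.exe", "-NoProfile", "-NonInteractive", "-Command", script]
    if platform == "darwin":
        return ["say", text]  # assumption: the macOS built-in `say` command
    return ["spd-say", text]  # assumption: speech-dispatcher is installed


def speak(text: str) -> None:
    """Run the platform's TTS command; a missing command is reported, not fatal."""
    try:
        subprocess.run(tts_command(text), check=False)
    except FileNotFoundError as e:
        print("TTS command not found:", e)
```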

Aug 05 '24 13:08 dragonwolfsp

The second problem is that this speaks using the system speech synth, not the screen reader.

I'm just curious as a non-blind person, and a person who does not use screen readers (although I rely on TTS software heavily for reading): what is the issue with having the speech synthesis not come from the screen reader?

Edit: I realized one issue, perhaps the major one: the screen reader user may be using a braille display and want updates through braille. Are there any other issues besides that?

Aug 05 '24 16:08 wis

I haven't seen any Rust crate that seems to offer the right functionality here. I did spot this though: https://github.com/khanshoaib3/CrossSpeak?tab=readme-ov-file

Maybe there can be a TCP integration

Aug 05 '24 22:08 jtroo

I haven't seen any Rust crate that seems to offer the right functionality here. I did spot this though: https://github.com/khanshoaib3/CrossSpeak?tab=readme-ov-file

I think the accesskit crate is the perfect fit here, for a cross-platform screen reader client crate.

Maybe there can be a TCP integration

Yeah, that was what I was thinking too. I think it would be better if screen reader integration were done outside of core Kanata, through Kanata's extension mechanism, but I don't think it's a bad idea to include the script in the repo with instructions on how to install and use it.

For those who also want to look into doing this, I suggest using Python for the script and using this Python package, which has Python bindings for AccessKit.

Snippets from the Python script I use to run Kanata and extend it through the TCP inter-process communication mechanism:

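import sys
import socket
import json
import queue
import subprocess
import threading
import time

msg_que = queue.Queue()

# Set when the script should stop; checked by both worker loops below.
kanta_process_shutdown_flag = threading.Event()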

def start_tcp_client(m_queue: queue.Queue):
    """
    Starts a TCP client to communicate with the server.

    Args:
        m_queue: The queue to send received messages for processing.
    """
    ip = socket.gethostbyname("localhost")
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client_socket:
        # client_socket.settimeout(1.0)  # Set timeout for the accept call
        client_socket.connect((ip, 8089))
        handler = ServerMessageHandler(client_socket, m_queue)
        try:
            while not kanta_process_shutdown_flag.is_set():
                data = client_socket.recv(1024)
                if not data:
                    break  # an empty read means the server closed the connection
                text_lines = data.decode("utf-8").splitlines()
                for text in text_lines:
                    print("msg text:", text)
                    try:
                        srv_msg = json.loads(text)
                    except Exception as e:
                        print("TCP message json decode error:", e)
                        print("message text:", text, "message bytes:", data)
                        kanta_process_shutdown_flag.set()
                        break  # srv_msg was not parsed, so do not handle it
                    print(f"TCP client received from TCP server: {srv_msg}")
                    handler.handle_message(srv_msg)
        except (KeyboardInterrupt, ConnectionResetError):
            print("\nProgram interrupted by user, exiting...")
            kanta_process_shutdown_flag.set()
            print("EXITING after C-c")
            sys.exit(0)

def start_exe_process():
    process = subprocess.Popen(
        sys.argv[1:],
        stdout=subprocess.PIPE,
    )
    while not kanta_process_shutdown_flag.is_set():
        output = process.stdout.readline()
        if output == b"" and process.poll() is not None:  # stdout pipe yields bytes
            break
        if output:
            line = output.decode("utf-8").strip()
            print(f".exe: {line}")
    process.terminate()
    try:
        process.wait(timeout=0.5)
    except subprocess.TimeoutExpired:
        process.kill()

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(
            "Please provide the command-line to run Kanata, like so: python kanataIPC.py <command to run Kanata>\ne.g.: python kanataIPC.py kanata.exe -c mycfg.kbd...."
        )
        sys.exit(1)

    exe_thread = threading.Thread(target=start_exe_process, daemon=True)
    exe_thread.start()

    time.sleep(1.5)
    client_thread = threading.Thread(
        target=start_tcp_client, args=(msg_que,), daemon=True
    )
    client_thread.start()

class ServerMessageHandler:
    def __init__(self, client_socket, m_queue: queue.Queue):
        self.client_socket = client_socket
        self.m_queue = m_queue

    def handle_message(self, message: dict):
        print("message " * 5, ": ", message)
        match message:
            case {"LayerChange": {"new": new_layer}}:
                self.handle_layer_change(new_layer)
            case {"MessagePush": {"message": members}}:
                print("members " * 5, ": ", members)
                match members:
                    case ["adjust-monitor-brightness", delta]:
                        # self.adjust_monitor_brightness(delta)
                        pass
                    case ["go_workspace", idx]:
                        # self.go_to_workspace(idx)
                        pass
                    # ....

    def handle_layer_change(self, new_layer):
        print("layer changed to", new_layer)
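
One caveat with the receive loop above: TCP is a byte stream, so a single recv(1024) can end in the middle of a JSON message, and json.loads would then fail on the partial line. Below is a minimal sketch of buffering until a full line has arrived. It assumes Kanata sends one JSON object per newline-terminated line, which is what the splitlines() call above already relies on. `LineBuffer` is a hypothetical helper name, not part of Kanata or of the script.

```python
import json


class LineBuffer:
    """Accumulates raw TCP chunks and returns only complete, newline-terminated lines."""

    def __init__(self) -> None:
        self._pending = b""

    def feed(self, chunk: bytes) -> list[dict]:
        """Add a chunk from recv() and return every fully received JSON message."""
        self._pending += chunk
        # Everything before the last newline is complete; the tail is kept for later.
        *complete, self._pending = self._pending.split(b"\n")
        return [json.loads(line) for line in complete if line.strip()]


buf = LineBuffer()
first = buf.feed(b'{"LayerChange":{"new":"me')  # partial message: nothing to return yet
second = buf.feed(b'dia"}}\n{"LayerChange":')    # completes one message, starts another
print(first, second)
```

In the loop above, iterating over `buf.feed(client_socket.recv(1024))` would take the place of the decode and splitlines step.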

Aug 05 '24 23:08 wis

I am not familiar with the accessibility space. Accesskit seems to be intended for UI components from what I can see; how would it be used to send messages to a screen reader from a background process that isn't in focus? I could likely be missing something though.

Aug 06 '24 01:08 jtroo

@wis The issue with not speaking thru the screen reader is that if a long message is being outputted, there is no way to make it stop talking. I also believe that @jtroo is correct: you can't use UI messages when an application is not in focus, so using AccessKit would not work properly, as most screen readers correctly ignore this behavior.

Aug 06 '24 14:08 dragonwolfsp

I see, I didn't realize programs whose window is not in the foreground cannot announce messages through UI update messages. CrossSpeak, the C# library jtroo mentioned, uses Tolk on Windows. Tolk is a C library that talks to screen readers directly through their respective APIs; feature support varies from one screen reader to another, and Tolk seems to only support Windows. CrossSpeak seems to support Windows best: looking at its code, it seems to just use TTS APIs on macOS and Linux, and on Windows it uses Tolk to command the screen reader to announce something.

I corrected the errors from the script that I included in my previous comment (they were just snippets I copied and pasted from my Kanata setup, really) and I made the script use cytolk, a Python package that uses Tolk to communicate with NVDA, to make NVDA announce layer change messages. I tested it with NVDA and it worked.

Python script I tested with NVDA; it announces layer changes through NVDA:

import sys
import socket
import json
import queue
import subprocess
import threading
import time
#from notifypy import Notify
from cytolk import tolk

msg_que = queue.Queue()

kanta_process_shutdown_flag = threading.Event()


class ServerMessageHandler:
    def __init__(self, client_socket, m_queue: queue.Queue):
        self.client_socket = client_socket
        self.m_queue = m_queue
        self.tolk = tolk.tolk()
        self.tolk.__enter__()
        #tolk.load(False)
        # detect the screenreader in use, in my case NVDA
        print(f"screenreader detected is {tolk.detect_screen_reader()}")

        # does this screenreader support speech and braille?
        tolk.speak("Kanata starter Python script started.", True)
        if tolk.has_speech():
            tolk.speak("this screenreader supports speech", True)
            print("this screenreader supports speech")
        if tolk.has_braille():
            print("this screenreader supports braille")

    def handle_message(self, message: dict):
        print("message " * 5, ": ", message)
        match message:
            case {"LayerChange": {"new": new_layer}}:
                self.handle_layer_change(new_layer)
            case {"MessagePush": {"message": members}}:
                print("members " * 5, ": ", members)
                match members:
                    case ["adjust-monitor-brightness", delta]:
                        # self.adjust_monitor_brightness(delta)
                        pass
                    case ["go_workspace", idx]:
                        # self.go_to_workspace(idx)
                        pass
                    # ....

    def handle_layer_change(self, new_layer):    
        print("layer changed to", new_layer)
        #notification = Notify()
        #notification.title = "Kanata Notifier"
        #notification.message = f"Layer changed to {new_layer}"
        #notification.send()
        tolk.speak(f"Layer changed to {new_layer}", True)
        

def start_tcp_client(m_queue: queue.Queue):
    """
    Starts a TCP client to communicate with the server.

    Args:
        m_queue: The queue to send received messages for processing.
    """
    ip = socket.gethostbyname("localhost")
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as client_socket:
        # client_socket.settimeout(1.0)  # Set timeout for the accept call
        client_socket.connect((ip, 8089))
        handler = ServerMessageHandler(client_socket, m_queue)
        try:
            while not kanta_process_shutdown_flag.is_set():
                data = client_socket.recv(1024)
                if not data:
                    break  # an empty read means the server closed the connection
                text_lines = data.decode("utf-8").splitlines()
                for text in text_lines:
                    print("msg text:", text)
                    try:
                        srv_msg = json.loads(text)
                    except Exception as e:
                        print("TCP message json decode error:", e)
                        print("message text:", text, "message bytes:", data)
                        kanta_process_shutdown_flag.set()
                        break  # srv_msg was not parsed, so do not handle it
                    print(f"TCP client received from TCP server: {srv_msg}")
                    handler.handle_message(srv_msg)
        except (KeyboardInterrupt, ConnectionResetError):
            print("\nProgram interrupted by user, exiting...")
            kanta_process_shutdown_flag.set()
            print("EXITING after C-c")
            sys.exit(0)

def start_exe_process():
    process = subprocess.Popen(
        sys.argv[1:],
        stdout=subprocess.PIPE,
    )
    while not kanta_process_shutdown_flag.is_set():
        output = process.stdout.readline()
        if output == b"" and process.poll() is not None:  # stdout pipe yields bytes
            break
        if output:
            line = output.decode("utf-8").strip()
            print(f".exe: {line}")
    process.terminate()
    try:
        process.wait(timeout=0.5)
        print("process finished")
    except subprocess.TimeoutExpired:
        process.kill()

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print(
            "Please provide the command-line to run Kanata, like so: python kanataIPC.py <command to run Kanata>\ne.g.: python kanataIPC.py kanata.exe -c mycfg.kbd...."
        )
        sys.exit(1)

    exe_thread = threading.Thread(target=start_exe_process, daemon=True)
    exe_thread.start()

    time.sleep(1.5)
    client_thread = threading.Thread(
        target=start_tcp_client, args=(msg_que,), daemon=True
    )
    client_thread.start()
    
    
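    import signal

    def signal_handler(signum, frame):
        print("You pressed Ctrl+C!")
        # Set the flag to signal threads to stop
        kanta_process_shutdown_flag.set()

    # Register the Ctrl+C handler before waiting on the threads; placed after
    # join(), it would only be installed once both threads had already ended.
    signal.signal(signal.SIGINT, signal_handler)

    print("Press Ctrl+C to stop...")

    # Join with a timeout so the main thread stays responsive to Ctrl+C.
    while exe_thread.is_alive() or client_thread.is_alive():
        if kanta_process_shutdown_flag.is_set():
            exe_thread.join(timeout=2)  # give the Kanata process a moment to be terminated
            break
        exe_thread.join(timeout=0.5)
        client_thread.join(timeout=0.5)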

How to run it and use it with Kanata:

Install uv, a popular, modern, and extremely fast Python package and project manager, written in Rust. You can install it with this command, copied from uv's docs:

powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

Create a new Python project with uv; this command creates a new project in a new directory/folder:

uv init py-kanata-screen-reader-announcer

Change the current directory in the terminal to the created directory for the project:

cd py-kanata-screen-reader-announcer

Install the cytolk package to the created Python project with uv:

uv add cytolk==0.1.13

Copy and paste the script above into a new file, named for example start_Kanata_with_announcer.py.

Run the script and Kanata with uv (the script starts Kanata for you):

uv run python start_Kanata_with_announcer.py "C:\path\to\kanata.exe" -c "C:\path\to\myconfig.kbd" -p 8089
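
To check the TCP side without running Kanata, a stand-in server can send a single layer-change message. This is only a test sketch: `serve_once` is a hypothetical helper, and the message shape is taken from what the script above matches on, not from Kanata's documentation.

```python
import json
import socket
import threading


def serve_once(server: socket.socket, layer: str) -> None:
    """Accept one client, send a single LayerChange line, then close the connection."""
    conn, _ = server.accept()
    with conn:
        payload = json.dumps({"LayerChange": {"new": layer}}) + "\n"
        conn.sendall(payload.encode("utf-8"))


# Port 0 asks the OS for any free port; use 8089 to match the command above.
server = socket.create_server(("127.0.0.1", 0))
port = server.getsockname()[1]
threading.Thread(target=serve_once, args=(server, "media"), daemon=True).start()

# A minimal client that reads one message, as start_tcp_client does in a loop.
with socket.create_connection(("127.0.0.1", port)) as client, \
        client.makefile("r", encoding="utf-8") as reader:
    received = json.loads(reader.readline())
server.close()
print(received)
```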

Aug 06 '24 16:08 wis