[Feature]: Voice input for Prompting

Open simonduz opened this issue 3 months ago • 3 comments

Feature Description

Voice Prompt Input via Microphone for AionUi

Summary: Add native support in AionUi to enter prompts via voice using the system microphone, alongside the existing keyboard-based input.

Key behavior:

  • AionUi provides a microphone button in or near the prompt input field, plus a keyboard shortcut (e.g., Ctrl+Shift+M / Cmd+Shift+M).
  • When activated, AionUi listens to the user’s microphone and clearly indicates that it is recording.
  • When the user stops recording, AionUi transcribes the audio to text using a selectable speech-to-text (STT) option: a local STT engine (on-device) or a cloud STT service (remote API).
  • The transcribed text is placed in the prompt input field.
  • The user can edit the text and then submit it as a normal prompt.

Platforms:

  • Linux
  • macOS
  • Windows

Core UX requirements:

  • Microphone icon and clear states:
      • Idle (ready to record).
      • Listening (actively recording).
      • Transcribing (processing audio).
  • Keyboard accessibility:
      • Focusable microphone button.
      • Configurable keyboard shortcut to start/stop recording.
  • Settings:
      • Enable/disable voice input.
      • Choose transcription mode: Local vs. Cloud.
      • Choose language (where supported).
      • Configure shortcut key.

Expected user flow (v1):

  1. User clicks the microphone icon or presses the shortcut.
  2. AionUi requests microphone permissions from the OS if needed.
  3. AionUi enters a “Listening” state and records audio.
  4. User clicks the microphone again or presses the shortcut to stop recording.
  5. AionUi transcribes the audio using the selected STT option (local or cloud).
  6. The transcription appears in the prompt input field.
  7. User optionally edits the text.
  8. User submits the prompt as usual.
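
The flow above can be read as a small state machine over the three UI states. The following is a minimal sketch in TypeScript, assuming a single "toggle" event covers both the microphone click and the shortcut. The type and function names are hypothetical and not taken from the AionUi codebase.

```typescript
// All names here are hypothetical; the issue leaves the implementation open.
type VoiceState = "idle" | "listening" | "transcribing";
type VoiceEvent = "toggle" | "transcriptionDone" | "transcriptionFailed";

// Pure transition function for the v1 flow. "toggle" stands for either a
// click on the microphone button or a press of the keyboard shortcut.
function nextVoiceState(state: VoiceState, event: VoiceEvent): VoiceState {
  if (state === "idle") {
    return event === "toggle" ? "listening" : "idle";
  }
  if (state === "listening") {
    return event === "toggle" ? "transcribing" : "listening";
  }
  // state === "transcribing": further toggles are ignored; success and
  // failure both return to idle so the UI cannot stay stuck.
  return event === "toggle" ? "transcribing" : "idle";
}
```

Keeping the transitions in one pure function makes it easy to check that every path out of "transcribing" ends in "idle".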

Optional enhancements (nice-to-have, not required for v1):

  • “Press and hold to talk” behavior on the button or shortcut.
  • Toggle for auto-submit after transcription.
  • Spoken punctuation (“comma”, “period”, “new line”) mapped to actual punctuation/line breaks.
  • Clear indication and handling of a maximum recording duration (e.g., 1–2 minutes).
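
As an illustration of the spoken-punctuation enhancement, here is a naive TypeScript sketch that post-processes a transcript. The mapping table and the function name are assumptions, limited to the three English examples above; a real implementation would need per-language tables and a way to keep literal uses of words like "period".

```typescript
// Hypothetical mapping from spoken tokens to the characters they stand for.
const SPOKEN_PUNCTUATION: Record<string, string> = {
  "comma": ",",
  "period": ".",
  "new line": "\n",
};

function applySpokenPunctuation(transcript: string): string {
  let result = transcript;
  for (const [spoken, symbol] of Object.entries(SPOKEN_PUNCTUATION)) {
    // Match the spoken token as a whole word, together with any whitespace
    // before it, so that "hello comma world" becomes "hello, world".
    const pattern = new RegExp(`\\s*\\b${spoken}\\b`, "gi");
    result = result.replace(pattern, symbol);
  }
  // Remove spaces left over at the start of a line after a line break.
  return result.replace(/\n[ \t]+/g, "\n");
}
```

For example, `applySpokenPunctuation("hello comma world period")` yields `"hello, world."`.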

Problem Statement

Current behavior: AionUi only supports prompt input via keyboard. Users must type all prompts manually, regardless of length or complexity.

Issues this causes:

  1. Efficiency and workflow friction
      • Long, detailed prompts are slow and tiring to type.
      • Users who think faster than they type are constrained by keyboard speed.
      • Multitasking workflows are interrupted when users have to stop what they’re doing just to type, for example:
          • Referencing physical documents.
          • Moving between different apps and windows.
          • Working with a drawing tablet or other non-keyboard inputs.
  2. Accessibility limitations
      • Users with motor impairments, repetitive strain injuries, or other conditions may find extended typing difficult or painful.
      • Users with temporary limitations (e.g., hand or wrist injury, fatigue) face similar friction.
      • Without voice input, AionUi is less accessible in situations where speech would be a more viable or comfortable input method.
  3. Misalignment with user expectations for modern AI tools
      • Many AI and productivity tools now offer multiple input modes, including voice.
      • AionUi lacking voice input makes it less convenient and potentially less competitive, particularly for:
          • Brainstorming and “thinking out loud” workflows.
          • Rapid creation of long or nuanced prompts.
          • Users who prefer to talk instead of type.

Why this feature matters:

  • It would make AionUi faster and more comfortable to use for many users.
  • It would broaden access for users who cannot or prefer not to type long prompts.
  • It would bring AionUi closer to user expectations for an AI desktop application on Linux, macOS, and Windows.

Proposed Solution

Add voice input as an additional way to enter prompts in AionUi, integrated directly into the existing prompt area.

Core behavior:

  • A microphone icon is added next to or inside the prompt input field.
  • A keyboard shortcut (for example, Ctrl+Shift+M on Windows/Linux and Cmd+Shift+M on macOS) toggles recording on/off.
  • When activated, AionUi listens to the user through the system microphone and clearly indicates that recording is in progress.
  • When recording stops, AionUi transcribes the audio into text using a selectable STT mode: a local STT engine (on-device) or a cloud STT service (remote).

The resulting text appears in the prompt input field where the user can edit it and then submit it as usual.

Key elements of the solution:

  1. UI/UX
      • Add a microphone button with these states:
          • Idle: ready to start recording.
          • Listening: actively recording, visually highlighted (e.g., pulsing icon or waveform).
          • Transcribing: brief “Transcribing…” indication or spinner.
      • When in “Listening” state: clear visual feedback so users always know when the microphone is active.
      • When transcription is complete:
          • The transcribed text is inserted into the prompt input field.
          • Default behavior: replace current content.
          • Optionally allow an “append” behavior as a setting.
  2. Interaction flow (v1)
      • Start:
          • User clicks the microphone icon or presses the shortcut.
          • If needed, AionUi requests microphone permissions from the OS.
      • Record:
          • AionUi listens and shows the “Listening” state.
          • Optional: show a subtle timer or indicator if there is a maximum recording duration.
      • Stop:
          • User clicks the microphone icon again or presses the shortcut to stop recording.
      • Transcribe:
          • AionUi sends the audio to the selected STT option (local or cloud).
          • AionUi shows a short “Transcribing…” state.
      • Edit and send:
          • AionUi inserts the transcription into the prompt field.
          • User can edit or refine the text.
          • User submits the prompt using the existing send action (button or Enter key).
  3. Settings and configuration
      • A Voice Input section in AionUi settings to:
          • Enable/disable voice input.
          • Choose STT mode: Local STT or Cloud STT.
          • Configure cloud STT (if used): API key / credentials, plus any required endpoint or region options.
          • Select language for transcription, where supported.
          • Configure the keyboard shortcut for start/stop recording.
  4. Error handling and feedback
      • If there is no microphone: show a clear message like “No microphone detected. Please connect a microphone and try again.”
      • If permissions are denied: inform the user and provide simple instructions to enable microphone access in OS settings.
      • If transcription fails: show a short error message (e.g., “Could not transcribe audio. Please try again.”).
      • Avoid leaving the UI stuck in a “Listening” or “Transcribing” state.
  5. Privacy and accessibility
      • Only start recording after explicit user action (button or shortcut) and OS permission.
      • Always show visible indication when recording is active.
      • Allow users to completely disable voice input in settings.
      • Ensure the microphone button is fully keyboard accessible and works well with screen readers (clear labels such as “Start voice input” / “Stop voice input”).

This solution introduces voice-based prompt entry in a way that is optional, configurable (local vs. cloud STT), and consistent with the existing AionUi prompt workflow on Linux, macOS, and Windows.
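
The error-handling requirements in the proposal (a clear message per failure, and never leaving the UI stuck) could be modeled roughly as follows. This is a sketch only: the type names, the `settleTranscription` helper, and the wording of the permission-denied message are assumptions, while the other two messages are quoted from the proposal.

```typescript
type VoiceErrorKind = "no-microphone" | "permission-denied" | "transcription-failed";

const VOICE_ERROR_MESSAGES: Record<VoiceErrorKind, string> = {
  "no-microphone": "No microphone detected. Please connect a microphone and try again.",
  // Assumed wording; the proposal only asks for simple instructions here.
  "permission-denied": "Microphone access was denied. Enable it in your OS settings and try again.",
  "transcription-failed": "Could not transcribe audio. Please try again.",
};

type TranscriptionOutcome =
  | { ok: true; text: string }
  | { ok: false; kind: VoiceErrorKind };

interface PromptUi {
  state: "idle" | "listening" | "transcribing";
  promptText: string;
  errorMessage: string | null;
}

// Whatever the outcome, the returned UI model is back in "idle".
function settleTranscription(ui: PromptUi, outcome: TranscriptionOutcome): PromptUi {
  if (outcome.ok) {
    // Default behavior from the proposal: replace the current content.
    return { state: "idle", promptText: outcome.text, errorMessage: null };
  }
  // On failure, keep what the user already typed and surface the message.
  return {
    state: "idle",
    promptText: ui.promptText,
    errorMessage: VOICE_ERROR_MESSAGES[outcome.kind],
  };
}
```

Because both branches return `state: "idle"`, a failed transcription cannot leave the button in the “Transcribing” state.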

Feature Category

UI/UX Improvement

Additional Context

Target platforms:

  • Linux
  • macOS
  • Windows

Stack and implementation:

  • The underlying desktop stack used by AionUi is not assumed and is intentionally left unspecified.
  • The technical design and implementation details, including audio capture and integration with any speech-to-text (STT) provider, are left to the development team.

Speech-to-text options (both should be supported as configurable choices):

  • Local STT: audio is processed on the user’s machine.
      • Potential benefits:
          • Improved privacy (no audio leaves the device).
          • Possible offline usage.
      • Potential trade-offs:
          • Increased CPU usage on some systems.
          • Larger application footprint or dependencies.
  • Cloud STT: audio is sent to a remote service for transcription.
      • Potential benefits:
          • Simpler integration.
          • Often higher accuracy and multilingual support.
      • Potential trade-offs:
          • Involves third-party providers and data handling considerations.
          • May incur usage costs depending on the provider.
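
One way to keep both options interchangeable is to put them behind a common interface and select by mode. The sketch below is an assumption about how that might look, not a description of any existing AionUi API; `SttProvider`, `pickProvider`, and the `Uint8Array` audio type are all hypothetical.

```typescript
type SttMode = "local" | "cloud";

// Hypothetical contract shared by the on-device and remote engines.
interface SttProvider {
  readonly mode: SttMode;
  transcribe(audio: Uint8Array, language: string): Promise<string>;
}

// Return the provider registered for the mode chosen in settings.
function pickProvider(mode: SttMode, providers: SttProvider[]): SttProvider {
  const match = providers.find((p) => p.mode === mode);
  if (!match) {
    throw new Error(`No STT provider registered for mode "${mode}"`);
  }
  return match;
}
```

With this shape, the recording and UI code only depends on `SttProvider`, so switching between local and cloud transcription is a settings change rather than a code path change.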

Configuration expectations:

  • In AionUi’s settings, users should be able to:
      • Enable or disable voice input entirely.
      • Choose transcription mode: Local STT or Cloud STT.
      • If cloud STT is selected:
          • Provide API key or credentials.
          • Configure endpoint/region if required by the provider.
      • Select transcription language where supported.
      • Configure or change the keyboard shortcut used to start/stop recording.
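
The settings listed above map naturally onto a small configuration object. The following TypeScript sketch is illustrative only: the field names, the default values (including voice input being off by default and "en-US" as the language), and the `validateVoiceSettings` helper are assumptions rather than anything specified in this issue.

```typescript
interface VoiceInputSettings {
  enabled: boolean;
  mode: "local" | "cloud";
  language: string; // e.g. "en-US"; the tag format is an assumption
  shortcut: string; // e.g. "Ctrl+Shift+M"
  cloud?: { apiKey: string; endpoint?: string; region?: string };
}

// Assumed defaults: opt-in, on-device transcription.
const DEFAULT_VOICE_SETTINGS: VoiceInputSettings = {
  enabled: false,
  mode: "local",
  language: "en-US",
  shortcut: "Ctrl+Shift+M",
};

// Return a list of human-readable problems; an empty list means valid.
function validateVoiceSettings(s: VoiceInputSettings): string[] {
  const problems: string[] = [];
  if (!s.enabled) {
    return problems; // nothing to check while the feature is disabled
  }
  if (s.mode === "cloud" && !(s.cloud && s.cloud.apiKey)) {
    problems.push("Cloud STT requires an API key or credentials.");
  }
  if (s.shortcut.trim() === "") {
    problems.push("A keyboard shortcut is required.");
  }
  return problems;
}
```

Validating before enabling the microphone button lets the settings screen explain a missing API key up front instead of failing at transcription time.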

Privacy and security considerations:

  • Microphone access:
      • AionUi should only access the microphone after explicit user action (button click or shortcut) and OS-level permission.
      • There should always be a clear visual indicator when the microphone is active.
  • Data handling:
      • By default, raw audio should not be stored permanently.
      • Application logs should avoid including verbatim spoken content; logs should be limited to minimal metadata (e.g., timestamps, error codes).
  • If cloud STT is used:
      • Use secure connections (e.g., TLS) for audio and transcription requests.
      • Respect provider-specific options related to data retention and privacy.
      • Clearly indicate in settings that audio is sent to a third party when this mode is enabled.
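
The logging guideline above (metadata only, no spoken content) can be enforced by building log entries from an allow-list of fields. This is a minimal sketch with hypothetical names; the `durationMs` field is an added assumption about what minimal metadata might include.

```typescript
interface VoiceLogEntry {
  timestamp: string;
  event: string;
  errorCode?: string;
  durationMs?: number;
}

interface VoiceEventRecord {
  name: string;
  at: Date;
  transcript?: string; // may be present in memory, must never be logged
  errorCode?: string;
  durationMs?: number;
}

// Copy only allow-listed metadata; the transcript is deliberately dropped.
function toLogEntry(event: VoiceEventRecord): VoiceLogEntry {
  const entry: VoiceLogEntry = {
    timestamp: event.at.toISOString(),
    event: event.name,
  };
  if (event.errorCode !== undefined) entry.errorCode = event.errorCode;
  if (event.durationMs !== undefined) entry.durationMs = event.durationMs;
  return entry;
}
```

An allow-list is safer than deleting known-sensitive fields, because any field added later stays out of the logs until someone explicitly includes it.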

Accessibility:

  • Voice input itself is intended to improve accessibility for users who have difficulty typing.
  • The UI should:
      • Provide clear visual states (idle, listening, transcribing) that don’t rely solely on color.
      • Make the microphone control accessible via keyboard (Tab focus, Space/Enter activation).
      • Provide appropriate labels and announcements for screen readers (e.g., “Start voice input,” “Stop voice input,” “Listening,” “Transcribing,” “Transcription complete”).
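
The screen-reader labels and announcements suggested above can be derived from the button state. The following TypeScript sketch uses the example strings from this section; the function names and the `succeeded` flag are hypothetical, and how the strings reach assistive technology (for instance an accessible label and a live region) is left to the UI layer.

```typescript
type MicState = "idle" | "listening" | "transcribing";

// Accessible name for the microphone button in each state.
function micButtonLabel(state: MicState): string {
  if (state === "listening") return "Stop voice input";
  if (state === "transcribing") return "Transcribing";
  return "Start voice input";
}

// Text to announce on a state change, or null when nothing should be said.
function announcementFor(
  previous: MicState,
  next: MicState,
  succeeded: boolean = true,
): string | null {
  if (previous === next) return null;
  if (next === "listening") return "Listening";
  if (next === "transcribing") return "Transcribing";
  // next is "idle": only announce completion after a successful transcription.
  return previous === "transcribing" && succeeded ? "Transcription complete" : null;
}
```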

Example user stories:

  • “As an AionUi user who types slowly, I want to speak my prompts so I can interact with the AI faster.”
  • “As a user with wrist pain, I want to use voice input for most of my prompts so I can reduce typing strain.”
  • “As a power user, I want a keyboard shortcut to start and stop voice input so I can dictate prompts without needing to use the mouse.”

Priority:

  • Proposed priority: Medium
  • Rationale:
      • Significant accessibility benefits.
      • Reduces friction for long or complex prompts.
      • Aligns AionUi with common expectations for modern AI desktop applications that support multiple input modes, including voice.

simonduz avatar Nov 21 '25 13:11 simonduz

🤖 Hi @simonduz, I've received your request, and I'm working on it now! You can track my progress in the logs for more details.

aionui[bot] avatar Nov 21 '25 13:11 aionui[bot]

🤖 I'm sorry @simonduz, but I was unable to process your request. Please see the logs for more details.

aionui[bot] avatar Nov 21 '25 13:11 aionui[bot]

@simonduz Oh, this looks like a complete and great idea. If you can implement this feature and submit a PR to us, becoming one of our developers, we would be very happy and look forward to your joining!

kuishou68 avatar Nov 23 '25 06:11 kuishou68

I'm interested in contributing. Can I take this one? @kuishou68

Marvae avatar Jan 20 '26 15:01 Marvae