Speech Note 4.8.0 Beta 3
If you want to test the upcoming release, Speech Note 4.8.0 Beta is available in the "flathub-beta" repository.
This version is perfectly usable, but may contain more bugs than a stable release.
To enable "flathub-beta" on your system, follow these instructions or simply run the following:
flatpak remote-add --if-not-exists flathub-beta https://flathub.org/beta-repo/flathub-beta.flatpakrepo
Changes between 4.7.1 and 4.8.0 Beta 3
- User Interface
- Speech Note has been translated into Arabic, Catalan, Spanish and French-Canadian languages.
- Speech to Text
- New CrisperWhisper model for FasterWhisper engine. CrisperWhisper is designed for fast, precise, and verbatim speech recognition with accurate word-level timestamps. Unlike the original Whisper, which tends to omit disfluencies and follows more of an intended transcription style, CrisperWhisper aims to transcribe every spoken word exactly as it is, including fillers, pauses, stutters and false starts. The CrisperWhisper model is enabled only for English and German.
- New KBLab Whisper models for Swedish. The National Library of Sweden has released fine-tuned STT models trained on its library collections. The models have significantly improved accuracy compared to regular Whisper models.
- Option to pause listening while processing. This option can be useful when Listening mode is Always on. By default, listening continues even when a piece of audio data is being processed. Using this option, you can temporarily pause listening for the duration of processing.
- Option to play an audible tone when starting and stopping listening
- Text to Speech
- Kokoro TTS engine. Kokoro is a compact yet powerful open-source multilingual TTS engine. Despite its modest size (trained on less than 100 hours of audio), it delivers impressive results. Kokoro voices are enabled for: English, Chinese, Japanese, Hindi, Italian, French, Spanish and Portuguese.
- F5-TTS engine. F5-TTS provides exceptional voice cloning capabilities. The currently enabled model works with English and Chinese. F5-TTS works best with CUDA acceleration. CPU-only processing can be very slow.
- Parler-TTS engine. Parler-TTS can generate high-quality, natural-sounding speech in the style of a given speaker (gender, pitch, speaking style, etc.). The speaker's characteristics are defined by a text description (prompt). To use Parler-TTS models, you need to configure a Text voice profile. This can be done in the Voice profiles menu. Parler-TTS primarily supports English, but a multilingual model for French, Spanish, Portuguese, Polish, German, Dutch and Italian is also included. Currently, the multilingual model produces rather poor-quality speech that is not entirely usable. Parler-TTS works best with CUDA acceleration. CPU-only processing can be very slow.
- S.A.M. TTS engine. S.A.M. is a small speech synthesizer designed for the Commodore 64. It features a robotic voice that evokes a strong sense of nostalgia. The S.A.M. voice is available in English only.
- Normalize audio setting option. Use this option to enable/disable audio volume normalization. The volume is normalized independently for each sentence, which can lead to unstable volume levels in different sentences. Disable this option if you observe this problem.
- New Piper voices for Dutch, Finnish, German and Luxembourgish
- New RHVoice voice for Spanish
- Accessibility (Wayland)
- Support for Insert into active window under Wayland. Using `start-listening-active-window` or `start-listening-translate-active-window` actions you can directly insert the decoded text into any window which is currently in focus. This feature worked under X11 only, but now it is also supported under Wayland. For actions to work, ydotool daemon must be installed and running. If you are using Flatpak, also make sure that the application has permission to access `ydotool` daemon socket file.
- Support for Global keyboard shortcuts under Wayland. Global keyboard shortcuts allow you to start or stop listening and reading using keyboard even when the application is not active (e.g. minimized or in the background). Until now, this capability was only available under X11. Now integration with XDG Desktop Portal has been added, making global keyboard shortcuts possible also under Wayland. For shortcuts to work, your desktop environment has to support GlobalShortcuts interface on XDG Desktop Portal service. Right now, `GlobalShortcuts` is only supported in KDE Plasma.
- Flatpak
- Python support enabled in Tiny and ARM packages. Python libraries are not included in Tiny or ARM packages, but using the Location of Python libraries option, you can set an external directory that contains the libraries. Make sure that the Flatpak application has permissions to access this directory.
For me, 4.8.0 Beta 1 breaks global keyboard shortcuts on X11 (Ubuntu 24.04/gnome) - even with ydotoold running.
@phirsch
Thanks for reporting!
X11
For me, 4.8.0 Beta 1 breaks global keyboard shortcuts on X11
Just tested on Ubuntu 24.04 GNOME X11 session and "Global Shortcuts" as well as "Insert into active window" worked without any problem. On X11, ydotool is not used by default, instead "Insert into active window" should work out-of-the-box without any additional program.
Could you attach the Speech Note log after running with the --verbose option?
flatpak run net.mkiol.SpeechNote --verbose
Wayland
In GNOME Wayland session (the default in Ubuntu 24.04), you can't use "Global Shortcuts" because GNOME implemented support for this just 2 days ago! :) "Global Shortcuts" on Wayland currently only work in the latest version of KDE Plasma.
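One way to check whether a desktop session exposes the GlobalShortcuts portal is to look for the interface in the portal's D-Bus introspection data. The sketch below is only an illustration and not part of Speech Note: `has_global_shortcuts` is a hypothetical helper name, and the commented `gdbus` call assumes the GLib `gdbus` tool is installed and that the portal uses its standard bus name and object path.

```shell
# Hypothetical helper: reads D-Bus introspection text on stdin and
# succeeds if the GlobalShortcuts portal interface is listed.
has_global_shortcuts() {
    grep -q 'org\.freedesktop\.portal\.GlobalShortcuts'
}

# Typical use inside a running desktop session (assumes gdbus is installed):
# gdbus introspect --session \
#     --dest org.freedesktop.portal.Desktop \
#     --object-path /org/freedesktop/portal/desktop \
#   | has_global_shortcuts && echo "GlobalShortcuts portal available"
```

If the check fails, the desktop environment most likely does not implement the interface yet, which matches the GNOME situation described above.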
"Insert into active window" on Wayland requires ydotoold, but unfortunately the version provided by Ubuntu 24.04 is too old. For Speech Note to work, you need to install ydotoold from source or use the binaries from GitHub.
In addition, ydotoold must be run with the GID and UID of a regular user, otherwise Speech Note will not be able to connect due to lack of permissions.
sudo ./ydotoold-release-ubuntu-latest --socket-own="$(id -u):$(id -g)"
Finally, the Flatpak package must be granted access to /tmp, because the ydotool socket file is /tmp/.ydotool_socket. For example, this can be done with the Flatseal app.
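The same permission can probably also be granted from the command line with `flatpak override`. This is a minimal sketch, assuming a per-user override is wanted and that the socket lives in /tmp as in the example above. The variable names are illustrative only, and the command is printed rather than executed so it can be reviewed first.

```shell
# Application ID as used in the "flatpak run" command in this thread.
app_id="net.mkiol.SpeechNote"
# Directory that holds the ydotool socket (/tmp in the Ubuntu example above).
socket_dir="/tmp"

# Build the override command and print it for review.
override_cmd="flatpak override --user --filesystem=${socket_dir} ${app_id}"
echo "$override_cmd"

# To apply it, run the printed command and then restart Speech Note.
```

A per-user override (`--user`) avoids the need for root; whether a system-wide override is preferable depends on how the application was installed.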
Hi, I successfully tested the following:
Text to Speech
- [x] S.A.M. TTS engine
Accessibility (Wayland)
- [x] Support for Insert into active window under Wayland.
- [x] Support for Global keyboard shortcuts under Wayland.
Great Job!
FYI, tested on OpenSuse Tumbleweed 20250302 KDE Plasma: 6.3.2 Kernel: 6.13.5-1 Graphics: Wayland
@nnbyte Cool! Thanks for the feedback.
Another observation in this context: In 4.8.0 Beta 1, pressing a modifier key like 'Ctrl' on its own immediately aborts the 'shortcut recording mode' (note: that's just the XCB_KEY_PRESS(2) event on its own, before releasing the key again).
Sorry for the delay - only just managed to test this again. My shortcut still works with 4.7.0 and fails with 4.8.0 Beta 1.
This snippet of the log looks like it might be relevant:
4.7.0:
[D] 05:07:32.134254881.134 0x782e3df9cd00 () - Event | XCB_KEY_PRESS(2) | sequence: 1799
[D] 05:07:32.134381082.134 0x782e3df9cd00 () - Event | XCB_XKB_* event(85) | sequence: 1799
[D] 05:07:32.134678107.134 0x782e3df9cd00 () - Event | XCB_PROPERTY_NOTIFY(28) | sequence: 1801
[D] 05:07:32.229723613.229 0x782e3df9cd00 () - Event | XCB_KEY_PRESS(2) | sequence: 1805
[D] 05:07:32.229797965.229 0x782e3df9cd00 () - Event | XCB_XKB_* event(85) | sequence: 1805
[D] 05:07:32.254977133.254 0x782e3df9cd00 () - Event | XCB_KEY_PRESS(2) | sequence: 1806
[D] 05:07:32.255023559.255 0x782e3df9cd00 () - Event | XCB_XKB_* event(85) | sequence: 1806
[D] 05:07:32.979062930.979 0x782e3df9cd00 () - Event | XCB_FOCUS_OUT(10) | sequence: 1807
[D] 05:07:32.979096661.979 0x782e3df9cd00 () - Event | XCB_KEY_PRESS(2) | sequence: 1807
[D] 05:07:32.979132194.979 0x782e3df9cd00 () - hot key activated: start-reading-clipboard
[D] 05:07:32.979142827.979 0x782e3df9cd00 () - executing action: start-reading-clipboard extra = ""
[D] 05:07:32.979805037.979 0x782e3df9cd00 () - tts play speech
[D] 05:07:32.980543624.980 0x782e3df9cd00 () - choosing model for id: "en_piper_us_lessac_high" "en"
[D] 05:07:32.980604936.980 0x782e3df9cd00 () - restart tts engine config: "lang=en, speaker=, model-files=[model-path=/home/user/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models/en_piper_us_lessac_high, vocoder-path=, diacritizer=, hub-path=/home/user/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote/speech-models], speaker=, ref_voice_file=, text-format=raw, sync_subs=on-fit-only-if-longer, tag_mode=support, options=, lang_code=, share-dir=/app/share, cache-dir=/home/user/.var/app/net.mkiol.SpeechNote/cache/net.mkiol/dsnote, data-dir=, speech-speed=12, split-into-sentences=1, use-engine-speed-control=1, use-gpu=0, gpu-device=[id=-1, api=opencl, name=, platform-name=], audio-format=ogg-opus"
[D] 05:07:32.980607645.980 0x782e3df9cd00 () - new tts engine required
[D] 05:07:32.980613335.980 0x782e3df9cd00 start:235 - tts start
[D] 05:07:32.980653383.980 0x782e3df9cd00 start:245 - tts start completed
[D] 05:07:32.980658030.980 0x782e3df9cd00 encode_speech:329 - tts encode speech
[D] 05:07:32.980824891.980 0x782c28400680 process:923 - tts prosessing started
4.8.1 Beta:
[D] 05:07:51.953262818.953 0x7b68f7782d00 () - Event | XCB_KEY_PRESS(2) | sequence: 1854
[D] 05:07:51.953354418.953 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1854
[D] 05:07:52.026767774.26 0x7b68f7782d00 () - Event | XCB_KEY_PRESS(2) | sequence: 1855
[D] 05:07:52.026826592.26 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1855
[D] 05:07:52.069326510.69 0x7b68f7782d00 () - Event | XCB_KEY_PRESS(2) | sequence: 1856
[D] 05:07:52.069422382.69 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1856
[D] 05:07:52.460911365.460 0x7b68f7782d00 () - Event | XCB_KEY_PRESS(2) | sequence: 1857
[D] 05:07:52.461047689.461 0x7b68f7782d00 () - createNewSequences(QKeyEvent(ShortcutOverride, Key_E, ShiftModifier|ControlModifier|AltModifier, text="\u0005"), ignoredModifiers=QFlags<Qt::KeyboardModifier>(NoModifier)), possibleKeys=(
[D] 05:07:52.461061403.461 0x7b68f7782d00 () - QKeySequence("Ctrl+Alt+Shift+E")
[D] 05:07:52.461065553.461 0x7b68f7782d00 () - )
[D] 05:07:52.461071883.461 0x7b68f7782d00 () - Possible shortcut key sequences: QVector(QKeySequence("Ctrl+Alt+Shift+E"))
[D] 05:07:52.461078482.461 0x7b68f7782d00 () - Returning shortcut match == 0
[D] 05:07:52.461085129.461 0x7b68f7782d00 () - QShortcutMap::nextState(QKeyEvent(ShortcutOverride, Key_E, ShiftModifier|ControlModifier|AltModifier, text="\u0005")) = 0
[D] 05:07:52.608621020.608 0x7b68f7782d00 () - Event | XCB_KEY_RELEASE(3) | sequence: 1858
[D] 05:07:52.980278902.980 0x7b68f7782d00 () - Event | XCB_KEY_RELEASE(3) | sequence: 1858
[D] 05:07:52.980332142.980 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1858
[D] 05:07:52.990208969.990 0x7b68f7782d00 () - Event | XCB_KEY_RELEASE(3) | sequence: 1858
[D] 05:07:52.990240721.990 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1858
[D] 05:07:53.004393929.4 0x7b68f7782d00 () - Event | XCB_KEY_RELEASE(3) | sequence: 1858
[D] 05:07:53.004455400.4 0x7b68f7782d00 () - Event | XCB_XKB_* event(85) | sequence: 1858
[D] 05:07:53.493337540.493 0x7b68f7782d00 () - Event | XInput Event(XCB_INPUT_MOTION) | sequence: 1858
[D] 05:07:53.493369740.493 0x7b68f7782d00 () - XI2 mouse motion 855,515, time 433579051, source MouseEventNotSynthesized
[D] 05:07:53.493417995.493 0x7b68f7782d00 () - QQuickWindow::handleMouseEvent() QEvent::MouseMove QPointF(855,515) Qt::NoButton QFlags<Qt::MouseButton>(NoButton)
Although 4.8.1 Beta does detect QKeySequence("Ctrl+Alt+Shift+E"), the action never gets triggered for some reason.
@phirsch Sorry, but I can't reproduce this problem.
Also, I do not see these log lines Event | XCB_. They are not coming from Speech Note, at least not directly. Can you say something more about your system? Do you have any custom environment variables, etc.?
The whole "4.8.1 Beta" log does not look like the Speech Note log. I have no idea what it is.
- If you are using Flatpak, also make sure that the application has permission to access `ydotool` daemon socket file.
How do we do this?
I also read in one of the other issues that ydotool needs to have elevated/root privileges. Maybe some quick instructions on how to enable ydotool to function fully so that it works with this app can be shared?
@Kentoseth
I also read in one of the other issues that ydotool needs to have elevated/root privileges.
I'm not an expert, but it doesn't always have to be run with root privileges. It may depend on the distro and the version of the ydotool daemon.
- In Arch, the ydotool package provides a regular systemd user service. It can be run without sudo, and the daemon creates a socket at `/run/user/$UID/.ydotool_socket` owned by the user's uid/gid. To use it with Speech Note, just grant access to `/run/user/$UID` (in Flatseal).
- In Fedora, you need root to start the systemd ydotool service. The socket is created at `/tmp/.ydotool_socket`, but with permissions for everyone. To use it with Speech Note, grant access to `/tmp` (in Flatseal).
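Since the socket location differs between distributions, a small script can look for it in the two places mentioned above and suggest the matching Flatpak permission. This is only a sketch: `find_ydotool_socket_dir` is a hypothetical helper, the two candidate directories are simply the ones named in this thread, and other systems may use a different path.

```shell
# Hypothetical helper: print the first directory (from the arguments)
# that contains a ydotool socket, or fail if none does.
find_ydotool_socket_dir() {
    for candidate in "$@"; do
        # -S is true only for an existing Unix socket file.
        if [ -S "${candidate}/.ydotool_socket" ]; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Check the Arch-style location first, then the Fedora/Ubuntu-style one.
if socket_dir="$(find_ydotool_socket_dir "/run/user/$(id -u)" /tmp)"; then
    echo "flatpak override --user --filesystem=${socket_dir} net.mkiol.SpeechNote"
else
    echo "no ydotool socket found; is ydotoold running?" >&2
fi
```

The script only prints the suggested `flatpak override` command; it does not change any permissions itself.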
In Arch, the ydotool package provides regular systemd user service.
The Arch package actually provides a uinput rule to allow access to /dev/uinput without elevated privileges. It also comes with a systemd service that allows you to use systemd to manage ydotoold as a service.
First, thank you for making this! Great app.
Text to speech in the main window and to clipboard work, but text to active window is not working for me. I have ydotool installed and the configuration seems to be correct - I don't get the error message in 'Settings -> Advanced' - but there is no output. In the log this line appears repeatedly
key_from_character:197 - xkb_keymap not initialized
I am guessing something about my X keyboard configuration is not right, but I don't know enough about it to work out what is going wrong. setxkbmap -query -verbose gives the following:
keycodes: evdev+aliases(qwerty)
types: complete
compat: complete
symbols: pc+us+inet(evdev)
geometry: pc(pc105)
rules: evdev
model: pc105
layout: us
System details:
Operating System: openSUSE Leap 15.6
KDE Plasma Version: 6.3.3
KDE Frameworks Version: 6.12.0
Qt Version: 6.8.2
Kernel Version: 6.4.0-150600.23.42-default (64-bit)
Graphics Platform: Wayland
@tprotopopescu
Thanks for reporting. On Wayland, xkb_keymap is retrieved directly from the Wayland compositor. I need more information to track down the cause. Would you be able to paste here all log lines between "using ydo fake-keyboard" and "xkb_keymap not initialized"?
Please make sure to start the app with the --verbose option:
flatpak run net.mkiol.SpeechNote --verbose
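To pull out just the requested part of a long log, the output can be saved to a file and filtered between the two marker strings. The sketch below is an illustration only: `extract_ydo_span` and the file name `speechnote.log` are made-up names, and redirecting both stdout and stderr is an assumption, since the thread does not say which stream the log is written to.

```shell
# Hypothetical helper: print the lines of a log file from the first
# "using ydo fake-keyboard" line through the first following
# "xkb_keymap not initialized" line, then stop.
extract_ydo_span() {
    awk '
        /using ydo fake-keyboard/ { printing = 1 }
        printing { print }
        printing && /xkb_keymap not initialized/ { exit }
    ' "$1"
}

# Typical use (the log capture is commented out so the snippet runs anywhere):
# flatpak run net.mkiol.SpeechNote --verbose 2>&1 | tee speechnote.log
# extract_ydo_span speechnote.log
```

Stopping at the first end marker keeps the excerpt short even when the error line repeats many times later in the log.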
Yes, no problem. Here is the output:
[D] 19:11:40.048272248.48 0x7f2c1c3c8d00 init_ydo:393 - using ydo fake-keyboard
[D] 19:11:40.048320369.48 0x7f2c1c3c8d00 operator():348 - connected ydo socket: /run/user/1000/.ydotool_socket
[D] 19:11:40.048426301.48 0x7f2c1c3c8d00 connect_wayland:750 - connect wayland
[D] 19:11:40.048801140.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_compositor version=6
[D] 19:11:40.048811498.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_tablet_manager_v2 version=1
[D] 19:11:40.048817842.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_keyboard_shortcuts_inhibit_manager_v1 version=1
[D] 19:11:40.048823971.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zxdg_decoration_manager_v1 version=1
[D] 19:11:40.048829646.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_viewporter version=1
[D] 19:11:40.048835188.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_fractional_scale_manager_v1 version=1
[D] 19:11:40.048840717.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_shm version=1
[D] 19:11:40.048846683.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_seat version=9
[D] 19:11:40.048855757.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_pointer_gestures_v1 version=3
[D] 19:11:40.048862309.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_pointer_constraints_v1 version=1
[D] 19:11:40.048868327.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_relative_pointer_manager_v1 version=1
[D] 19:11:40.048874396.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_data_device_manager version=3
[D] 19:11:40.048880644.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwlr_data_control_manager_v1 version=2
[D] 19:11:40.048886653.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_cursor_shape_manager_v1 version=1
[D] 19:11:40.048892598.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_idle version=1
[D] 19:11:40.048900162.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_idle_inhibit_manager_v1 version=1
[D] 19:11:40.048906194.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=ext_idle_notifier_v1 version=1
[D] 19:11:40.048914079.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_plasma_shell version=8
[D] 19:11:40.048919774.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_appmenu_manager version=2
[D] 19:11:40.048925331.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_server_decoration_palette_manager version=1
[D] 19:11:40.048930917.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_plasma_virtual_desktop_management version=2
[D] 19:11:40.048936395.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_shadow_manager version=2
[D] 19:11:40.048941704.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_dpms_manager version=1
[D] 19:11:40.048946936.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_server_decoration_manager version=1
[D] 19:11:40.048952310.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_output_management_v2 version=12
[D] 19:11:40.048958448.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zxdg_output_manager_v1 version=3
[D] 19:11:40.048964089.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_subcompositor version=1
[D] 19:11:40.048969524.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zxdg_exporter_v2 version=1
[D] 19:11:40.048974753.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zxdg_importer_v2 version=1
[D] 19:11:40.048980193.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_activation_v1 version=1
[D] 19:11:40.048986234.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_content_type_manager_v1 version=1
[D] 19:11:40.048992347.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_tearing_control_manager_v1 version=1
[D] 19:11:40.048998385.48 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_toplevel_drag_manager_v1 version=1
[D] 19:11:40.049004683.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_toplevel_icon_manager_v1 version=1
[D] 19:11:40.049010652.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_screen_edge_manager_v1 version=1
[D] 19:11:40.049016190.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=frog_color_management_factory_v1 version=1
[D] 19:11:40.049021686.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_presentation version=2
[D] 19:11:40.049027074.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_color_manager_v1 version=1
[D] 19:11:40.049032618.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_wm_dialog_v1 version=1
[D] 19:11:40.049038056.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_external_brightness_v1 version=2
[D] 19:11:40.049043263.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_alpha_modifier_v1 version=1
[D] 19:11:40.049048948.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_drm version=2
[D] 19:11:40.049054420.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_linux_dmabuf_v1 version=4
[D] 19:11:40.049059899.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_linux_drm_syncobj_manager_v1 version=1
[D] 19:11:40.049065364.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_wm_base version=6
[D] 19:11:40.049070965.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwlr_layer_shell_v1 version=5
[D] 19:11:40.049077247.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_output_device_v2 version=11
[D] 19:11:40.049083413.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wp_drm_lease_device_v1 version=1
[D] 19:11:40.049089507.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_output_order_v1 version=1
[D] 19:11:40.049095801.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_text_input_manager_v1 version=1
[D] 19:11:40.049102055.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_text_input_manager_v2 version=1
[D] 19:11:40.049108111.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=zwp_text_input_manager_v3 version=1
[D] 19:11:40.049114189.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_blur_manager version=1
[D] 19:11:40.049120554.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_contrast_manager version=2
[D] 19:11:40.049126579.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=org_kde_kwin_slide_manager version=1
[D] 19:11:40.049132679.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=xdg_system_bell_v1 version=1
[D] 19:11:40.049138667.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_output_device_v2 version=11
[D] 19:11:40.049145066.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_output version=4
[D] 19:11:40.049151039.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=kde_output_device_v2 version=11
[D] 19:11:40.049157184.49 0x7f2c1c3c8d00 wly_global_callback:798 - wl global: interface=wl_output version=4
[W] 19:11:40.049410243.49 0x7f2c1c3c8d00 wly_keyboard_keymap:854 - map shm failed
[D] 19:11:40.049434051.49 0x7f2c1c3c8d00 connect_wayland:775 - wayland roundtrip done
[D] 19:11:40.049448141.49 0x7f2c1c3c8d00 make_compose_table:380 - trying compose file: /usr/share/X11/locale/en_US.UTF-8/Compose
[D] 19:11:40.052686130.52 0x7f2c1c3c8d00 () - stt intermediate text decoded: *** "en" 0
[D] 19:11:40.052847238.52 0x7f2c1c3c8d00 () - stt engine eof: 0
[D] 19:11:40.052857993.52 0x7f2c1c3c8d00 () - cancel: 0
[D] 19:11:40.052901761.52 0x7f2c1c3c8d00 request_stop:283 - stt stop requested
[D] 19:11:40.052907399.52 0x7f2c1c3c8d00 stop_processing_impl:389 - whisper cancel
[D] 19:11:40.052923145.52 0x7f2ab8dff680 flush:517 - flush: exit
[D] 19:11:40.052938073.52 0x7f2ab8dff680 reset_in_processing:424 - reset in processing
[D] 19:11:40.052941661.52 0x7f2ab8dff680 process:345 - stt processing ended
[D] 19:11:40.053128101.53 0x7f2c1c3c8d00 () - service refresh status, new state: listening-auto
[D] 19:11:40.062929431.62 0x7f2c1c3c8d00 () - app task state: processing => idle
[D] 19:11:40.064008208.64 0x7f2c1c3c8d00 () - stt engine stopping
[D] 19:11:40.064300198.64 0x7f2c1c3c8d00 () - service refresh status, new state: listening-auto
[D] 19:11:40.064312392.64 0x7f2c1c3c8d00 () - task state changed: 0 => 6
[D] 19:11:40.064329542.64 0x7f2c1c3c8d00 () - stt engine stopped: 0
[D] 19:11:40.064337054.64 0x7f2c1c3c8d00 () - stop stt engine
[D] 19:11:40.064343072.64 0x7f2c1c3c8d00 request_stop:279 - stt stop already requested
[D] 19:11:40.064350698.64 0x7f2c1c3c8d00 stop:308 - stt stop completed
[D] 19:11:40.064357670.64 0x7f2c1c3c8d00 () - mic source dtor
[D] 19:11:40.064641159.64 0x7f2c1c3c8d00 () - [dbus app] TaskState called
[D] 19:11:40.064691824.64 0x7f2c1c3c8d00 () - app task state: idle => cancelling
[D] 19:11:40.066049876.66 0x7f2c1c3c8d00 () - service refresh status, new state: idle
[D] 19:11:40.066066190.66 0x7f2c1c3c8d00 () - service state changed: listening-auto => idle
[D] 19:11:40.066079400.66 0x7f2c1c3c8d00 () - task state changed: 6 => 0
[D] 19:11:40.067355580.67 0x7f2c1c3c8d00 () - [dbus app] TaskState called
[E] 19:11:40.067405741.67 0x7f2c1c3c8d00 key_from_character:197 - xkb_keymap not initialized
[D] 19:11:40.067753591.67 0x7f2c1c3c8d00 () - app current task: 0 => -1
[W] 19:11:40.067764000.67 0x7f2c1c3c8d00 () - invalid task, reseting task state
[D] 19:11:40.067770554.67 0x7f2c1c3c8d00 () - app task state: cancelling => idle
[D] 19:11:40.068578732.68 0x7f2c1c3c8d00 () - app service state: listening-auto => idle
[D] 19:11:40.068614417.68 0x7f2c1c3c8d00 operator():348 - connected ydo socket: /run/user/1000/.ydotool_socket
[D] 19:11:40.068660492.68 0x7f2c1c3c8d00 operator():348 - connected ydo socket: /run/user/1000/.ydotool_socket
[W] 19:11:40.071242017.71 0x7f2c1c3c8d00 () - no available mnt langs
[W] 19:11:40.071254590.71 0x7f2c1c3c8d00 () - no available mnt out langs
[W] 19:11:40.071258343.71 0x7f2c1c3c8d00 () - no available tts models for in mnt
[W] 19:11:40.071260927.71 0x7f2c1c3c8d00 () - no available tts models for out mnt
[W] 19:11:40.071263313.71 0x7f2c1c3c8d00 () - invalid task, reseting task state
[W] 19:11:40.071624172.71 0x7f2c1c3c8d00 () - ignore TaskStatePropertyChanged signal
[D] 19:11:40.072294376.72 0x7f2c1c3c8d00 () - [dbus app] TaskState called
[D] 19:11:40.072480067.72 0x7f2c1c3c8d00 () - [dbus app] State called
[E] 19:11:40.072539565.72 0x7f2c1c3c8d00 key_from_character:197 - xkb_keymap not initialized
@tprotopopescu Thanks. The problem is here:
[W] 19:11:40.049410243.49 0x7f2c1c3c8d00 wly_keyboard_keymap:854 - map shm failed
https://github.com/mkiol/dsnote/blob/89c8b9ef3391fbf3855ec57483716a39b4d384d8/src/fake_keyboard.cpp#L851-L856
At the moment, I have no idea why mmap is failing on your system, but I'm trying to figure it out.
My log above was obtained on an Ubuntu 24.04 system via the following command: flatpak run --branch=beta -v net.mkiol.SpeechNote --verbose
@tprotopopescu
At the moment, I have no idea why `mmap` is failing on your system, but I'm trying to figure it out.
I have released "Beta 2" with the following significant changes:
- better logging has been added to investigate the cause of the problem with `mmap`
- if the "keymap" (mapping between keyboard keys and key codes) cannot be retrieved from Wayland, the application will revert to the standard US keyboard layout.
I would appreciate it if you could test this and see what the map shm failed trace contains.
FYI: With 4.8.0 Beta 2, I noticed the line Flash attention 2 is not installed in the console log.
In case this is an oversight, adding Flash attention would be great as it can speed up inference (particularly Parler is still pretty slow even on an A4500).
However, Flash Attention can be a bit tricky to install, so YMMV. (E.g. with uv (in a different context), I had to use UV_FIND_LINKS="https://github.com/Dao-AILab/flash-attention/releases/download/v2.7.4.post1/flash_attn-2.7.4.post1+cu12torch2.6cxx11abiFALSE-cp313-cp313-linux_x86_64.whl" to get it to work, where system, CUDA and Python versions all have to match the installation).
At the moment, I have no idea why `mmap` is failing on your system, but I'm trying to figure it out.
I have released "Beta 2" with the following significant changes:
- better logging has been added to investigate the cause of the problem with `mmap`
- if the "keymap" (mapping between keyboard keys and key codes) cannot be retrieved from Wayland, the application will revert to the standard US keyboard layout.
I would appreciate it if you could test this and see what the `map shm failed` trace contains.
Thanks again for looking into this. While trying to figure out how to update, I started the Beta 1 version and found that text to active window now works. The only thing I can think of that changed is a regular system update, which may have fixed whatever the problem was.
I'm happy to report that the global keyboard shortcuts now work again with 4.8.0 Beta 3. I'm not certain what caused the change to the previous state, but I suspect that maybe one of the following actions might have played a role:
- I disabled and re-enabled the 'Use global keyboard shortcuts' setting.
- After that, I triggered the shortcut action once while the SpeechNote window was in the foreground and focused.
Anyway, thanks again for this great app!
The latest beta on FlatHub complains that it requires version 1.4 of the NVIDIA add-on, but I can only seem to find version 1.3.
Additional observation regarding my keyboard shortcut issues above: It turns out that I have to disable and immediately re-enable 'Use global keyboard shortcuts' once each time I run the app for shortcuts to work on my system.
@phirsch
The latest beta on FlatHub complains that it requires version 1.4 of the NVIDIA add-on, but I can only seem to find version 1.3.
Yes, I've updated the main app on the beta channel, but I didn't announce it because I'm having a lot of difficulty updating the add-ons. The problem lies in the recent changes to the Flathub infrastructure. New limits have been put on the size of packages, and the add-on packages are just too big to fit within them. I am trying to figure out how to reduce their size.
You certainly know more about FlatPak packaging than I do, but I was wondering whether using extra-data for downloading assets/libs at install time, further splitting the packages, or converting them into 'runtimes' might work?
@phirsch
You certainly know more about FlatPak packaging than I
I wouldn't be so sure about that.
I was wondering whether using extra-data for downloading assets/libs at install time, further splitting the packages, or converting them into 'runtimes' might work?
The idea of using extra-data was also suggested here. I am now checking what can be transferred to the extra-data part. Unfortunately, the package size is not the only problem. I am also observing build errors due to timeouts. The problem is mainly with the CUDA version of the ctranslate2 library. Building it from source is a nightmare. But in general I think I will soon be able to push the NVIDIA package. The biggest challenge will be the AMD package, because currently its size is 7 GB, but I need to reduce it to 2 GB! :)
Splitting it into smaller packages is on the table, but as an option of last resort.
@mkiol Just throwing it out there, could changing how you use compression help? (any easy gains there? Zstandard has a good compression ratio).
Since Flathub seems to be unreliable right now (at least in this instance), perhaps we should have a fallback? This could mitigate any future issues arising from a central point of failure.
I had a quick look, and a self-hosted repository seems convenient to set up. You can even use GitHub Pages or the GitLab equivalent to host it (according to the docs).
ADDED CONTEXT: I have a feeling the timeout errors are more likely due to the new build timeout limits than to config changes. Also, according to Flathub, BuildBot has been replaced by a custom solution called Vorarbeiter (supposedly to improve build times, etc.). According to them it is meant to be a drop-in replacement, but it has some notable changes: "Vorarbeiter no longer allows to manually publish, cancel or delete builds. The publishing happens every hour regardless of the age of the build." Links below for more.
References
Flatpak Docs, Hosting a repo: https://docs.flatpak.org/en/latest/hosting-a-repository.html
Flatpak docs, Extra Data (For non-free packages): https://docs.flatpak.org/en/latest/module-sources.html#extra-data
Flathub Infra Revamp: https://docs.flathub.org/blog/flathub-build-infrastructure-revamp
Flathub's new BuildBot replacement: https://docs.flathub.org/blog/vorarbeiter-is-here
My ArchLinux ctranslate2 test: https://aur.archlinux.org/cgit/aur.git/tree/PKGBUILD?h=ctranslate2
P.S.: I am not very knowledgeable about Flatpak, but if you need an extra pair of hands, let me know what is needed, either here or by contacting me ([email protected]).
Hope this helps,
James Clarke
Additional PC testing (NVIDIA storage usage):
I don't own an AMD GPU, so I can only do real NVIDIA testing on my Arch Linux machine.
I did a local Arch Linux build of ctranslate2 from the official repo PKGBUILD. It is 7.4 MB compressed as .zst and 52.6 MB uncompressed, and it already has -DWITH_CUDA='ON' (sysinfo below). For CUDA, I ran ncdu on /opt/cuda: it is 6.4 GB in size, and cuda is the only pacman package that owns that directory. The 'nvidia' package (kernel modules) is around 72 MB when installed, so ~6.5 GB total (nvidia-utils is another 792 MB).
This should be a good proxy for a standard install's disk usage with the current stable NVIDIA stack.
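For anyone who wants to redo the sum with their own measurements, here is a minimal Python sketch of the arithmetic above. The sizes are the ones quoted in this comment; treating 1 GB as 1000 MB is an assumption, since ncdu and pacman may report binary units.

```python
# Sizes in MB as quoted in the comment above.
# Assumption: decimal units (1 GB = 1000 MB); the tools may actually report binary units.
MB_PER_GB = 1000

measured_mb = {
    "cuda (/opt/cuda)": 6400,
    "nvidia (kernel modules)": 72,
}
nvidia_utils_mb = 792


def total_gb(sizes_mb: dict) -> float:
    """Sum a mapping of component name -> size in MB and return the total in GB."""
    return sum(sizes_mb.values()) / MB_PER_GB


if __name__ == "__main__":
    base = total_gb(measured_mb)
    with_utils = total_gb({**measured_mb, "nvidia-utils": nvidia_utils_mb})
    print(f"cuda + nvidia: {base:.2f} GB")
    print(f"with nvidia-utils: {with_utils:.2f} GB")
```

The ctranslate2 build itself (52.6 MB uncompressed) is small next to these numbers, which is why it is left out of the sum.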
My Arch Linux system info (in case it's relevant):
OS: Arch Linux x86_64
Kernel: Linux 6.14.7-arch2-1
DE: GNOME 48.2
WM: Mutter (Wayland)
CPU: AMD Ryzen 7 5700G (16) @ 4.67 GHz
GPU: NVIDIA GeForce RTX 4070 [Discrete] (CUDA 12.8, Driver Version: 570.153.02)
Memory: 7.71 GiB / 62.58 GiB (12%)
Swap: Disabled
Disk (/): 1.46 TiB / 1.82 TiB (80%) - xfs
@JamesClarke7283 Thanks a lot for your suggestions and exploring possible options.
Since Flathub seems to be unreliable right now (at least in this instance), perhaps we should have a fallback? This could mitigate any issues arising from a central point of failure in the future. I had a quick look, and hosting a repository seems convenient to set up; you can even use GitHub Pages or the GitLab equivalent to host it (according to the docs).
Yes, self-hosting or hosting on GitHub might be an option, but I really want to avoid it. In particular, I find that GitHub is not very reliable. Downloading from GitHub is very often slow, and from time to time I observe timeouts and similar problems. Also, Microsoft can, without any notice, limit or block sharing of big files on GitHub. For the time being, I would really like to stick with Flathub as the main distribution channel for as long as possible, plus other channels such as the Arch AUR dsnote package and the openSUSE Packman speechnote package, which are maintained not by me but by volunteers.
I managed to overcome the "size" problem of the NVIDIA package. I moved all large files to extra-data. In practice, this means that the Flatpak package does not contain these files; instead, when you install the package, all the "large files" are downloaded separately from the original sources, directly to your PC, and merged locally. This is not a perfect solution, but it works for now.
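To illustrate what moving a file to extra-data involves, here is a rough Python sketch that computes the fields a flatpak-builder extra-data source needs (type, filename, url, sha256, size) for a file that has already been downloaded. The helper name extra_data_entry, the sample file and the example.com URL are hypothetical and are not taken from the actual Speech Note manifest.

```python
import hashlib
import json
import os
import tempfile


def extra_data_entry(path: str, url: str) -> dict:
    """Build an extra-data source entry for a local copy of the file served at `url`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Hash in 1 MiB chunks so multi-gigabyte files do not need to fit in memory.
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return {
        "type": "extra-data",
        "filename": os.path.basename(path),
        "url": url,
        "sha256": digest.hexdigest(),
        "size": os.path.getsize(path),  # size in bytes
    }


if __name__ == "__main__":
    # Demo with a throwaway file; a real entry would describe a large upstream archive.
    with tempfile.TemporaryDirectory() as tmp:
        sample = os.path.join(tmp, "example-lib.tar.gz")
        with open(sample, "wb") as f:
            f.write(b"placeholder payload")
        entry = extra_data_entry(sample, "https://example.com/example-lib.tar.gz")
        print(json.dumps(entry, indent=2))
```

Because the checksum and size are pinned in the manifest, the download at install time is expected to fail if the upstream file changes, which is one reason this approach is a workaround rather than a perfect solution.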
The updated NVIDIA package (v1.4) is now available on the flathub-beta channel. I would appreciate your tests and feedback.
@JamesClarke7283 Thanks a lot for your suggestions and exploring possible options.
Since Flathub seems to be unreliable right now (at least in this instance), perhaps we should have a fallback? This could mitigate any issues arising from a central point of failure in the future. I had a quick look, and hosting a repository seems convenient to set up; you can even use GitHub Pages or the GitLab equivalent to host it (according to the docs).
Yes, self-hosting or hosting on GitHub might be an option, but I really want to avoid it. In particular, I find that GitHub is not very reliable. Downloading from GitHub is very often slow, and from time to time I observe timeouts and similar problems. Also, Microsoft can, without any notice, limit or block sharing of big files on GitHub. For the time being, I would really like to stick with Flathub as the main distribution channel for as long as possible, plus other channels such as the Arch AUR dsnote package and the openSUSE Packman speechnote package, which are maintained not by me but by volunteers.
I see. If volunteer-based is what you want, we could host it on my server. (Its only important job is hosting my static personal website.)
Details:
I have it at home, it's connected to a UPS (Uninterruptible Power Supply), and I have it on 24/7 for my work (if it goes down, I would know about it pretty quickly).
The only consideration could be bandwidth: I am on a 500 Mbps connection.
My server has a 97.3% uptime according to my UPS. (The downtime might have been the full one-day power outage when we had the storms/turbulent weather here in the UK.) I have had the server for at least 2 years.
I assume maintaining the builds is mostly done through the build config; I could easily add an SSH user if needed.
Based on your Flathub page, you get around 200 downloads per day, which my system is definitely capable of; I have breathing room for at least 10x that (likely a lot more).
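As a rough sanity check of the headroom estimate, here is a back-of-the-envelope Python sketch. The 500 Mbps link and ~200 downloads per day come from the details above; the 2 GB package size is an assumption borrowed from the size target mentioned earlier in the thread. The sketch also assumes that the full 500 Mbps is available for upload and that downloads are spread evenly over the day, which may not hold in practice.

```python
def daily_capacity_gb(link_mbps: float) -> float:
    """Upper bound on the data a link can move in 24 hours, in GB (10^9 bytes)."""
    bytes_per_second = link_mbps * 1_000_000 / 8
    return bytes_per_second * 86_400 / 1e9


def headroom(link_mbps: float, downloads_per_day: int, package_gb: float) -> float:
    """How many times the daily demand fits into the link's theoretical daily capacity."""
    return daily_capacity_gb(link_mbps) / (downloads_per_day * package_gb)


if __name__ == "__main__":
    # 2.0 GB per download is an assumed package size, not a measured one.
    print(f"capacity: {daily_capacity_gb(500):.0f} GB/day")
    print(f"headroom: {headroom(500, 200, 2.0):.1f}x")
```

Under these assumptions the theoretical ceiling is 5400 GB per day against a demand of 400 GB per day, i.e. about 13.5x, which is in line with the "at least 10x" estimate. Real-world throughput would be lower.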
@mkiol Perhaps shoot me an email if this is something you would find helpful. Personally, I hate it when providers reach a certain size and what you get back becomes less and less for some reason.
Anything we can do to combat that would be awesome (;
I personally would welcome nix packaging support, but I am aware that most people don't care about this at all and it is a wholly separate can of worms, so absolutely feel free to ignore this or even delete this comment.