Integrated WebUI does not support Firefox
Describe the bug The MIME Type used for audio generation is not supported by Firefox (v134.0.2). Works fine with Chromium based Browsers.
Screenshots or console output
Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource
Branch / Deployment used
Tested with the kokoro-fastapi-gpu v0.1.4 docker image
I think Firefox just doesn't support the mpeg/mp3 format for mediasource as far as I'm aware
Do you get the same issue if selecting WAV?
Yes. The error appears with either of the three options: MP3, WAV and PCM.
Will check it out
Hello! : )
I can't generate in Firefox using MP3, WAV or PCM. In Chrome and Edge I can generate, but the play button is blocked and I can only download the audio file.
"Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource"
I have tried to change the address from 0.0.0.0:8080 to localhost:8080 which allowed me to use my headset in Firefox. From what I read this is served via HTTPS and you have to self sign a certificate and whatnot... I found that localhost is easier for my use case.
Hi, I had the same issue in Firefox when using FastKoko while Open WebUI managed to play the audio generated by FastKoko (through its API, not by using the new local Kokoro feature in Open WebUI).
After a bit of digging, I found that they are using the new Audio() constructor without MediaSource.
They turn the response from the fetch into a blob, apply URL.createObjectURL on it and set it as the source of the <audio> object/element (they do so in the Audio constructor but the generated URL can also be written in the src attribute). They "stream" the audio by splitting the messages using punctuation by default.
I have tested the following code in the developer console on the FastKoko web UI:
async function testAudio() {
  const audio = new Audio();
  const res = await fetch("/v1/audio/speech", {
    "headers": {
      "Content-Type": "application/json"
    },
    "body": "{\"input\":\"Test\",\"voice\":\"af_alloy\",\"response_format\":\"mp3\",\"download_format\":\"mp3\",\"stream\":true,\"speed\":1,\"return_download_link\":true,\"lang_code\":\"a\"}",
    "method": "POST"
  });
  const blob = await res.blob();
  audio.src = URL.createObjectURL(blob);
  await audio.play();
}
testAudio();
and it does work for the 3 types proposed in FastKoko (MP3, WAV, PCM).
I do not know why audio/mpeg works when setting the src attribute but not with MediaSource though.
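The punctuation-based "streaming" described above could be sketched roughly as follows. This is only an illustration of the idea, not Open WebUI's actual code: splitIntoSentences, speakInChunks and the fetchSpeechBlob callback are hypothetical names, and the splitting rule is deliberately naive (it would also split "3.14" at the decimal point).

```javascript
// Split text into chunks that end at sentence punctuation (., ! or ?).
// A trailing fragment without punctuation is kept as its own chunk.
function splitIntoSentences(text) {
  const matches = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g);
  return (matches || []).map((s) => s.trim()).filter((s) => s.length > 0);
}

// Play the chunks one after another without MediaSource.
// fetchSpeechBlob is a hypothetical callback: (sentence) => Promise<Blob>.
async function speakInChunks(text, fetchSpeechBlob) {
  for (const sentence of splitIntoSentences(text)) {
    const blob = await fetchSpeechBlob(sentence);
    const url = URL.createObjectURL(blob);
    const audio = new Audio(url);
    // Register the "ended" listener before starting playback.
    const ended = new Promise((resolve) =>
      audio.addEventListener("ended", resolve, { once: true })
    );
    await audio.play();
    await ended; // wait for this chunk to finish before the next one
    URL.revokeObjectURL(url);
  }
}
```

Each chunk is fetched only after the previous one has finished playing, so there may be a short gap between sentences; prefetching the next chunk would be a natural refinement.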
Same here with the CPU version. I can confirm that Safari on Apple Silicon does not work with MP3, WAV or PCM either ("Error generating speech: The object is in an invalid state."), but Chrome works fine.
On v0.2.3 I get this error in both Firefox and Chrome when pressing generate speech: Error generating speech: [Errno 13] Permission denied: '/app/api/temp_files/tmp6kz25n1x.mp3'
I connected to the docker image and did this, and the web app works for me now
root@2682d8662ab7:/app/api# ls -al
total 8
drwxrwxrwx 1 root root 4096 Feb 9 14:51 .
drwxr-xr-x 1 appuser appuser 4096 Mar 7 09:29 ..
-rwxrwxrwx 1 root root 38 Feb 8 18:16 __init__.py
drwxr-xr-x 1 ubuntu ubuntu 4096 Feb 9 14:51 __pycache__
drwxrwxrwx 1 root root 4096 Mar 7 09:29 src
drwxr-xr-x 1 ubuntu ubuntu 4096 Feb 9 14:51 temp_files
drwxrwxrwx 1 root root 4096 Mar 7 09:29 tests
root@2682d8662ab7:/app/api# chown appuser:appuser temp_files
I think the 'ubuntu' user/permission may be something from my particular WSL setup on windows.
Then it worked in chrome but not firefox. On firefox Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource
Just to add onto the already excellent detective work by @pimartin
I ran across a SO thread/question where others were running into a similar issue (MediaSource Extension incompatibilities), and one of the users was able to get some additional information from some of the Firefox engineers working on their Media Source Extensions: https://stackoverflow.com/a/34498784
That answer is, admittedly, from 10 years ago, but not much seems to have changed with the MSEs on FF: MP3 is still not supported, nor are PCM or WAV (as we have all been finding out). What has graduated to being supported by default, however, is WebM and MP4. This led another person in the same question to implement an MP3-to-MP4 wrapper: https://stackoverflow.com/a/65932894 They were also kind enough to include a gist
Perhaps the path of least resistance would be implementing an MP4 wrapper? Not sure what the level of effort would be compared to refactoring/adding support for playing through an Audio() object.
I'm using a build based off a578d22 (the tip of master), based off the CUDA-12.6 image instead of CUDA-12.8 (the GPU version) running in Docker on my Linux server (accessed via Traefik).
Same here with the CPU version. I can confirm that Safari on Apple Silicon does not work with MP3, WAV or PCM either ("Error generating speech: The object is in an invalid state."), but Chrome works fine.
I tested today on an M2 MacBook Air and it worked in Safari, so the issue you're seeing might be limited to the CPU version or may be related to how Safari is configured for you. Can you confirm that by "Apple Silicon Safari," you mean "Safari on an Apple Silicon Mac" and not "Safari on an Apple Silicon iPad?"
It also worked for me in Chrome, but not Firefox.
On iPhone I couldn't get it to work in any browser - I tried Safari, Orion, Chrome, iCabMobile, Brave, and the DuckDuckGo browser.
On Android, I tested in Chrome and Firefox. It worked in Chrome but not in Firefox.
Can kokoro web ui use opus? This returns true for Firefox on Linux:
MediaSource.isTypeSupported('audio/webm; codecs="opus"')
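A check like this could be generalized into a small helper that walks a list of candidate types and returns the first one the browser accepts. This is a minimal sketch: pickSupportedMimeType and CANDIDATE_TYPES are hypothetical names, and the candidate list and its order are assumptions. Only the Opus-in-WebM entry has been reported as supported in this thread.

```javascript
// Return the first candidate MIME type accepted by the checker, or null.
// isSupported defaults to MediaSource.isTypeSupported when MediaSource exists;
// passing a custom checker makes the function usable outside a browser.
function pickSupportedMimeType(candidates, isSupported) {
  const check =
    isSupported ||
    (typeof MediaSource !== "undefined"
      ? (type) => MediaSource.isTypeSupported(type)
      : () => false);
  for (const type of candidates) {
    if (check(type)) return type;
  }
  return null;
}

// Assumed preference order, for illustration only.
const CANDIDATE_TYPES = [
  "audio/mpeg",
  'audio/webm; codecs="opus"',
  'audio/mp4; codecs="mp4a.40.2"',
];
```

In a browser, pickSupportedMimeType(CANDIDATE_TYPES) returns null when none of the types can be used with MediaSource, which the caller can treat as a signal to use a different playback path.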
any progress on this issue?
i'm not able to play speech on my firefox browser, getting:
Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource
i'm also not able to play it on iphone (ios safari), getting:
Error generating speech: Can't find variable: MediaSource
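Both errors above point to the same kind of guard: check whether MediaSource exists and accepts the type before using it, and otherwise fall back to the Blob approach tested earlier in this thread. A rough sketch, where choosePlaybackStrategy and playViaBlob are hypothetical names and not part of the FastKoko web UI:

```javascript
// Decide how to play audio: "mse" when MediaSource can handle mimeType,
// otherwise "blob". env defaults to globalThis so it can be stubbed in tests.
function choosePlaybackStrategy(mimeType, env = globalThis) {
  const MS = env.MediaSource;
  // Covers browsers where MediaSource is not defined at all.
  if (typeof MS === "undefined") return "blob";
  if (typeof MS.isTypeSupported !== "function") return "blob";
  // Covers "Type not supported in MediaSource".
  return MS.isTypeSupported(mimeType) ? "mse" : "blob";
}

// Non-streaming fallback: download the whole response, then hand it to an
// audio element through an object URL.
async function playViaBlob(response) {
  const blob = await response.blob();
  const audio = new Audio(URL.createObjectURL(blob));
  await audio.play(); // may reject if the browser blocks autoplay
  return audio;
}
```

The trade-off is that the Blob path only starts playing once the full response has arrived, so long inputs lose the low-latency start that MediaSource streaming gives in Chrome.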
everything works perfectly on desktop chrome for me
i'm happy to use a work-around for the time being if anyone's found one
thank you for your work on such an awesome product!!
@jakenybo I can't speak for remsky, but I tried implementing a couple solutions suggested here myself and didn't have any luck.
In terms of workarounds, though, you can:
- Use Chrome on desktop or Android
- Use the API, either directly (i.e., with curl or Insomnia), via one of the example Python files, or via another tool. For example, I use it via Open-WebUI, and this works in other browsers.
I forked arham-kk/openai-tts, made a few changes to make it compatible with this API, and I've confirmed that it works in Firefox on Mac and in Safari on iPhone. You can try out my repo as a workaround. It's on an old version of Gradio and it doesn't support voice mixing, but it works.
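For the "use the API directly" workaround, a request could be built along these lines. The endpoint path and the input, voice and response_format fields are taken from the snippet posted earlier in this thread; the base URL passed in is an assumption about your own deployment, and buildSpeechRequest and fetchSpeech are hypothetical helper names.

```javascript
// Build the fetch() arguments for the speech endpoint.
// baseUrl is whatever address your server listens on (an assumption here).
function buildSpeechRequest(baseUrl, input, voice = "af_alloy", format = "mp3") {
  return {
    url: new URL("/v1/audio/speech", baseUrl).toString(),
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input, voice, response_format: format }),
    },
  };
}

// Send the request and return the audio bytes as an ArrayBuffer.
async function fetchSpeech(baseUrl, input) {
  const { url, init } = buildSpeechRequest(baseUrl, input);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Speech request failed: ${res.status}`);
  return res.arrayBuffer();
}
```

For example, fetchSpeech("http://localhost:8080", "Test") would request MP3 audio from a server at that address; the bytes can then be saved to a file or played with any regular audio player.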
Sharing @corvec's experience, I tried to fix this, and didn't have any success.
Here are my notes:
- AudioService.streamAudio always requests mp3 data from the OpenAI-compatible API. Other formats like Opus, AAC and FLAC would be available.
- AudioService.setupAudioStream feeds that mp3 data into a MediaSource, which does not support the audio/mpeg mime type in Firefox.
Switching the API request to Opus, AAC etc. works to get the data from the API, but those codecs are seemingly still not supported for playback by Firefox AFAICT.
MediaSource.isTypeSupported("audio/opus"); // false
MediaSource.isTypeSupported("audio/aac"); // false
MediaSource.isTypeSupported("audio/flac"); // false
Seems to me that adding Firefox playback support would require an additional transcoding step.
As a small contribution to this, my Firefox is pretty locked down security and privacy wise. My Firefox will almost always prevent media manipulation and attempt to upgrade resources, like images, to HTTPS. This, for example, broke my Open WebUI by not allowing any images to display, and not even querying me for the permissions to use my microphone for conversational prompting.
This was solved by excluding my local server via exceptions, but most importantly, editing my \Windows\System32\drivers\etc\hosts file to give a domain name to my server, i.e. X.X.X.X servername.local and accessing my hosted resources via http://servername.local:####/etc.
For some reason, Firefox allows more leniency with a domain name than an IP.
All of that to say, the above does not resolve the issue being experienced here. I can utilize the /web perfectly fine in Chrome, but Firefox encounters the Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource error, regardless of media format. So in accordance with what @ItsJustRuby said above, I think we're looking at a deeper fix for Firefox, due to Firefox-specific issues. A year ago I would have said don't worry about it, but since the whole Manifest V3 thing... well, Firefox is more relevant than ever.
Any news on this? I've been a FF user since Chrome picked a side (of ads).
I just wrote my own frontend, and primarily use the API anyways.
Gonna throw my hat in the ring too: Firefox and derivatives are borked. This one was very easy to use, so this disappoints me. I've been looking for something that is quick and easy to set up and that I can just paste text into from anywhere. I have vision problems, so I am a very weak reader in general, and long articles and such take me way too long to go through, so I like to use TTS and listen instead. Looks like my quest continues.
It doesn't work in Firefox... I get the following error
Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource
(There is also the issue of not being able to select a voice at all if FastKoko is setup on a reverse proxy)