Kokoro-FastAPI Integrated WebUI does not support Firefox

Describe the bug: The MIME type used for audio generation is not supported by Firefox (v134.0.2). It works fine with Chromium-based browsers.

Screenshots or console output: Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource

Branch / Deployment used: Tested with the kokoro-fastapi-gpu v0.1.4 Docker image

Jan 30 '25 17:01 0pac1ty

I think Firefox just doesn't support the mpeg/mp3 format for mediasource as far as I'm aware

Do you get the same issue if selecting WAV?

Jan 30 '25 22:01 remsky

Do you get the same issue if selecting WAV?

Yes. The error appears with any of the three options: MP3, WAV, and PCM.

Jan 30 '25 23:01 0pac1ty

Will check it out

Jan 31 '25 06:01 remsky

Hello! : )

I can't generate in Firefox using MP3, WAV or PCM. In Chrome and Edge I can generate, but the play button is blocked and I can only download the audio file.

"Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource"

Feb 01 '25 05:02 signalstop

I have tried to change the address from 0.0.0.0:8080 to localhost:8080 which allowed me to use my headset in Firefox. From what I read this is served via HTTPS and you have to self sign a certificate and whatnot... I found that localhost is easier for my use case.

Feb 01 '25 11:02 xektop

Hi, I had the same issue in Firefox when using FastKoko while Open WebUI managed to play the audio generated by FastKoko (through its API, not by using the new local Kokoro feature in Open WebUI).

After a bit of digging, I found that they are using the new Audio() constructor without MediaSource.

They turn the response from the fetch into a blob, apply URL.createObjectURL on it and set it as the source of the <audio> object/element (they do so in the Audio constructor but the generated URL can also be written in the src attribute). They "stream" the audio by splitting the messages using punctuation by default.

I have tested the following code in the developer console on the FastKoko web UI:

async function testAudio() {
  // Request speech from the FastKoko API.
  const res = await fetch("/v1/audio/speech", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      input: "Test",
      voice: "af_alloy",
      response_format: "mp3",
      download_format: "mp3",
      stream: true,
      speed: 1,
      return_download_link: true,
      lang_code: "a",
    }),
  });
  if (!res.ok) throw new Error(`Request failed: HTTP ${res.status}`);
  // Buffer the whole response into a Blob and play it through a plain
  // Audio element instead of a MediaSource.
  const blob = await res.blob();
  const audio = new Audio();
  audio.src = URL.createObjectURL(blob);
  await audio.play();
}
testAudio();

and it does work for the 3 types proposed in FastKoko (MP3, WAV, PCM).

I do not know why audio/mpeg works when setting the src attribute but not with MediaSource, though.
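For anyone who wants to try the same idea with longer text, here is a minimal sketch of the punctuation-splitting approach described above. It is not Open WebUI's actual code: `splitIntoChunks`, `playSequentially`, and the `fetchBlob` callback are hypothetical names, and the splitting rule (break after `.`, `!`, or `?`) is an assumption.

```javascript
// Hypothetical helper: split text into sentence-sized chunks at . ! or ?
function splitIntoChunks(text) {
  const parts = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) || [];
  return parts.map((part) => part.trim()).filter((part) => part.length > 0);
}

// Hypothetical helper: fetch and play each chunk in order (browser only).
// fetchBlob(chunk) is assumed to POST the chunk to /v1/audio/speech and
// resolve to the response Blob.
async function playSequentially(chunks, fetchBlob) {
  for (const chunk of chunks) {
    const blob = await fetchBlob(chunk);
    const url = URL.createObjectURL(blob);
    const audio = new Audio(url);
    try {
      // Wait until this chunk has finished before starting the next one.
      await new Promise((resolve, reject) => {
        audio.addEventListener("ended", resolve, { once: true });
        audio.addEventListener("error", () => reject(audio.error), { once: true });
        audio.play().catch(reject);
      });
    } finally {
      URL.revokeObjectURL(url); // release the blob once the chunk is done
    }
  }
}
```

Because each chunk is fetched only after the previous one ends, there will be a gap between sentences; prefetching the next chunk during playback would be the obvious refinement.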

Feb 14 '25 17:02 pimartin

same here, cpu version, I can confirm apple silicon safari does not work either mp3 wav or pcm "Error generating speech: The object is in an invalid state." but chrome works fine.

Mar 06 '25 04:03 yingjiegau

On v0.2.3 I get this error: Error generating speech: [Errno 13] Permission denied: '/app/api/temp_files/tmp6kz25n1x.mp3' in both Firefox and Chrome when pressing generate speech.

I connected to the docker image and did this, and the web app works for me now

root@2682d8662ab7:/app/api# ls -al
total 8
drwxrwxrwx 1 root    root    4096 Feb  9 14:51 .
drwxr-xr-x 1 appuser appuser 4096 Mar  7 09:29 ..
-rwxrwxrwx 1 root    root      38 Feb  8 18:16 __init__.py
drwxr-xr-x 1 ubuntu  ubuntu  4096 Feb  9 14:51 __pycache__
drwxrwxrwx 1 root    root    4096 Mar  7 09:29 src
drwxr-xr-x 1 ubuntu  ubuntu  4096 Feb  9 14:51 temp_files
drwxrwxrwx 1 root    root    4096 Mar  7 09:29 tests
root@2682d8662ab7:/app/api# chown appuser:appuser temp_files

I think the 'ubuntu' user/permission may be something from my particular WSL setup on windows.

Then it worked in chrome but not firefox. On firefox Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource

Mar 07 '25 10:03 rain-1

Just to add onto the already excellent detective work by @pimartin

I ran across a SO thread/question where others were running into a similar issue (MediaSource Extension incompatibilities), and one of the users was able to get some additional information from some of the Firefox engineers working on their Media Source Extensions: https://stackoverflow.com/a/34498784

That answer is, admittedly, from 10 years ago, but not much seems to have changed with the MSEs on FF: MP3 is still not supported, nor are PCM or WAV (as we have all been finding out). What has graduated to being supported by default, however, is WebM and MP4. This led another person in the same question to implement an MP3-to-MP4 wrapper: https://stackoverflow.com/a/65932894 They were also kind enough to include a gist.

Perhaps the path of least resistance would be implementing an MP4 wrapper? Not sure what the level of effort would be compared to refactoring/adding support for playing through an Audio() object.

Mar 09 '25 01:03 S-Bryce

I'm using a build based off a578d22 (the tip of master), based off the CUDA-12.6 image instead of CUDA-12.8 (the GPU version) running in Docker on my Linux server (accessed via Traefik).

same here, cpu version, I can confirm apple silicon safari does not work either mp3 wav or pcm "Error generating speech: The object is in an invalid state." but chrome works fine.

I tested today on an M2 MacBook Air and it worked in Safari, so the issue you're seeing might be limited to the CPU version or may be related to how Safari is configured for you. Can you confirm that by "Apple Silicon Safari," you mean "Safari on an Apple Silicon Mac" and not "Safari on an Apple Silicon iPad?"

It also worked for me in Chrome, but not Firefox.

On iPhone I couldn't get it to work in any browser - I tried Safari, Orion, Chrome, iCabMobile, Brave, and the DuckDuckGo browser.

On Android, I tested in Chrome and Firefox. It worked in Chrome but not in Firefox.

Mar 10 '25 04:03 corvec

Can kokoro web ui use opus? This returns true for Firefox on Linux:

MediaSource.isTypeSupported('audio/webm; codecs="opus"')
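To see what a given browser accepts, a small probe along these lines can be pasted into the console. This is only a sketch: `supportedTypes` and the candidate list are illustrative assumptions, and the support check is passed in as a parameter so the helper itself does not depend on `MediaSource` existing.

```javascript
// Candidate MIME strings to probe; the list is illustrative, not exhaustive.
const CANDIDATES = [
  "audio/mpeg",
  'audio/webm; codecs="opus"',
  'audio/mp4; codecs="mp4a.40.2"',
];

// Return the candidates for which the supplied check returns true.
function supportedTypes(candidates, isTypeSupported) {
  return candidates.filter((type) => isTypeSupported(type));
}

// In a browser console:
// supportedTypes(CANDIDATES, (t) => MediaSource.isTypeSupported(t));
```

Whichever container-qualified type comes back would still have to be something the API can produce, so this only narrows down the options.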

Apr 06 '25 06:04 porjo

any progress on this issue?

i'm not able to play speech on my firefox browser, getting: Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource

i'm also not able to play it on iphone (ios safari), getting: Error generating speech: Can't find variable: MediaSource

everything works perfectly on desktop chrome for me

i'm happy to use a work-around for the time being if anyone's found one

thank you for your work on such an awesome product!!

Apr 27 '25 17:04 jakenybo

any progress on this issue?

@jakenybo I can't speak for remsky, but I tried implementing a couple solutions suggested here myself and didn't have any luck.

In terms of workarounds, though, you can:

- Use Chrome on desktop or Android
- Use the API, either directly (i.e., with curl or Insomnia), via one of the example Python files, or via another tool. For example, I use it via Open-WebUI, and this works in other browsers.

I forked arham-kk/openai-tts, made a few changes to make it compatible with this API, and I've confirmed that it works in Firefox on Mac and in Safari on iPhone. You can try out my repo as a workaround. It's on an old version of Gradio and it doesn't support voice mixing, but it works.
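For the direct-API route, a minimal sketch in JavaScript (Node 18+ for the built-in `fetch`) might look like the following. The endpoint path and JSON field names are taken from the request posted earlier in this thread; the base URL `http://localhost:8080`, the helper names, and the output file name are assumptions to adjust for your setup.

```javascript
// Build the request for the /v1/audio/speech endpoint.
// baseUrl is an assumption; point it at wherever your instance is reachable.
function buildSpeechRequest(
  text,
  { baseUrl = "http://localhost:8080", voice = "af_alloy", format = "mp3" } = {}
) {
  return {
    url: `${baseUrl}/v1/audio/speech`,
    init: {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: text, voice, response_format: format }),
    },
  };
}

// Fetch the audio and write it to disk (Node only).
async function saveSpeech(text, outPath, options) {
  const { writeFile } = await import("node:fs/promises");
  const { url, init } = buildSpeechRequest(text, options);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`Request failed: HTTP ${res.status}`);
  await writeFile(outPath, Buffer.from(await res.arrayBuffer()));
}

// Example: saveSpeech("Hello from Kokoro", "hello.mp3");
```

The saved file can then be opened in any player, which sidesteps the browser's MediaSource support entirely.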

Apr 28 '25 20:04 corvec

Sharing @corvec's experience, I tried to fix this, and didn't have any success.

Here are my notes:

- AudioService.streamAudio always requests mp3 data from the OpenAI-compatible API. Other formats like Opus, AAC and FLAC would be available.
- AudioService.setupAudioStream feeds that mp3 data into a MediaSource, which does not support the audio/mpeg MIME type in Firefox.

Switching the API request to Opus, AAC etc. works to get the data from the API, but those codecs are seemingly still not supported for playback by Firefox AFAICT.

MediaSource.isTypeSupported("audio/opus"); // false
MediaSource.isTypeSupported("audio/aac"); // false
MediaSource.isTypeSupported("audio/flac"); // false

Seems to me that adding Firefox playback support would require an additional transcoding step.
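One alternative to transcoding, following the blob approach @pimartin described earlier, might be to keep MediaSource where it works and fall back to a plain blob URL elsewhere. Below is a rough sketch of that decision only. The names (`choosePlaybackMode`, the `"mse"` and `"blob"` labels) are hypothetical, and the MediaSource constructor is passed in so that a browser where it is not defined (as reported for iOS Safari above) is handled too.

```javascript
// Decide how to play a response of the given MIME type.
// mediaSource is the MediaSource constructor, or undefined where the
// browser does not define it.
function choosePlaybackMode(mimeType, mediaSource) {
  if (
    mediaSource &&
    typeof mediaSource.isTypeSupported === "function" &&
    mediaSource.isTypeSupported(mimeType)
  ) {
    return "mse"; // stream chunks through a SourceBuffer
  }
  return "blob"; // buffer the full response, then use URL.createObjectURL
}

// In a browser:
// const mode = choosePlaybackMode(
//   "audio/mpeg",
//   typeof MediaSource === "undefined" ? undefined : MediaSource
// );
```

The trade-off is that the blob path gives up streaming: playback can only start once the whole response has arrived.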

May 26 '25 16:05 ItsJustRuby

As a small contribution to this, my Firefox is pretty locked down security and privacy wise. My Firefox will almost always prevent media manipulation and attempt to upgrade resources, like images, to HTTPS. This, for example, broke my Open WebUI by not allowing any images to display, and not even querying me for the permissions to use my microphone for conversational prompting.

This was solved by excluding my local server via exceptions, but most importantly, editing my \Windows\System32\drivers\etc\hosts file to give a domain name to my server, i.e. X.X.X.X servername.local and accessing my hosted resources via http://servername.local:####/etc.

For some reason, Firefox allows more leniency with a domain name than an IP.

All of that to say, the above does not resolve the issue being experienced here. I can utilize the /web perfectly fine in Chrome, but Firefox encounters the Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource error, regardless of media format. So in accordance with what @ItsJustRuby said above, I think we're looking at a deeper fix for Firefox, due to Firefox-specific issues. A year ago I would have said don't worry about it, but since the whole Manifest V3 thing... well, Firefox is more relevant than ever.

Jul 30 '25 17:07 SteindelSE

Any news on this? I'm a FF user since Chrome picked a side (of ads).

Sep 06 '25 04:09 old-square-eyes

I just wrote my own frontend, and primarily use the API anyways.

Sep 13 '25 19:09 wbste

Gonna throw my hat in the ring too: Firefox and derivatives are borked. This one was very easy to use, so this disappoints me. I've been looking for something I can easily just paste text into from anywhere that is quick and easy to set up and use. I have vision problems, so I am a very weak reader in general, and long articles and such take me way too long to go through, so I like to use TTS and listen instead. Looks like my quest continues.

Sep 29 '25 21:09 cammelspit

It doesn't work in Firefox... I get the following error

Error generating speech: MediaSource.addSourceBuffer: Type not supported in MediaSource

(There is also the issue of not being able to select a voice at all if FastKoko is set up behind a reverse proxy)

Oct 14 '25 16:10 aindriu80