phaze
Optimization Algorithm Implementation
This is a cool project. I tried to use it, but found that it seems to cause high CPU usage; it would be even better if the algorithm could be improved :)
Other information:
- Here is a performance comparison of FFT implementations: https://github.com/scijs/fourier-transform/blob/master/benchmark.md
- Skip data processing when the player is paused; currently it still causes high CPU usage after the player is paused
I have worked on optimizing this in a private fork for use in an Electron app. Because of my use case, I was able to optimize specifically around Chrome's Web Audio quirks. For example, Chrome has this bug, but it can be worked around to prevent processing during silence by checking if (inputs[0].length < 2) instead of
https://github.com/olvb/phaze/blob/841f37b822c955868075072a6abe8bfad782432e/src/ola-processor.js#L93
and this can be further optimized by skipping the processing of all-zero input altogether once the block has been filled and there is no remaining tail. There are also some low-hanging-fruit micro-optimizations available, like using pitchFactor = parameters.pitchFactor[0] instead of doing the slightly extra work of
https://github.com/olvb/phaze/blob/841f37b822c955868075072a6abe8bfad782432e/src/phase-vocoder.js#L47
on every process() call (with k-rate automation, all values in the array should be the same). You can then make further assumptions to optimize: for example, if you know in advance that all input and output will be 2 channels, you don't have to call reallocateChannelsIfNeeded() on every process. With changes like these, I have it working reasonably well with a 4096 block size and near-0% CPU use on silence, albeit with some unavoidable latency, and it still glitches if I push it with too many of these nodes processing simultaneously.
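Taken together, the early-exit logic described above might be factored into small helpers like these. This is only a sketch, not the fork's actual code; `samplesRemainingInTail` is an illustrative counter the processor would have to maintain itself:

```javascript
// Decide whether a process() call can skip work entirely.
// Chromium (against spec) delivers an empty input array, or a single
// silent channel, when the node's input is inactive -- so for a node
// known to receive stereo, inputs[0].length < 2 signals "no real audio".
function shouldSkipProcessing(inputs, samplesRemainingInTail) {
  const inputInactive = inputs[0].length < 2;
  // Only skip once the overlap-add tail has fully drained,
  // otherwise the end of the last block would be truncated.
  return inputInactive && samplesRemainingInTail <= 0;
}

// Reading a k-rate AudioParam: every value in the block is identical,
// so reading index 0 avoids the extra per-sample handling.
function readKRateParam(paramArray) {
  return paramArray[0];
}
```

With helpers like these, process() can return immediately on silence (near-0% CPU) and otherwise read `pitchFactor` with a single array access.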
I think this is as good as it can get in pure JS; taking it to the next level would mean using WASM for the FFT calculation, like this one. But I'm not sure whether there would be a net gain after the potential latency hit of passing all that data back and forth between the AudioWorklet and the WASM module, or whether it's feasible for realtime use.
I am also using it in Electron now.
if (inputs[0].length < 2) doesn't work for me when the player is paused; the length is still 2 (Electron v22.3.12, Windows 10) 🙁
I'm using Electron 22 as well. I'm not sure what you mean by "when the player is paused". In my tests, if no audio is coming through (the input is "inactive" in Web Audio parlance), then inputs[0] is either empty or contains one channel of silent audio data (this is erroneous behavior on Chromium's part and not to spec). You might want to make sure you're cleaning up used one-shot source nodes, disconnecting nodes that are no longer in use, etc. to achieve this consistently. You can also add an explicit check for silence in the inputs if needed; it feels inefficient and I've tried to avoid it, but it can be necessary due to this Chrome bug, e.g.:
const checkForNotSilence = (value) => value !== 0;
// ...
if (inputs[0][0].some(checkForNotSilence) || inputs[0][1].some(checkForNotSilence)) { // assumes 2-channel input
  // do process
} else {
  // don't process
}
Just keep in mind the above example is oversimplified: you still need to handle the tail in cases where a larger block containing partially silent buffer(s) must still be processed. The worklet block size is larger than the Web Audio "render quantum" (hence the latency), but this is necessary for high-quality output.
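Concretely, that tail handling could be tracked with a small counter per processor. This is a sketch under assumed names (`FRAME_SIZE`, `TailTracker`, `samplesRemainingInTail` are illustrative, not phaze's actual internals):

```javascript
// Sketch: track how much buffered audio still needs to drain after the
// input goes silent, for an OLA processor whose internal frame (e.g.
// 4096 samples) is larger than the 128-sample render quantum.
const RENDER_QUANTUM = 128;
const FRAME_SIZE = 4096;

class TailTracker {
  constructor() {
    // Samples still buffered that must be flushed after silence begins.
    this.samplesRemainingInTail = 0;
  }

  // Returns true if this quantum must still be processed.
  onQuantum(inputIsSilent) {
    if (!inputIsSilent) {
      // Fresh audio refills the tail to a full frame's worth.
      this.samplesRemainingInTail = FRAME_SIZE;
      return true;
    }
    if (this.samplesRemainingInTail > 0) {
      // Silent input, but buffered audio is still draining: keep
      // processing so the tail is emitted rather than truncated.
      this.samplesRemainingInTail -= RENDER_QUANTUM;
      return true;
    }
    return false; // fully drained: safe to skip all work
  }
}
```

With a 4096-sample frame this keeps processing for 32 quanta after silence starts, then goes fully idle.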
I'm attaching my "optimized for stereo input in Chromium" fork here; you're welcome to try it. It's not secret, I just haven't submitted any PRs to this repo because it's mostly specialized for this use case, though some of the optimizations could probably be applied to the main package to benefit all users.
Thanks for sharing the fork :)
I created a demo in gist using Electron Fiddle, you can load it using the link below: https://gist.github.com/lyswhut/5f899a8aad24c578c27970c7f805d242
Now the player is not playing, but inputs[0].length is still 2:
I see, you're using an <audio> player via createMediaElementSource. Try explicitly disconnecting mediaSource on pause and reconnecting on play each time to get the desired behavior. Also, with my fork, IIRC you must force 2-channel output for the worklet node using {outputChannelCount: [2]}, because I added that assumption as an optimization to avoid checking for channel count changes on every process. Here's a rewrite of your gist's renderer.js incorporating these changes:
let audio
let audioContext
let mediaSource
let pitchShifterNode
let pitchShifterNodePitchFactor

const initAudio = async () => {
  audio = new Audio()
  audio.controls = false
  audio.autoplay = true
  audio.preload = 'auto'
  audio.crossOrigin = 'anonymous'
  audioContext = new window.AudioContext()
  mediaSource = audioContext.createMediaElementSource(audio)
  // Load audio worklet module
  return audioContext.audioWorklet.addModule('./phase-vocoder.js').then(() => {
  // return audioContext.audioWorklet.addModule('./origin-phase-vocoder.js').then(() => {
    console.log('pitch shifter audio worklet loaded')
    pitchShifterNode = new AudioWorkletNode(audioContext, 'phase-vocoder-processor', { outputChannelCount: [2] })
    let pitchFactorParam = pitchShifterNode.parameters.get('pitchFactor')
    if (!pitchFactorParam) return
    pitchShifterNodePitchFactor = pitchFactorParam
    // Connect node
    pitchShifterNode.connect(audioContext.destination)
  })
}

const dom_input_audio_src = document.getElementById('input_audio_src')
const dom_btn_play = document.getElementById('btn_play')
dom_btn_play.disabled = true
dom_input_audio_src.value = 'https://raw.githubusercontent.com/lyswhut/test-load-local-file/master/music2.mp3'

initAudio().then(() => {
  audio.addEventListener('playing', () => {
    dom_btn_play.innerText = 'Pause'
  })
  audio.addEventListener('pause', () => {
    dom_btn_play.innerText = 'Play'
  })
  dom_btn_play.disabled = false
  dom_btn_play.addEventListener('click', () => {
    if (audio.paused) {
      mediaSource.connect(pitchShifterNode)
      if (audio.src) {
        audio.play()
        return
      } else {
        dom_btn_play.innerText = 'Loading...'
        audio.src = dom_input_audio_src.value
      }
    } else {
      audio.pause()
      mediaSource.disconnect(pitchShifterNode)
    }
  })
})
Cool, it works! Here are the changes I made:
let audio
let audioContext
let mediaSource
let pitchShifterNode
let pitchShifterNodePitchFactor

const initAudio = async () => {
  audio = new Audio()
  audio.controls = false
  audio.autoplay = true
  audio.preload = 'auto'
  audio.crossOrigin = 'anonymous'
  audioContext = new window.AudioContext()
  mediaSource = audioContext.createMediaElementSource(audio)
  // Load audio worklet module
  return audioContext.audioWorklet.addModule('./phase-vocoder.js').then(() => {
  // return audioContext.audioWorklet.addModule('./origin-phase-vocoder.js').then(() => {
    console.log('pitch shifter audio worklet loaded')
-   pitchShifterNode = new AudioWorkletNode(audioContext, 'phase-vocoder-processor')
+   pitchShifterNode = new AudioWorkletNode(audioContext, 'phase-vocoder-processor', { outputChannelCount: [2] })
    let pitchFactorParam = pitchShifterNode.parameters.get('pitchFactor')
    if (!pitchFactorParam) return
    pitchShifterNodePitchFactor = pitchFactorParam
    // Connect node
-   mediaSource.connect(pitchShifterNode)
    pitchShifterNode.connect(audioContext.destination)
  })
}

const dom_input_audio_src = document.getElementById('input_audio_src')
const dom_btn_play = document.getElementById('btn_play')
dom_btn_play.disabled = true
dom_input_audio_src.value = 'https://raw.githubusercontent.com/lyswhut/test-load-local-file/master/music2.mp3'

+let isConnected = false
+const connectNode = () => {
+  if (isConnected) return
+  mediaSource.connect(pitchShifterNode)
+  isConnected = true
+}
+const disconnectNode = () => {
+  if (!isConnected) return
+  mediaSource.disconnect()
+  isConnected = false
+}

initAudio().then(() => {
+ audio.addEventListener('playing', connectNode)
+ audio.addEventListener('pause', disconnectNode)
+ audio.addEventListener('waiting', disconnectNode)
+ audio.addEventListener('emptied', disconnectNode)
+
  audio.addEventListener('playing', () => {
    dom_btn_play.innerText = 'Pause'
  })
  audio.addEventListener('pause', () => {
    dom_btn_play.innerText = 'Play'
  })
  dom_btn_play.disabled = false
  dom_btn_play.addEventListener('click', () => {
    if (audio.paused) {
      if (audio.src) {
        audio.play()
        return
      } else {
        dom_btn_play.innerText = 'Loading...'
        audio.src = dom_input_audio_src.value
      }
    } else {
      audio.pause()
    }
  })
})
According to my tests, the block size needs to be at least 4096 so that the sound is not distorted. After applying this fork, CPU usage is reduced, and it is minimal when the audio is paused. I think that to optimize it significantly, we need to use WASM for the transform; according to this post, that works.
Thanks for your help! ❤️