LiveSPICE

Parallel circuits performance

Open z26 opened this issue 6 years ago • 8 comments

Hi, I have an application in mind that would involve processing several inputs simultaneously, each in its own separate circuit. Is LiveSPICE able to use multiple CPU cores in such a scenario? This would be simple to parallelize in theory, but I don't know if it is implemented in practice.

On a separate note, I've successfully sent an audio file through LiveSPICE using Voicemeeter Banana (https://www.vb-audio.com/Voicemeeter/banana.htm), and I've also recorded the output with it, but there are a few tricks necessary to make it work. I could explain these if anyone is interested.

z26 avatar Oct 22 '18 00:10 z26

Right now there's nothing to take advantage of multiple CPUs. I thought a bit about doing this, but I couldn't think of a clean, simple way to enable it. Even if you have multiple independent inputs/outputs, the audio IO system still runs all of them from the same thread.

There are some other interesting possibilities here. I added the 'Delay Buffer' component to try to enable people to break circuits into independent pieces to simplify simulations, but I never really experimented much with it. If a circuit is fully isolated with delay buffers, theoretically each isolated chunk could run on separate threads with a queue between them. But this would add possibly significant latency.
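To make the "separate threads with a queue between them" idea concrete, here is a minimal Python sketch. It is not LiveSPICE code: `run_stage`, `run_pipeline`, `gain_stage`, and `clip_stage` are hypothetical names, and the two stage functions are stand-ins for isolated circuit chunks. It only illustrates the structure; Python threads share one interpreter lock, so this particular sketch would not give a real CPU speedup.

```python
import queue
import threading

def run_stage(process, inbox, outbox):
    """Pull blocks from inbox, process each one, push the result to outbox."""
    while True:
        block = inbox.get()
        if block is None:          # sentinel: end of stream, pass it on
            outbox.put(None)
            return
        outbox.put(process(block))

def run_pipeline(stages, blocks):
    """Run every stage on its own thread, connected by FIFO queues."""
    queues = [queue.Queue() for _ in range(len(stages) + 1)]
    threads = [
        threading.Thread(target=run_stage, args=(stage, queues[i], queues[i + 1]))
        for i, stage in enumerate(stages)
    ]
    for t in threads:
        t.start()
    for block in blocks:
        queues[0].put(block)
    queues[0].put(None)
    results = []
    while (out := queues[-1].get()) is not None:
        results.append(out)
    for t in threads:
        t.join()
    return results

# Hypothetical stand-ins for two isolated circuit chunks.
def gain_stage(block):
    return [2.0 * x for x in block]

def clip_stage(block):
    return [max(-1.0, min(1.0, x)) for x in block]

output = run_pipeline([gain_stage, clip_stage], [[0.1, 0.2], [0.6, -0.9]])
```

While the second stage works on one block, the first stage is free to start on the next. In a real-time setting, that hand-off through the queue is where the extra latency would come from.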

dsharlet avatar Oct 25 '20 06:10 dsharlet

Thanks for the reply!

Why would this induce latency? Does the software have to wait for whichever section is executing the slowest before beginning the execution of the next "frame"? (I don't know how SPICE simulations work under the hood.)

Thanks for the information about the delay buffer; this may be handy if the simulation difficulty doesn't scale linearly with complexity.

I've put aside the project I was considering using livespice for (for now) but the concept of processing audio by simulating hardware circuits sounds really cool in theory. It must be more cpu taxing than regular digital effects and I'm not sure how close the output of simulated circuits can be to the real thing, but the concept is exciting for hardware nerds.

z26 avatar Oct 25 '20 21:10 z26

The simulations are processed in blocks. Much like ASIO and other tools have a buffer size, so do these simulations. When you want to parallelize, you need at least one buffer of extra delay, because the second thread can't work on a buffer until the first thread finishes it.
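As a rough illustration of what "one buffer of extra delay" means in time, here is a small Python calculation. The buffer size of 256 samples and the 48 kHz sample rate are assumed example values, not LiveSPICE defaults, and `added_latency_ms` is a hypothetical helper. It treats each thread hand-off as costing exactly one buffer, which is the minimum described above; a real system could cost more.

```python
def added_latency_ms(buffer_size, sample_rate, extra_buffers):
    """Minimum extra latency, in milliseconds, from queuing whole buffers."""
    return 1000.0 * buffer_size * extra_buffers / sample_rate

# Assumed example values: 256-sample buffers at 48 kHz.
one_handoff = added_latency_ms(256, 48000, 1)    # two threads, one hand-off
two_handoffs = added_latency_ms(256, 48000, 2)   # three threads in series
```

Under this assumption the minimum added delay grows by one buffer duration for every additional chunk placed in series.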

I think there are at least two good reasons that these simulations are more expensive than hand-engineered simulations:

  • Overhead from dynamically generating the code to implement the simulation. This is the main thing I invested a lot of time in reducing.
  • A lot of parts of the simulation might be unnecessary or can be simplified by approximating some part of the simulation. Doing this automatically is very difficult.

I think in general, simulations like these are quite accurate, in that if you were to measure the input and output of the real circuit, I think they would closely match. But they only accurately model the circuit itself. There are a lot of other things in the chain that affect the sound...

dsharlet avatar Oct 25 '20 21:10 dsharlet

Thanks for the extra information!

I'm assuming avoiding this issue completely would require writing a custom version of ASIO or something like that, which just isn't practical.

I wonder if the extra latency scales linearly with each additional circuit.

"A lot of parts of the simulation might be unnecessary or can be simplified by approximating some part of the simulation. Doing this automatically is very difficult."

That was my main concern. Thankfully, computing power is cheap these days.

z26 avatar Oct 25 '20 22:10 z26

It's not really about ASIO in particular, it's a pretty fundamental challenge. Latency, parallelism, and overhead often compete with each other. Buffer sizes, queues between threads, etc. are tools that allow a system to trade off between these things, but it's hard to get any one of them for free...
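A toy model can make this trade-off visible. The Python sketch below assumes each block costs a fixed overhead plus a per-sample cost; the 50 µs and 10 µs figures and the `block_tradeoff` helper are invented purely for illustration and are not measurements of LiveSPICE.

```python
def block_tradeoff(buffer_size, sample_rate=48000,
                   overhead_us=50.0, per_sample_us=10.0):
    """Return (latency_ms, cpu_load) for one block under a toy cost model."""
    block_duration_us = 1e6 * buffer_size / sample_rate
    # Assumed cost model: fixed per-block overhead plus linear per-sample work.
    compute_us = overhead_us + per_sample_us * buffer_size
    return block_duration_us / 1000.0, compute_us / block_duration_us

for n in (32, 128, 512):
    latency_ms, load = block_tradeoff(n)
    print(f"{n:4d} samples: {latency_ms:6.2f} ms latency, {load:.1%} load")
```

In this model a larger buffer spreads the fixed overhead over more samples, so the load drops while the latency rises. That is the kind of exchange described above.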

dsharlet avatar Nov 05 '20 04:11 dsharlet

Hello, thank you for your amazing project, this is very helpful for designing tonestacks. I designed a tube preamp but the processing is too heavy for a single CPU core (Ryzen 3700x).

Enabling multithreading would be a great way to take advantage of modern CPUs.

My current workaround is to split the preamp circuit into 3 files loaded in 3 different instances of LiveSPICE, with audio routed through several ports of my audio card. The CPU load is now spread over 3 cores, but the latency increases since the audio card is used several times.

Splitting a whole circuit into sequential chunks that each run in a dedicated thread looks possible in pure C#, but the way SPICE and VST work is a bit of a mystery to me.

That said, the other optimisations you mentioned, like the .NET 5.0 migration and code generation optimisation/approximation, are areas for improvement too.

mdouchement avatar Jan 01 '21 12:01 mdouchement

Interesting workaround you've come up with :)

I am curious if you might see enough speedup just by splitting the circuit into 3 independent pieces. Can you try putting the 3 circuit components together, but using the 'DelayBuffer' component between them? This isolates the 3 circuit components in the same way that you have, and if I were to implement some kind of built-in parallelism system, it would probably require this anyways (otherwise there is no parallelism to exploit).

dsharlet avatar Jan 02 '21 02:01 dsharlet

My bad, I had tried Buffer instead of DelayBuffer before splitting into 3 files. A simple preamp (two 12AX7 gain stages with a tonestack) works pretty well. The CPU load is lighter and switches between cores. After changing the affinity of LiveSPICE, it stays on the selected core, which avoids some audio lag.

With a more complicated preamp, I get a stuttering sound. TB500

mdouchement avatar Jan 02 '21 09:01 mdouchement