surge icon indicating copy to clipboard operation
surge copied to clipboard

Surge FX build error on Linux when compiling with AVX2 support

Open vvrbanc opened this issue 3 years ago • 18 comments

Bug Description: Compile error:

src/common/dsp/effects/chowdsp/spring_reverb/SpringReverbProc.cpp: In member function ‘void chowdsp::SpringReverbProc::setParams(const chowdsp::SpringReverbProc::Params&, int)’:
src/common/dsp/effects/chowdsp/spring_reverb/SpringReverbProc.cpp:85:75: error: cannot convert ‘__m128’ to ‘juce::dsp::SIMDRegister<float>’
apf.setParams(msToSamples(0.35f + 3.0f * params.size), _mm_load_ps(apfGVec));
                                                       ~~~~~~~~~~~^~~~~~~~~
                                                                  |
                                                                  __m128

Surge XT Version 1.0.1 release, also tried HEAD 5e81fc20bb22a891c039aaf2951d6454d86694b4

Reproduction Steps:

  1. do "export CXXFLAGS=-mavx2" and compile as usual

Additional Information:

Tried with gcc-11 and clang-13. Clang gives a more verbose error description.

Any -march=XXX which includes AVX2, such as "-march=haswell", or "-march=native" on a modern cpu, will trigger the error.

Compiling with just "-mavx" works, also with any -march if "-mno-avx2" is added.

Looks to me like juce::dsp::SIMDRegister becomes __m256 when avx2 is supported, so the apfGVec should be an array of 8 floats, and _mm256_load_ps should be used instead?

But then regular non-avx builds would break so... we'd have to wrap everything into an #ifdef __AVX__. Or just force the SSE/non-avx code path for this particular function.

vvrbanc avatar Mar 25 '22 19:03 vvrbanc

Why are you compiling with avx?

baconpaul avatar Mar 25 '22 19:03 baconpaul

I made a surge package for Gentoo linux. Normally everything is compiled with "-march=native" so I had to make an exception for surge not to use global CXXFLAGS.

vvrbanc avatar Mar 25 '22 20:03 vvrbanc

Gotcha Yeah well looks like this is painful to fix. The juce simd class doesn’t have the vector class as a template arg. For now set arch to sse4.1 would be my advice if you can’t suppress the flag

baconpaul avatar Mar 25 '22 20:03 baconpaul

Suppressing the flag is easy enough. Just thought you'd want to know this is erroring out. BTW, this is the only thing that breaks with avx2 enabled, everything else builds fine.

Feel free to close this, just wanted to put it out there in case someone else runs into the problem.

I'll poke around a bit and see if I can make that painful fix :)

vvrbanc avatar Mar 25 '22 20:03 vvrbanc

Nah it’s super helpful. Tbis is the only place we don’t hand roll the simd (the rest of the code is explicit mm_add_ps stuff) and we want to get to avx one day “soon” so the report is useful as long as it’s not critical for you to get it fixed!

true story - we turned on avx required in our nightlies 18 months ago and had break reports in 24 hours. People run surge in old old kit

thanks for reporting it!

baconpaul avatar Mar 25 '22 20:03 baconpaul

Oh and yea since you said you might look we welcome pull requests if you can find an elegant solution! Open contribution project.

baconpaul avatar Mar 25 '22 20:03 baconpaul

(Zero pressure of course!)

baconpaul avatar Mar 25 '22 20:03 baconpaul

So i can't debug this on rosetta and don't have linux around but i was able to get a build which worked everywhere with this diff

paul ~/dev/music/surge (cmake-options) % git diff src 
diff --git a/src/common/dsp/effects/chowdsp/spring_reverb/SpringReverbProc.cpp b/src/common/dsp/effects/chowdsp/spring_reverb/SpringReverbProc.cpp
index 3d97251c..5c38c1a4 100644
--- a/src/common/dsp/effects/chowdsp/spring_reverb/SpringReverbProc.cpp
+++ b/src/common/dsp/effects/chowdsp/spring_reverb/SpringReverbProc.cpp
@@ -80,9 +80,10 @@ void SpringReverbProc::setParams(const Params &params, int numSamples)
     feedbackGain = std::pow(0.001f, delaySamples / (t60Seconds * fs));
 
     auto apfG = 0.5f - 0.4f * params.spin;
-    float apfGVec alignas(16)[4] = {apfG, -apfG, apfG, -apfG};
+    float apfGVec alignas(16)[juce::dsp::SIMDRegister<float>::size()] = {apfG, -apfG, apfG, -apfG};
     for (auto &apf : vecAPFs)
-        apf.setParams(msToSamples(0.35f + 3.0f * params.size), _mm_load_ps(apfGVec));
+        apf.setParams(msToSamples(0.35f + 3.0f * params.size),
+                      juce::dsp::SIMDRegister<float>::fromRawArray(apfGVec));
 
     constexpr float dampFreqLow = 4000.0f;
     constexpr float dampFreqHigh = 18000.0f;

the avx2 forced version cores on exit under rosetta so i don't know if it runs but i do know that the SSE and ARM versions of that code work on mac.

Wanna give it a whirl on your linux environment and see if the spring reverb still springs?

Also tagging @jatinchowdhury18 in case he's interested

baconpaul avatar Mar 29 '22 01:03 baconpaul

After compiling with this patch, the audio is garbled and the program segfaults after 5-6 seconds in the "SpringReverbProc::processBlock" function.

I did poke around a bit myself and made a patch that allows it to compile with both AVX2 and non-AVX2. Sounds ok and doesn't crash, but it's kind of dumb because it uses 256bit intrinsics while only utilizing the first 128 bits just like the regular SSE version. So it's more of a kludge just to get it to compile with -mavx2 for no real benefit. Also it's making the code ugly with an #ifdef.

From what I can tell, SpringReverb uses a SSE register to process the 4 floats in one go:

 //   [0]: y_left (feedforward path output)
 //   [1]: z_left (feedback path)
 //   [2-3]: same for right channel

It is not obvious to me how it could make use of 4 additional floats per loop that avx2 would provide.

Anyway, here it is:

https://gist.githubusercontent.com/vvrbanc/6a41c0bf098d3d0be2969818f0ac11f1/raw/a073b4a32cbfdfeeb0f2f53b7bee43158fd22704/gistfile1.txt

It is a haphazard patch, I don't really know what I'm doing and don't understand the code workflow (yet :) so I wouldn't go about making a PR just yet.

Unless compiling avx2 provides some benefit, I don't see any point in forcing the issue so "just don't compile with -mavx2" is IMO an acceptable workaround for the moment.

That being said, I will keep poking at it, maybe I produce a more elegant solution. I'm interested in learning DSP and contributing to the project.

vvrbanc avatar Mar 29 '22 11:03 vvrbanc

Right ok so let’s not merge anything yet. But I would like to have an avx option one day! So this is useful work.

And yes that seems to be what the filters are doing. Lots of our code is 4 way parallel in clever ways but only this class uses the juce register

what we really need is for juce to let us have instruction set as a template parameter…. Xsimd and other libraries let you do that. Then we could be explicit here. But no such luck.

baconpaul avatar Mar 29 '22 12:03 baconpaul

Maybe @jatinchowdhury18 could at some point take a look at refactoring that code not to depend on JUCE SIMD register? Then simde would have us covered, right?

mkruselj avatar Aug 21 '22 12:08 mkruselj

Yeah that’s basically the fix

baconpaul avatar Aug 21 '22 12:08 baconpaul

So as, I remember it, I had originally the Spring Reverb effect with using VecType = __m128 instead of using VecType = juce::dsp::SIMDRegister<float>. However, we ran into problems since the effect internally requires a std::vector<VecType> was not accepted by some compilers, I think due to alignment constraints or something like that. I could re-write the effect again using xsimd, or something other SIMD framework with more flexibility than JUCE, but that would mean bringing in another dependency, so I guess that's something we should discuss.

jatinchowdhury18 avatar Aug 22 '22 05:08 jatinchowdhury18

Do you remember which compilers whined?

mkruselj avatar Aug 22 '22 09:08 mkruselj

My bet is: MSVC 32 bit, which won't naturally align the vector. At least from googling.

https://gist.github.com/mszymczyk/91df9dd95bce6cfc078d9066ba4cc533

that may be the answer

baconpaul avatar Aug 22 '22 12:08 baconpaul

Welp, time to axe 32-bit 😀

mkruselj avatar Aug 22 '22 12:08 mkruselj

Ha maybe for xt 2 yeah.

baconpaul avatar Aug 22 '22 13:08 baconpaul

Actually, I thought it was one of the GCC builds... let me see if I can dig up that thread.

Edit: here it is, although it looks like it's missing some relevant context... maybe that's in Discord. Also, I think the CI runs that were failing have since been deleted by Azure.

jatinchowdhury18 avatar Aug 22 '22 16:08 jatinchowdhury18

diff --git a/src/common/dsp/effects/chowdsp/spring_reverb/SpringReverbProc.cpp b/src/common/dsp/effects/chowdsp/spring_reverb/SpringReverbProc.cpp
index 0e3129d5..cfc88751 100644
--- a/src/common/dsp/effects/chowdsp/spring_reverb/SpringReverbProc.cpp
+++ b/src/common/dsp/effects/chowdsp/spring_reverb/SpringReverbProc.cpp
@@ -82,8 +82,11 @@ void SpringReverbProc::setParams(const Params &params, int numSamples)
 
     auto apfG = 0.5f - 0.4f * params.spin;
     float apfGVec alignas(16)[4] = {apfG, -apfG, apfG, -apfG};
+    VecType apfGVecAsVecType = 0;
+    for (int i=0; i<4; ++i)
+        apfGVecAsVecType[i] = apfGVec[i];
     for (auto &apf : vecAPFs)
-        apf.setParams(msToSamples(0.35f + 3.0f * params.size), _mm_load_ps(apfGVec));
+        apf.setParams(msToSamples(0.35f + 3.0f * params.size), apfGVecAsVecType);
 
     constexpr float dampFreqLow = 4000.0f;
     constexpr float dampFreqHigh = 18000.0f;

I think that fixes it

baconpaul avatar Dec 14 '22 01:12 baconpaul