
Add Ferrari-Cardano algorithm and general Quartic Solver test suite

Open osanstrong opened this issue 2 months ago • 11 comments

This pull request adds FerrariSolver, a class implementing the Ferrari-Cardano method to solve quartic polynomials of the form ax⁴ + bx³ + cx² + dx + e = 0, and QuarticSolver.test, a class with tests for quartic solvers.

The Ferrari-Cardano method solves a related cubic equation to find a factor which breaks the quartic equation into two quadratic equations, which are then solved for the final roots.
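For reference, the standard form of that decomposition (a textbook statement, not code from this PR): after depressing the quartic with x = y − b/(4a), a monic depressed quartic y⁴ + py² + qy + r = 0 can be rewritten as

$$\left(y^2 + \frac{p}{2} + m\right)^2 = 2m\,y^2 - q\,y + \left[\left(\frac{p}{2} + m\right)^2 - r\right],$$

where m is chosen so that the right-hand side is a perfect square in y, i.e. m is a root of the resolvent cubic

$$8m^3 + 8p\,m^2 + (2p^2 - 8r)\,m - q^2 = 0.$$

Taking the square root of both sides then splits the quartic into two quadratic equations.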

The QuarticSolver test file contains a brief abstract framework for testing quartic functions, and a specific subclass FerrariSolverTest which plugs in FerrariSolver for those tests. Most of the tests can then be easily reused to test an alternative quartic solver, with a different subclass.
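As a rough illustration of that reuse pattern (a minimal GoogleTest sketch; every name except FerrariSolver is hypothetical, and the PR's actual fixture may differ):

    #include <vector>
    #include <gtest/gtest.h>

    // Hypothetical abstract fixture: the shared test bodies only call
    // solve(), so a new solver is tested by writing one small subclass
    class QuarticSolverTestBase : public ::testing::Test
    {
      protected:
        // Return the positive real roots of a*x^4 + b*x^3 + c*x^2 + d*x + e = 0
        virtual std::vector<double>
        solve(double a, double b, double c, double d, double e) const = 0;
    };

    class FerrariSolverTest : public QuarticSolverTestBase
    {
      protected:
        std::vector<double>
        solve(double a, double b, double c, double d, double e) const override
        {
            // In the PR this would delegate to FerrariSolver; the empty
            // result here is a placeholder so the sketch compiles on its own
            return {};
        }
    };

    TEST_F(FerrariSolverTest, four_distinct_roots)
    {
        // (x-1)(x-2)(x-3)(x-4) = x^4 - 10x^3 + 35x^2 - 50x + 24
        EXPECT_EQ(4u, this->solve(1, -10, 35, -50, 24).size());
    }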

osanstrong avatar Oct 06 '25 05:10 osanstrong

Test summary

  253 files · 408 suites · 8s :stopwatch:
  1 310 tests: 1 296 :white_check_mark: · 14 :zzz: · 0 :x:
  1 344 runs: 1 340 :white_check_mark: · 4 :zzz: · 0 :x:

Results for commit 68d750a1.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Oct 06 '25 05:10 github-actions[bot]

@osanstrong I know you're allergic to LLM bots, but the kind of refactoring that replaces 0.25 -> constexpr real_type quarter{0.25}; and q2 * q2 -> ipow<2>(q2) is an ideal use case for Copilot to save some drudgery. If you've got tests that pass, I would recommend trying it out.
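For context, the suggested mechanical refactor looks roughly like this (a self-contained sketch: the variable names and the ipow stand-in are invented here, while Celeritas's own ipow lives in its math utilities):

    using real_type = double;

    // Stand-in for celeritas::ipow for this sketch: integer power by
    // repeated multiplication
    template<unsigned int N>
    constexpr real_type ipow(real_type v)
    {
        real_type result{1};
        for (unsigned int i = 0; i != N; ++i)
            result *= v;
        return result;
    }

    // Before: bare literal and repeated multiplication
    constexpr real_type disc_before(real_type q2, real_type r)
    {
        return q2 * q2 - 0.25 * r;
    }

    // After: named constant and explicit integer power
    constexpr real_type disc_after(real_type q2, real_type r)
    {
        constexpr real_type quarter{0.25};
        return ipow<2>(q2) - quarter * r;
    }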

sethrj avatar Oct 21 '25 10:10 sethrj

@osanstrong this is shaping up nicely. Ping us whenever you are ready for us to take another look.

elliottbiondo avatar Oct 22 '25 16:10 elliottbiondo

@elliottbiondo @sethrj Sorry for the three-week delay (I got caught up with a glut of uni work). I think this should address the last couple of comments from back then, with a bit of clarification on one of them that I put in the original conversation (about refactoring arguments for the functor constructor/operator).

osanstrong avatar Nov 11 '25 19:11 osanstrong

Also, is there a good way to replicate the conditions of the automatic pull request test locally? The automatic test hosted on GitHub is catching a unit-test failure that I think is relevant (finding a near-zero root instead of ignoring it), but I can't get that test to fail when I run it locally. The output says it's running ctest --preset=${CMAKE_PRESET}-unit, but when I try locally there aren't any acceptable presets ending in -unit.

osanstrong avatar Nov 11 '25 20:11 osanstrong

The automatic tests are the CI (continuous-integration) tests. The one that is failing for you is noble-arm-clang-18-c++20 which means:

  • noble: Ubuntu 24.04, "Noble Numbat"
  • arm: a processor using the ARM instruction set, as opposed to the more standard x86 instruction set you see on Intel/AMD processors
  • clang-18: the compiler being used
  • c++20: the language standard being used

So unfortunately there is no way to test locally unless you have an ARM machine with Ubuntu, etc. I would say that this is not a common CI failure, but perhaps it's not surprising given that you are specifically testing floating-point edge cases. My theory is that whatever sequence of ARM instructions is being executed is different enough from its x86 counterpart that it affects the floating-point precision enough to flip the result of your test. I think the only thing you can do is change the test to be further from this flip point.

elliottbiondo avatar Nov 11 '25 23:11 elliottbiondo

@osanstrong @elliottbiondo When an ARM test fails when the x86_64 build succeeds, it's almost always in some complex numerical function (as is the case here) and indicates a numerical instability in the calculations due to floating point arithmetic.

This is going to be hard to replicate locally: your best bet is to compile a RelWithDebInfo build using -O3 -march=native, which will enable additional optimizations and advanced (sometimes "fused") instructions specific to your CPU. Those instructions can result in changes of a few ULP, which can add up and/or result in large differences when catastrophic floating-point cancellation occurs.
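As a concrete, standalone illustration of the ULP point (not Celeritas code): a fused multiply-add preserves low-order bits that a rounded multiply-then-subtract destroys, and whether the first expression below also gets fused depends on -ffp-contract and -march, which is exactly the build-dependent behavior at issue.

    #include <cmath>
    #include <cstdio>

    int main()
    {
        double a = 1.0 + 0x1p-27;
        double b = 1.0 - 0x1p-27;
        // Mathematically a*b - 1 == -2^-54. Without fusing, a*b rounds to
        // exactly 1.0 and the subtraction yields 0.0; std::fma computes the
        // product exactly before subtracting.
        std::printf("separate: %a\n", a * b - 1.0);
        std::printf("fused:    %a\n", std::fma(a, b, -1.0));
        return 0;
    }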

Another option to replicate similar errors (which you'll see once the 'fast' ARM build passes) is to build with CELERITAS_REAL_TYPE=float, which will make cancellation errors even more apparent.
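And a quick standalone example of why float surfaces cancellation so much sooner (hypothetical coefficients, not from the quartic tests): the naive quadratic formula loses the small root entirely in single precision, while the rationalized form keeps it.

    #include <cmath>
    #include <cstdio>

    int main()
    {
        // Roots of x^2 - 1000x + 0.001: approximately 1000 and 1e-6
        float a = 1.0f, b = -1000.0f, c = 0.001f;
        float disc = std::sqrt(b * b - 4 * a * c);
        float naive = (-b - disc) / (2 * a);   // catastrophic cancellation: prints 0
        float stable = (2 * c) / (-b + disc);  // rationalized form: ~1e-6
        std::printf("naive=%g stable=%g\n", naive, stable);
        return 0;
    }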

Once you're satisfied that there are no errors in the code (or obvious changes that could result in less cancellation), you can update the test to allow the different results, as we've done for the general quadric test:

    if constexpr (CELERITAS_REAL_TYPE == CELERITAS_REAL_TYPE_DOUBLE)
    {
        EXPECT_SOFT_EQ(1e-16, distances[0]);
        EXPECT_SOFT_EQ(6.0, distances[1]);
    }
    else if (distances[1] == no_intersection())
    {
        // x86 hardware
        EXPECT_SOFT_EQ(6.0f, distances[0]);
    }
    else
    {
        // Apple Silicon
        EXPECT_SOFT_EQ(1e-7f, distances[0]);
        EXPECT_SOFT_EQ(6.0f, distances[1]);
    }

sethrj avatar Nov 12 '25 14:11 sethrj

@osanstrong @sethrj I think compiling with CELERITAS_REAL_TYPE=float is going to be the best bet. Your tolerances are going to have to be loose enough that the code passes using float as the real_type, and I would think that would require looser tolerances than any discrepancy caused by differences in instruction set or optimization flags.

elliottbiondo avatar Nov 14 '25 15:11 elliottbiondo

Cool! Compiling now with these settings. scripts/cmake-presets/OwenS360.json:

{
  "version": 3,
  "cmakeMinimumRequired": {"major": 3, "minor": 21, "patch": 0},
  "configurePresets": [
    {
      "name": "dev",
      "displayName": "dev: default development options",
      "inherits": [".debug", "default"],
      "binaryDir": "${sourceDir}/build",
      "cacheVariables": {
        "CMAKE_EXPORT_COMPILE_COMMANDS": {"type": "BOOL", "value": "ON"},
        "CMAKE_INSTALL_PREFIX": "${sourceDir}/install",
        "CMAKE_BUILD_TYPE": {"type": "STRING", "value": "RelWithDebInfo"},
        "CELERITAS_REAL_TYPE": "float",
        "CELERITAS_USE_VecGeom": {"type": "BOOL", "value": "OFF"}
      }
    },
   ...

scripts/build.sh:

...
# Configure, build, and test
log info "Configuring with verbosity"
cmake --preset=${CMAKE_PRESET} --log-level=VERBOSE "$@"
log info "Building"
if cmake --build --preset=${CMAKE_PRESET} -O3 -march=native; then
...

Also, when using the build script, is there a recommended way to pass arguments to the build call of cmake? It feels like I'm missing a better way than either modifying the script to insert -O3 -march=native (what I did just now to check) or running each step manually.

Update: these did make 3 of the tests fail; I'll let you know how it goes.

osanstrong avatar Nov 15 '25 01:11 osanstrong

So:

  • Reducing build precision from double to float does indeed cause some of the unit tests that are right on the edge of having a real root to fail, and I could probably change those to not be right on the edge (as suggested), since I guess that's not really the focus of a unit test
  • BUT, now that I check back on the GitHub CI test that failed: that test was a surface-root test. All that happens for a surface root is that the solver finds a root slightly off from zero, which I can't reliably move away from (unless there's a deeper connection I'm missing). Having a zero root is the point of the test; I'd just be randomly picking a different polynomial to test and hoping it doesn't run into the same precision issue.
  • Is this something we should actually design around? Should the solver itself have a tolerance for what counts as a zero or negative root? E.g. from
/*!
 * Attempt to put a value into the given list at given index, returning where
 * to place the next item.
 *
 * If the given value is no_intersection() or is not positive, does not place
 * the root, and returns the same index for the next one.
 */
CELER_FUNCTION int FerrariSolver::place_root(Intersections& roots,
                                             real_type new_root,
                                             int free_index)
{
    if (!(new_root == no_intersection() || new_root <= 0))
    {
        roots[free_index] = new_root;
        free_index += 1;
    }
    return free_index;
}

to

CELER_FUNCTION int FerrariSolver::place_root(Intersections& roots,
                                             real_type new_root,
                                             int free_index)
{
    if (!(new_root == no_intersection() || new_root <= 0 || _soft_zero(new_root)))
    {
        roots[free_index] = new_root;
        free_index += 1;
    }
    return free_index;
}

Or is that something where the toroid code or the broader program would just keep a tolerance for "this intersection was too close to the surface"?

  • It then also occurs to me that the surface tests aren't unit tests in this case: they go down the same code paths as the others, only introducing numerical precision errors like this one. It might just not make sense to have them as unit tests, in contrast to the quadratic solver tests:
  • The quadratic solver does have a specific code path for surface solutions, because the non-zero root of ax² + bx = 0 is just the root of ax + b = 0:
/*!
 * Solve degenerate case (known to be "on" surface).
 *
 * Since x = 0 is a root, then c = 0 and x = -b is the other root. This will be
 * inaccurate if a particle is logically on a surface but not physically on it.
 */
CELER_FUNCTION auto QuadraticSolver::operator()() const -> Intersections
{
    Intersections result{-2 * hba_, no_intersection()};

    if (result[0] <= 0)
    {
        result[0] = no_intersection();
    }

    return result;
}
  • Should I just discard the surface tests as being stability tests instead of unit tests, or do we think it could be worth it for the solver to actually include a special case for e = 0, i.e. solve_quartic(ax⁴ + bx³ + cx² + dx = 0) -> solve_cubic(ax³ + bx² + cx + d = 0)? If I rewrite the cubic solver already in FerrariSolver to be the full Cardano method (instead of the short-circuited version that only finds the dominant root), and rerun the Ferrari-Cardano numerical stability tests on the Python prototype for the analysis, that kills two birds with one stone:
    • Surface tests: the surface tests, which right now are just stability tests, become explicitly unit tests, because this would mean there's a corresponding code path to test
    • Reproducibility: this version of the Ferrari-Cardano method is more faithful to the original textbook definition, instead of NKrvavica's short-circuit of Cardano that only returns the largest root. (Which, shoot, I forgot about referencing that; for the full published transactions paper I think I still need to add an aside to the Ferrari-Cardano explanation citing his specific implementation, at least if I keep the short-circuited version.)

TL;DR: I think I have three options:

  • For either the surface cases or for all cases, include a tolerance so roots really close to zero are discarded
  • Replace the short-circuited Cardano with the full Cardano method, and if e == 0 (or the point is manually specified to be on-surface), just solve ax³+bx²+cx+d=0 with the Cardano method (the closest analogy to what QuadraticSolver does)
  • Do nothing for now: treat this as a numerical stability issue and pick a different polynomial that doesn't happen to fail the CI unit tests. In fact, possibly remove the surface tests, because in hindsight they aren't actually testing a different code path.

I think option 2 probably requires rerunning the Python analysis (which shouldn't take too long anyway), and option 1 might get away without it (because that is just picking a tolerance to ignore bad results; the Python analysis measures how quickly we run into those bad results in the first place).

osanstrong avatar Nov 15 '25 07:11 osanstrong

Hi @osanstrong, you've got some thorough thoughts on a deep problem.

  • The "broader program" never keeps track of whether intersections are too small, but as you know, we do keep track of whether we're logically on the surface of the toroid (i.e., maximum of three other roots, even if there's a slight numerical difference between the position and the actual surface of the toroid), as you have for the on-surface case.
  • We do not have any way to track whether we're on a "double root" (e.g., tangent to the toroid) so can at best do a special case for cubic but not quadratic.
  • We do prevent zero distances because that will lead to infinite loops, but that's covered by your place_root function.
  • It is allowable to have small distances so I don't think we want to add a soft-zero to place_root.

Should I just discard the surface tests as being stability tests instead of unit tests, or do we think it could be worth it for the solver to actually include a special case for e = 0, i.e. solve_quartic(ax⁴ + bx³ + cx² + dx = 0) -> solve_cubic(ax³ + bx² + cx + d = 0)?

I think it is very important at this level to check and document the behavior of the solver when we're logically on a surface, but we don't have to necessarily test for the case "e = 0". For physical points near the surface we're going to assume there's always some fuzziness about every one of those coefficients, so from the bigger picture we don't really need to have a consistent behavior for e = 0. (And there's always going to be a sharp discontinuity in the solver behavior, whether it's at exactly e = 0, or exactly e = ±ε.)

In other words, a special case for the cubic solve is indeed probably the way to go. You would add an Intersections operator()(Real4 const&);, and instead of on_surface == SurfaceState::on setting e = 0 and running the full solver, it would just return (*this)(Real4{a,b,c,d}).
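Structurally, something like this (a simplified standalone sketch; the types here are stand-ins for the Celeritas ones, and the real operators would return Intersections):

    #include <array>

    using real_type = double;
    using Real4 = std::array<real_type, 4>;
    using Real5 = std::array<real_type, 5>;

    struct QuarticSolver
    {
        // General quartic a*x^4 + b*x^3 + c*x^2 + d*x + e = 0
        void operator()(Real5 const& abcde) const { /* Ferrari path */ }

        // On-surface case: x = 0 is already a root, so solve the remaining
        // cubic a*x^3 + b*x^2 + c*x + d = 0 directly
        void operator()(Real4 const& abcd) const { /* cubic path */ }
    };

    void solve_on_surface(QuarticSolver const& solve, Real5 const& q)
    {
        // Rather than setting e = 0 and running the full quartic machinery,
        // dispatch straight to the cubic overload
        solve(Real4{q[0], q[1], q[2], q[3]});
    }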

sethrj avatar Nov 15 '25 13:11 sethrj

When I was reviewing the Cardano method and other options for cubic solvers, I found a more comprehensive description of what I think 1728 Systems (the cubic implementation I referenced through NKrvavica) used, which I think is pretty similar to what Algorithm 1010 used. I'm just kicking myself for not looking closely enough at the cubics when I first did the lit review over the summer, because I should have just used the comprehensive one from the start.

The Numerical Recipes method (Press et al., 2007), seemingly used by 1728 Systems but poorly documented, and definitely used by Algorithm 1010 and the "Practical Method" I found posted this month, is a combination of the Cardano method and the trigonometric method, as opposed to the straight Cardano method. Specifically, it avoids complex numbers by calculating the three-real-roots case using trigonometric functions, which is why it uses trig functions and not a complex library.
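For reference, the trigonometric branch these all share is the standard Viète substitution (general textbook form, not copied from any of those sources): for a depressed cubic t³ + pt + q = 0 with three real roots (4p³ + 27q² < 0, which forces p < 0),

$$t_k = 2\sqrt{-\frac{p}{3}}\,\cos\!\left[\frac{1}{3}\arccos\!\left(\frac{3q}{2p}\sqrt{\frac{-3}{p}}\right) - \frac{2\pi k}{3}\right],\qquad k = 0, 1, 2,$$

so no complex intermediates appear even though all three roots are computed.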

For one thing, this makes our previous description of the "Ferrari-Cardano method" incorrect, because it's definitely not the simple Cardano method being plugged in here. It's also a little annoying because it means the cubic solver I used in Ferrari vs. Algorithm 1010 was slightly different, when it could have just been the same if I had been paying attention. I'm going to start by implementing the full Numerical Recipes method and changing the description to clarify that it's something different from the original Cardano method (Ferrari-NR?), and this is going to be something I have to change for the ANS submission.

In the meantime, since we still only need real roots, the main decision is whether to use complex numbers (Cardano) or trigonometric functions (the Numerical Recipes optimization in Algorithm 1010, similar to the Ferrari implementation I referenced).

Is it worth comparing them? Is one of these two (complex math or trigonometric functions) a huge issue for GPU implementations? I've experienced trig functions being an issue for GPU performance when I was doing amateur OpenGL stuff, but (a) it seems like trig functions are used elsewhere in Celeritas, and (b) this isn't that much extra code complexity, which seemed like the bigger concern because of register usage. Essentially:

# Shared assumptions (made explicit here so the sketch runs): f, g, h are the
# precomputed resolvent quantities of the normalized cubic, a13 = a/3,
# third = 1/3, sqr3 = sqrt(3), and cubic_root is a cube-root helper.
import cmath
import math

third = 1.0 / 3.0
sqr3 = math.sqrt(3.0)

def cubic_root(z):
    # Assumed helper: principal cube root for complex z, sign-preserving for real z
    if isinstance(z, complex):
        return z ** third
    return math.copysign(abs(z) ** third, z)

# Numerical Recipes, described to be more robust but uses trig functions
def solve_cubic_nr(f, g, h, a13):
    if h <= 0:  # three real roots: trigonometric branch, no complex math
        j = math.sqrt(-f)
        k = math.acos(-0.5*g / (j*j*j))
        m = math.cos(third*k)
        n = sqr3 * math.sin(third*k)
        r1 = 2*j*m - a13
        r2 = -j * (m + n) - a13
        r3 = -j * (m - n) - a13  # Is this enough temporary variables to warrant analyzing how necessary it is? Vs. just using it
        return r1, r2, r3
    else:  # one real root: Cardano form; sqrt stays real since h > 0
        sqrt_h = math.sqrt(h)
        S = cubic_root(-0.5*g + sqrt_h)
        U = cubic_root(-0.5*g - sqrt_h)
        return S + U - a13

# Cardano, the original: complex arithmetic in place of trig
def solve_cubic_cardano(g, h, a13):
    sqrt_h = cmath.sqrt(h)
    S = cubic_root(-0.5*g + sqrt_h)
    U = cubic_root(-0.5*g - sqrt_h)
    S_plus_U = S + U
    r1 = S_plus_U - a13
    if h > 0:  # only one real root
        return r1.real
    S_minus_U = S - U
    r2 = -0.5*S_plus_U - a13 + S_minus_U*sqr3*0.5j  # these cancel out to real numbers
    r3 = -0.5*S_plus_U - a13 - S_minus_U*sqr3*0.5j  # "
    return r1.real, r2.real, r3.real

So I'm curious for opinions: is it worth testing whether we need the precision improvements, or would that just be bikeshedding, since this isn't that big a difference compared to the quartic algorithm's complexity?

osanstrong avatar Nov 29 '25 02:11 osanstrong

There's also the new Practical Method, which as near as I can tell isn't published in a journal, but which seems to be about the same complexity as the Numerical Recipes version with one extra if statement: [screenshot of the algorithm omitted]

osanstrong avatar Nov 29 '25 03:11 osanstrong

Update: I think I really just want to switch entirely to properly documented algorithms, and redo the precision-loss analysis if necessary, because not properly lit-reviewing the cubic solver used by the Ferrari implementation I referenced has been a never-ending source of citation issues.

I.e., switch from the cubic I copied over the summer (which looked very similar, but whose citation just goes to a poorly documented calculator website) to Numerical Recipes' algorithm.

And then from there, if necessary, I can compare the NR cubic (which would be shared between Ferrari and Alg1010) to the original Cardano or to the slightly different/newer Practical Method.

BibLaTeX citation for the Numerical Recipes cubic algorithm, in case it looks small enough to not need thorough analysis:

@incollection{press_56_2007,
	location = {Cambridge},
	edition = {2. ed., repr., corr. to software version 2.10},
	title = {5.6 Quadratic and Cubic Equations},
	isbn = {978-0-521-43064-7},
	series = {Fortran numerical recipes},
	number = {1},
	booktitle = {Numerical recipes in Fortran 77: the art of scientific computing},
	publisher = {Cambridge University Press},
	editor = {Press, William H.},
	date = {2007},
	langid = {english},
}

osanstrong avatar Nov 29 '25 03:11 osanstrong