DESC nan reverse mode gradient when `FourierPlanarCurve` `normal` is parallel or anti-parallel to `Zaxis`

When I design TF and PF coils for the Solovev equilibrium everything works fine but when I add a helical coil to the coilset, I get a nan optimality.

Here's a minimal failing example

from desc.backend import jnp
from desc.examples import get
from scipy.constants import mu_0
from desc.grid import LinearGrid
from desc.plotting import *
from desc.magnetic_fields import ToroidalMagneticField,SumMagneticField,  VerticalMagneticField
from desc.coils import (
    CoilSet,
    FourierPlanarCoil,
    FourierRZCoil,
    MixedCoilSet,
)
from desc.objectives import (
        BoundaryError,
        FixBoundaryR,
        FixBoundaryZ,
        FixPressure,
        FixSumCoilCurrent,
        ObjectiveFunction, 
        ForceBalance,
        QuadraticFlux, 
        FixParameters,
        FixIota, 
        CoilSetMinDistance,
        CoilLength,
)

from desc.examples import get
from desc.optimize import Optimizer

eq = get("SOLOVEV")
grid_at_surf = LinearGrid(rho=1.0, M=eq.M_grid, N=eq.N_grid)
G_tot = 2*jnp.pi*eq.compute("G", grid=grid_at_surf)["G"][0] / mu_0
field = ToroidalMagneticField(R0=1, B0=mu_0*G_tot/2/jnp.pi) + VerticalMagneticField(mu_0*G_tot/2/jnp.pi)  #+ PoloidalMagneticField(mu_0*G_tot/2/jnp.pi,R0=1,iota=eq.compute("iota")["iota"].mean()) 

coil1 = FourierPlanarCoil(current=1e5,center=[0,0, 2.4],normal=[0,0,1],r_n=4.5)
coil2 = FourierPlanarCoil(current=1e5,center=[0,0,-2.4],normal=[0,0,1],r_n=4.5)
n_coils = 12
TFcoil=FourierPlanarCoil(current=G_tot/n_coils,center=[4,0,0],normal=[0,1,0],r_n=2.2)
TFcoil.change_resolution(N=0)
coil3 = CoilSet(TFcoil,NFP=n_coils)

minor_radius = eq.compute("a")["a"]
offset = 1.5 * minor_radius
num_coils = 1  # coils per half field period per coilset
zeta = jnp.linspace(0, 2*jnp.pi, 41)
grid = LinearGrid(rho=[0.0], zeta=zeta, NFP=eq.NFP)
data = eq.axis.compute(["x", "x_s"], grid=grid, basis="rpz")
helical_coils = []
helical_offset = 0.0
R0 = 4.
R = R0 + offset* jnp.cos(zeta - helical_offset)
Z = offset * jnp.sin(zeta - helical_offset)
data = jnp.vstack([R, zeta, Z]).T
coil4 = FourierRZCoil.from_values(
    current=2.1e2,
    coords=data,
    #N = 4,
    basis="rpz",  # we are giving the center and normal in cylindrical
)

field = MixedCoilSet(coil1, coil2, coil3, coil4)
#field = MixedCoilSet(coil1, coil2, coil3)

# first coil opt to find correct toroidal field needed
R_modes = eq.surface.R_basis.modes[jnp.max(jnp.abs(eq.surface.R_basis.modes), 1) > 0, :]
Z_modes = eq.surface.Z_basis.modes[jnp.max(jnp.abs(eq.surface.Z_basis.modes), 1) > 0, :]
bdry_constraints = (
    FixBoundaryR(eq=eq, modes=R_modes),
    FixBoundaryZ(eq=eq, modes=Z_modes),
)
coil_grid = LinearGrid(N=50)
constraints = (
    FixParameters(eq),
    FixParameters(field,[{"r_n":True,"center": [0, 1, 2],"normal":True}, {"r_n":True,"center": [0, 1, 2],"normal":True},{"normal":True}, {}]),
    #FixParameters(field,[{"r_n":True,"center": [0, 1, 2],"normal":True}, {"r_n":True,"center": [0, 1, 2],"normal":True},{"normal":True}]),
    FixSumCoilCurrent(field,indices=[False,False,True,False],target = G_tot/n_coils),
    #FixSumCoilCurrent(field,indices=[False,False,True],target = G_tot/n_coils),
)
objective = ObjectiveFunction((BoundaryError(eq,field),
                               CoilSetMinDistance(field,bounds=(0.2,100),grid=LinearGrid(N=100)),
                               #CoilLength(field, bounds=(0, 2 * np.pi * (0.5)), normalize_target=True, weight=100, grid=coil_grid,)
                              ))
optimizer = Optimizer("lsq-exact")
(eq,field_opt), out = optimizer.optimize([eq,field],
    objective,
    constraints + bdry_constraints,
    verbose=3,
    options={},copy=True,
     ftol=1e-4, xtol=1e-6,maxiter=5
)

The nans go away if you remove coil4 from the coilset. Another way to fix this(as suggested by @dpanici ) while including the helical coil is to add deriv_mode="fwd" to all the objectives.
What might be the reason for this?

Dec 11 '24 18:12 rahulgaur104

More minimal example (the nans going away if coil4 is removed is likely due to the deriv_mode changing)


from desc.magnetic_fields import ToroidalMagneticField
from desc.coils import (
    FourierPlanarCoil,
)
from desc.objectives import (
        BoundaryError,
        ObjectiveFunction, 
)

from desc.examples import get
from desc.optimize import Optimizer

eq = get("SOLOVEV")
coil1 = FourierPlanarCoil(current=1e5,center=[0,0, 2.4],normal=[0,0,1],r_n=4.5)
field = ToroidalMagneticField(1,1) # no error if a TF is used for field
field = coil1 # error if a single coil is used (or any coilset)
objective = ObjectiveFunction((BoundaryError(eq,field, deriv_mode="rev"), # error only when deriv_mode is rev
                              ))
optimizer = Optimizer("lsq-exact")
(eq_opt,field_opt), out = optimizer.optimize([eq,field],
    objective,
     constraints=(),
    verbose=3,
    options={},copy=True,
     ftol=1e-4, xtol=1e-6,maxiter=2
)

yields nan in optimality, so jacobian of BoundaryError has a nan in reverse mode, but only when a coil is in the field...

Dec 11 '24 19:12 dpanici

More data: the failure only occurs when there is a FourierPlanarCoil(normal=[0, 0, X]), so when the normal is in the z direction. Fourier planar coils with other normals work fine, as seem to work other coil classes (like if you do FourierPlanarCoil( normal=[0, 0, X]).to_SplineXYZ() that works fine.

So seems to somehow be an issue with the reverse mode gradient going through the rotations etc for FourierPlanarCoil when the normal is in the z axis

Dec 11 '24 19:12 dpanici

The failure also affects the rev mode derivative of any objective where coil.compute_magnetic_field is called (so quadratic flux etc)

Dec 11 '24 19:12 dpanici

I think its because we measure rotations relative to the Z axis, and when the normal is also along z then axis is [0,0,0]

https://github.com/PlasmaControl/DESC/blob/273515b231a2b4d187961a06abcdb54325e369ed/desc/compute/_curve.py#L209-L213

Dec 11 '24 19:12 f0uriest

angle = jnp.arccos(dot(Zaxis, safenormalize(normal))) when the normal is aligned with Zaxis, the argument here is 1, and the derivative of arccos is undefined (infinite) at 1. Though I am not sure why in fwd mode this works ok and not for reverse mode?

Dec 11 '24 19:12 dpanici

Ah wait Rory beat me, but why in fwd does this seem to work fine?

Dec 11 '24 19:12 dpanici

in rotation_matrix some invalid values get overwritten, which would make it fine in forward mode but not reverse. So we may just need to move some of that masking earlier so that invalid values for angle don't get created in the first place

Dec 11 '24 19:12 f0uriest

Instead of getting the rotation matrix from axis and angle might be easier just to get it directly from the 2 normals: https://math.stackexchange.com/questions/180418/calculate-rotation-matrix-to-align-vector-a-to-vector-b-in-3d

Dec 11 '24 19:12 f0uriest

Don't know if related but nice trick https://docs.kidger.site/equinox/api/debug/#common-sources-of-nans

Apr 07 '25 17:04 YigitElma