
Make mapping used for gyrokinetic sims public

Open unalmis opened this issue 3 weeks ago • 1 comments

According to Yigit, some groups have been using a private method I wrote in #1826 for gyrokinetic simulations. They are probably using it because it is 250x faster than the existing public method offered by DESC. The method is still private because its API is expected to change once the next improvement is released, which will likely yield another performance boost of at least a factor of $10^3$ and also reduce memory use. Therefore, this issue documents the need for someone to complete that task and make the method public.

partial sum zernike.pdf

Note that the starting point for the spectral to real space transform is the two-dimensional set of Fourier modes on each surface, since the radial modes have already been summed over in other routines to compute lambda on the surface. So all you need to do is matrix multiply the toroidal modes with the precomputed Vandermonde basis at the given zeta points. This yields the Fourier series in theta at every (rho, zeta), at the cheap cost of the following number of fused multiply-adds

$$n_{\rho} \times n_{\theta, modes} \times n_{\zeta, modes} \times n_{\zeta}$$

and in particular only requires storing a precomputed Vandermonde matrix of size

$$n_{\zeta, modes} \times n_{\zeta}$$

This is far more performant than the full spectral to real space transform that is currently done, which multiplies all points by all modes and then takes FFTs, at a slow cost of $N^6 + n_{\rho} \times n_{\zeta} \times n_{\theta} \log n_{\theta} \gg N^4$. Therefore, this would be a factor of $10^3$ speedup, moving this routine into the microsecond range and reducing the memory needed for the precomputed matrix and for the gradients that accumulate for AD.
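The partial sum described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the DESC implementation: the function name `toroidal_partial_sum`, the complex-exponential basis $e^{i(m\theta + n\zeta)}$, and the coefficient layout `(n_rho, n_theta_modes, n_zeta_modes)` are all hypothetical choices made for the example.

```python
import numpy as np


def toroidal_partial_sum(coef, n_modes, zeta):
    """Sum over toroidal modes only, leaving a Fourier series in theta.

    Assumed (hypothetical) layout, not DESC's actual one:
      coef    : complex, shape (n_rho, n_theta_modes, n_zeta_modes),
                coefficients of exp(i (m theta + n zeta)) on each surface
      n_modes : toroidal mode numbers, shape (n_zeta_modes,)
      zeta    : toroidal angles, shape (n_zeta,)
    Returns theta-series coefficients, shape (n_rho, n_zeta, n_theta_modes).
    """
    # Precomputable Vandermonde matrix of size n_zeta_modes x n_zeta.
    vander = np.exp(1j * np.outer(n_modes, zeta))
    # One contraction over n: n_rho * n_theta_modes * n_zeta_modes * n_zeta
    # multiply-adds.
    return np.einsum("rmn,nz->rzm", coef, vander)


# Small random example to exercise the function.
rng = np.random.default_rng(0)
n_rho = 3
m_modes = np.arange(-4, 5)
n_modes = np.arange(-5, 6)
shape = (n_rho, m_modes.size, n_modes.size)
coef = rng.standard_normal(shape) + 1j * rng.standard_normal(shape)
zeta = np.linspace(0, 2 * np.pi, 7, endpoint=False)

theta_coef = toroidal_partial_sum(coef, n_modes, zeta)
```

Evaluating the result at any theta is then a short dot product over the poloidal modes, for example `theta_coef @ np.exp(1j * m_modes * theta)`, which is where the savings over a full transform on the whole grid would come from.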

The objectives that would benefit most from this are the ballooning stability objectives and the bounce integral objectives. The ballooning stability objectives still use the slow method, which takes seconds, and that is significant. Implementing this would yield a measurable $10^6$ performance boost in that subroutine.

unalmis avatar Nov 16 '25 01:11 unalmis

Good idea. But I would be surprised if these folks are coupling flux tube geometry calculation with an actual gyrokinetic solver for speed.

It is important to note that even with the fastest gyrokinetic run, the DESC flux tube calculation will comprise < 10% of the total time taken. This means that even if the DESC geometry calculation were instantaneous, the process of calculating flux tubes + solving the GK model would speed up by < 10%.

For a nonlinear simulation, the overall speedup would be even lower (< 5%). For reference, a typical GX nonlinear run in a simplified limit (ITG with adiabatic electrons) takes > 10 min on a single A100 GPU.
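These bounds are an instance of Amdahl's law. As a worked sketch, let $f$ be the fraction of total wall time spent in the DESC flux tube calculation and $s$ the speedup of that part alone (both symbols are introduced here for illustration). The overall speedup is

$$S = \frac{1}{(1 - f) + f / s} \le \frac{1}{1 - f}$$

With $f < 0.1$ this gives $S < 1/0.9 \approx 1.11$, and reading the nonlinear figure as $f < 0.05$ gives $S < 1/0.95 \approx 1.05$, no matter how large $s$ becomes. Equivalently, the total run time can shrink by at most the fraction $f$.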

rahulgaur104 avatar Nov 16 '25 02:11 rahulgaur104

Specifically, this refers to the partial_sum function and to map_clebsch_coordinates, which uses it and is currently used inside a few objectives but is not publicly documented.

dpanici avatar Nov 24 '25 19:11 dpanici