
Apparent particle density spikes with SP particles, DP fields

Open PhilMiller opened this issue 2 years ago • 12 comments

Configured with -DWarpX_PRECISION=DOUBLE -DAMReX_PARTICLES_PRECISION=SINGLE, we're seeing some very anomalous reported particle densities at particular coordinates compared to default double precision:

[Image: Ar_ion_xavg_difference_100 0_ns]

The spikes are at cell indices z=325, 650, and 1300.

The domain is 0.0025^2, with nx=nz=1664. The domain starts with zero offsets, no moving window.

With those parameters, those coordinates are (some of the) points at which the single- and double-precision representations of iz*dz are exactly equal. We observe similar spikes across the x axis at positions ±650 cells from the origin.

Here's the bit of code I wrote to recognize the representation coincidence:

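#include <cstdio>

int main()
{
    // Cell size for a 0.0025-wide domain with 1664 cells, in double precision
    double dz = 0.0025/1664.0;
    // The same cell size rounded to single precision
    float sdz = static_cast<float>(dz);
    double diff_dz = dz - sdz;

    printf("Double %a, Mixed %a, Diff %a %e\n\n", dz, sdz, diff_dz, diff_dz);

    // For each cell index, compare the node position computed in double
    // precision (i*dz) against the one computed in single precision (i*sdz)
    printf("Index\tDouble\tMixed\tDiff\n");
    for (int i = 0; i < 1664; ++i) {
        printf("%d\t%e\t%e\t%e\n", i, i*dz, i*sdz, i*dz - i*sdz);
    }

    return 0;
}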

When I wrote that, I was expecting those to be points at which the representation difference was greatest, rather than zero.

PhilMiller avatar Jul 05 '22 20:07 PhilMiller

To be more precise, 325 is the first such point with identical SP/DP values, followed by 2*325 and 4*325. However, we didn't observe errors at 3*325=975 or 5*325=1625, which also have identical SP/DP values.
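
A possible way to see why 325 is the stride: if the domain length is taken as exactly 1/400 (0.0025), then dz = 1/(400*1664) = 1/(2^11 * 325), so in exact arithmetic iz*dz is a short binary fraction only when iz is a multiple of 325. The sketch below computes that stride. The helper names `odd_part` and `coincidence_stride` are made up for illustration and are not part of WarpX or AMReX, and the result says nothing about which of those multiples actually show a spike.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: strip every factor of two from v.
std::uint64_t odd_part(std::uint64_t v)
{
    assert(v > 0);
    while (v % 2 == 0) {
        v /= 2;
    }
    return v;
}

// Assumption: the domain length is exactly 1/inv_length (0.0025 = 1/400),
// so dz = 1 / (inv_length * n_cells). In exact arithmetic, i*dz is a dyadic
// fraction only when i is a multiple of the odd part of that denominator.
std::uint64_t coincidence_stride(std::uint64_t inv_length, std::uint64_t n_cells)
{
    return odd_part(inv_length * n_cells);
}
```

Calling `coincidence_stride(400, 1664)` gives 325, matching the first coincident index found with the program above.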

PhilMiller avatar Jul 05 '22 20:07 PhilMiller

Ok, here are some zoomed-in scatter plots of the points around z=1300:

[Two images: zoomed-in scatter plots around z=1300]

Given that the anomaly extends several points away from z=1300, I doubt that it's purely a diagnostic/analysis artifact. There's a similar but smaller spread around z=325 and z=650 as well. My suspicion is that charge deposition is misbehaving at these points, and a physical concentration of particles is resulting. I've got a rough sense that the adjacent 'excess' and 'deficit' points should balance out to about the average density difference between the two runs.

PhilMiller avatar Jul 05 '22 21:07 PhilMiller

These were both run on a CUDA GPU with 1 grid covering the entire domain, so this isn't an issue related to Redistribute().

PhilMiller avatar Jul 05 '22 21:07 PhilMiller

A note from discussion with @KZhu-ME and @peterscherpelz: the same anomaly appears in the electron density as well, but to a lesser extent.

The effect takes some time from the beginning of the run to appear, so, per @peterscherpelz:

"If it's not as prominent for electrons, that would be consistent with a gradual effect over time, because the slower movement of the ions implies they wouldn't move away to smooth out the spike as quickly."

PhilMiller avatar Jul 06 '22 03:07 PhilMiller

@ax3l Any idea how to move forward on this?

RemiLehe avatar Jul 11 '22 22:07 RemiLehe

May be a consequence of -ffast-math on GPU

PhilMiller avatar Jul 12 '22 16:07 PhilMiller

Does the same issue occur with AMReX_CUDA_FASTMATH=OFF?

ax3l avatar Jul 12 '22 16:07 ax3l

Thanks; we'll check with fast math off next - it'll probably be next week on our end that we can check it.

peterscherpelz avatar Jul 12 '22 22:07 peterscherpelz

Sorry for the late update. Turning fastmath off did not fix the issue unfortunately.

[Image: Fastmath_off_Ar_ion_xavg_error_50 0_ns]

KZhu-ME avatar Jul 27 '22 20:07 KZhu-ME

It looks like that changed the position of the spikes, though. What indices are those happening at? Can you see anything interesting about the output of my little analysis program at those positions?

PhilMiller avatar Jul 27 '22 23:07 PhilMiller

Kevin told me that he did that run in a slightly lower-resolution domain, 1472^2, to get results a bit faster. The spikes consistently occur where the zero differences appear at that resolution, based on my utility above.
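
For checking other resolutions, here is a sketch of the same comparison wrapped in a function that takes the cell count and returns only the indices where the single- and double-precision products agree exactly. The name `matching_indices` is hypothetical, and the sketch assumes the compiler evaluates `float` arithmetic in single precision (no fast-math, no excess-precision mode).

```cpp
#include <cassert>
#include <vector>

// Return the cell indices i in [1, n_cells) for which i*dz computed in
// single precision equals i*dz computed in double precision.
// Index 0 is skipped because both products are trivially zero there.
std::vector<int> matching_indices(double length, int n_cells)
{
    assert(n_cells > 0);
    const double dz_dp = length / n_cells;          // double-precision cell size
    const float dz_sp = static_cast<float>(dz_dp);  // rounded to single precision

    std::vector<int> matches;
    for (int i = 1; i < n_cells; ++i) {
        const double pos_dp = i * dz_dp;
        // Stored in a float so the product is rounded to single precision.
        const float pos_sp = static_cast<float>(i) * dz_sp;
        if (static_cast<double>(pos_sp) == pos_dp) {
            matches.push_back(i);
        }
    }
    return matches;
}
```

With `matching_indices(0.0025, 1664)` this should reproduce the 325, 650, 975, 1300, 1625 list from the original analysis; passing 1472 as the cell count gives the corresponding list for the lower-resolution run.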

PhilMiller avatar Aug 01 '22 18:08 PhilMiller

Ok, now that #3397 has been addressed, I'm back to looking at this. Here's a more recent plot of the effect:

[Image: plot of the effect]

That shows its absence in double precision, and its growth over time.

PhilMiller avatar Sep 19 '22 22:09 PhilMiller