FD-based gradient calculation seems incorrect for Burgers (with code to verify)
The issue
I am trying to understand how gradients are computed for Burgers, implemented by `FDM_Burgers()` in `train_utils/losses.py`, as pasted below:
```python
def FDM_Burgers(u, v, D=1):
    batchsize = u.size(0)
    nt = u.size(1)
    nx = u.size(2)

    u = u.reshape(batchsize, nt, nx)
    dt = D / (nt - 1)
    dx = D / (nx)

    u_h = torch.fft.fft(u, dim=2)
    # Wavenumbers in y-direction
    k_max = nx // 2
    k_x = torch.cat((torch.arange(start=0, end=k_max, step=1, device=u.device),
                     torch.arange(start=-k_max, end=0, step=1, device=u.device)), 0).reshape(1, 1, nx)
    ux_h = 2j * np.pi * k_x * u_h
    uxx_h = 2j * np.pi * k_x * ux_h
    ux = torch.fft.irfft(ux_h[:, :, :k_max+1], dim=2, n=nx)
    uxx = torch.fft.irfft(uxx_h[:, :, :k_max+1], dim=2, n=nx)
    ut = (u[:, 2:, :] - u[:, :-2, :]) / (2 * dt)
    Du = ut + (ux*u - v*uxx)[:, 1:-1, :]
    return Du, ut, ux, uxx
```
It is clear that you are using finite difference (FD) to compute $u_t$: `ut = (u[:, 2:, :] - u[:, :-2, :]) / (2 * dt)`. For $u_x$, you compute it in Fourier space as described in the paper. However, it seems to me that what is done here (i.e., only one round of FFT, wavenumber multiplication, and IFFT) is insufficient; for example, the pointwise activation functions are not included at all. I do not understand why $u_x$ and $u_{xx}$ can be computed in such a simple way.
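To make explicit what that FFT-multiply-IFFT block computes on a sampled field, here is a minimal standalone sketch of my own (not from the repo; the function name `spectral_dx`, the grid size, and the sine test function are arbitrary choices of mine):

```python
import numpy as np
import torch

def spectral_dx(u):
    # u: (batch, nt, nx) samples of a 1-periodic field on x_j = j/nx;
    # returns du/dx on the same grid via FFT -> multiply by 2*pi*i*k -> inverse FFT
    nx = u.size(2)
    k_max = nx // 2
    u_h = torch.fft.fft(u, dim=2)
    k_x = torch.cat((torch.arange(0, k_max), torch.arange(-k_max, 0))).reshape(1, 1, nx).double()
    ux_h = 2j * np.pi * k_x * u_h
    return torch.fft.irfft(ux_h[:, :, :k_max + 1], dim=2, n=nx)

# sanity check on u(x) = sin(2*pi*x), whose exact derivative is 2*pi*cos(2*pi*x)
nx = 128
x = torch.arange(nx, dtype=torch.float64) / nx
u = torch.sin(2 * np.pi * x).reshape(1, 1, nx)
err = (spectral_dx(u) - 2 * np.pi * torch.cos(2 * np.pi * x)).abs().max()
print(err)  # expect an error near float64 round-off
```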
Benchmark with autograd
To investigate this question, I checked the results against autograd. These are the steps I took:

- extend the returns of `FDM_Burgers()` and `PINO_loss()` in `train_utils/losses.py` to expose the gradient outputs (see the sketch right after this list);
- compare the FD and autograd results in the training method `train_2d_burger()` in `train_utils/train_2d.py`;
- make a minor fix in `train_burgers.py` so the debug run works on CPU.
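For step 1, the change is only about plumbing the intermediate tensors out of the loss helpers. Since the `FDM_Burgers()` snippet above already returns `(Du, ut, ux, uxx)`, the idea is roughly the following (`PINO_loss_debug` is a hypothetical name for illustration; the attached files modify `PINO_loss()` in place and may differ in detail):

```python
# hypothetical wrapper illustrating step 1; the attached files modify
# PINO_loss() in place instead, but the idea is the same
def PINO_loss_debug(u, u0, v):
    loss_u, loss_f = PINO_loss(u, u0, v)   # original data/equation losses, unchanged
    Du, ut, ux, uxx = FDM_Burgers(u, v)    # expose the derivative fields for inspection
    return loss_u, loss_f, Du, ut, ux, uxx
```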
For quick reference, this is my code for the FD vs autograd comparison in step 2:
```python
for x, y in train_loader:
    # make x require grad
    x.requires_grad = True
    x, y = x.to(rank), y.to(rank)

    out = model(x).reshape(y.shape)
    data_loss = myloss(out, y)

    ####################
    # BENCHMARK ut, ux #
    ####################
    # results from FDM
    loss_u, loss_f, Du, ut, ux, uxx = PINO_loss(out, x[:, 0, :, 0], v)

    from torch.autograd import grad
    g_AD = grad(out.sum(), x, create_graph=True)[0]
    # from datasets.py
    #   Xs = torch.stack([Xs, gridx.repeat([n_sample, self.T, 1]),
    #                     gridt.repeat([n_sample, 1, self.s])], dim=3)
    ux_AD = g_AD[:, :, :, 1]  # x coordinates -> second dim
    ut_AD = g_AD[:, :, :, 2]  # t coordinates -> third dim

    print('Difference for ut')
    print(ut_AD[0, 1:-1] - ut[0])
    print('\n\nDifference for ux')
    print(ux_AD[0] - ux[0])
    assert False, 'Stop for debug'
```
If you replace the original source files with the attached three files and run

```
python3 train_burgers.py --config_path configs/pretrain/burgers-pretrain.yaml --mode train
```

you should get output similar to the following:
Difference for ut
tensor([[ 1.1273e-05, 7.0436e-06, 4.2944e-06, ..., 4.5442e-05,
3.7894e-05, 3.1451e-05],
[-1.2425e-05, -1.6239e-05, -1.7828e-05, ..., 9.3258e-06,
3.6445e-06, -5.4443e-07],
[-2.9822e-05, -3.1391e-05, -3.2599e-05, ..., -2.1410e-05,
-2.2994e-05, -2.5690e-05],
...,
[-3.7912e-05, -3.9732e-05, -4.0075e-05, ..., -2.7271e-05,
-3.0192e-05, -3.2881e-05],
[-3.5721e-05, -3.9391e-05, -4.0469e-05, ..., -2.0454e-06,
-7.5690e-06, -1.2858e-05],
[-2.4116e-05, -2.8111e-05, -3.0634e-05, ..., 3.3159e-05,
2.4668e-05, 1.7543e-05]], grad_fn=<SubBackward0>)
Difference for ux
tensor([[ 0.0319, -0.0136, 0.0091, ..., 0.0092, -0.0135, 0.0320],
[ 0.0319, -0.0136, 0.0091, ..., 0.0092, -0.0135, 0.0320],
[ 0.0319, -0.0136, 0.0091, ..., 0.0092, -0.0135, 0.0320],
...,
[ 0.0334, -0.0144, 0.0095, ..., 0.0097, -0.0142, 0.0336],
[ 0.0334, -0.0144, 0.0095, ..., 0.0097, -0.0142, 0.0336],
[ 0.0334, -0.0144, 0.0095, ..., 0.0096, -0.0142, 0.0335]],
grad_fn=<SubBackward0>)
As we can see, the differences between FD and autograd for $u_t$ are quite small, as expected, which also implies that I am using autograd correctly in `train_2d_burger()`. However, the differences for $u_x$ are exceedingly large, which seems to support my doubt that `FDM_Burgers()` is insufficient for $u_x$.
source (1).zip
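To put a single number on this instead of eyeballing the raw tensors, one can also report relative L2 errors using the variables from the loop above (`rel_l2` is my own helper, not part of the repo):

```python
def rel_l2(a, b):
    # relative L2 error between two tensors of the same shape
    return ((a - b).pow(2).sum().sqrt() / b.pow(2).sum().sqrt()).item()

# ut/ux from PINO_loss (FD / FFT-based), ut_AD/ux_AD from autograd, as in the loop above
print('rel. L2, ut (FD  vs autograd):', rel_l2(ut[0], ut_AD[0, 1:-1]))
print('rel. L2, ux (FFT vs autograd):', rel_l2(ux[0], ux_AD[0]))
```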
Further, I compared $u_x$ obtained by three methods: finite difference, autograd, and the FFT-based approach; the last one is what your original code uses. My modified code is attached.
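For reference, the finite-difference $u_x$ in this comparison is just a central difference in $x$ with periodic wrap-around, roughly as follows (a sketch only; `out` is the model output of shape `(batch, nt, nx)` as in the loop above, and the attached code may differ in detail):

```python
# sketch: central difference for u_x with periodic wrap-around in x (dim 2),
# assuming a domain of length 1 sampled at nx points without the right endpoint
nx = out.size(2)
dx = 1.0 / nx
ux_FD = (torch.roll(out, shifts=-1, dims=2) - torch.roll(out, shifts=1, dims=2)) / (2 * dx)
```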
Here is the output from my run:
Difference autograd vs FFT
tensor([[-0.0079, 0.0036, -0.0022, ..., -0.0022, 0.0035, -0.0080],
[-0.0079, 0.0036, -0.0022, ..., -0.0022, 0.0035, -0.0080],
[-0.0079, 0.0036, -0.0022, ..., -0.0022, 0.0035, -0.0080],
...,
[-0.0061, 0.0027, -0.0017, ..., -0.0017, 0.0027, -0.0062],
[-0.0061, 0.0027, -0.0017, ..., -0.0017, 0.0027, -0.0062],
[-0.0061, 0.0027, -0.0017, ..., -0.0017, 0.0027, -0.0062]],
grad_fn=<SubBackward0>)
Difference finite-diff vs FFT
tensor([[ 0.0035, -0.0022, 0.0016, ..., 0.0016, -0.0022, 0.0035],
[ 0.0035, -0.0022, 0.0016, ..., 0.0016, -0.0022, 0.0035],
[ 0.0035, -0.0022, 0.0016, ..., 0.0016, -0.0022, 0.0035],
...,
[ 0.0027, -0.0017, 0.0012, ..., 0.0012, -0.0017, 0.0027],
[ 0.0027, -0.0017, 0.0012, ..., 0.0012, -0.0017, 0.0027],
[ 0.0027, -0.0017, 0.0012, ..., 0.0012, -0.0017, 0.0027]],
grad_fn=<SubBackward0>)
Difference autograd vs finite-diff
tensor([[-4.8905e-05, -5.7416e-05, -6.7184e-05, ..., -1.7096e-05,
-1.1985e-05, -4.2657e-06],
[-4.8192e-05, -5.6226e-05, -6.6709e-05, ..., -1.6668e-05,
-1.1796e-05, -3.8399e-06],
[-4.7259e-05, -5.5532e-05, -6.5778e-05, ..., -1.6510e-05,
-1.1401e-05, -3.2073e-06],
...,
[ 7.5682e-06, 1.5970e-06, -6.3481e-06, ..., 1.7191e-05,
2.5546e-05, 3.8179e-05],
[ 8.3740e-06, 3.1168e-06, -5.0674e-06, ..., 1.7569e-05,
2.5923e-05, 3.7840e-05],
[ 8.7544e-06, 4.2115e-06, -3.4967e-06, ..., 1.7523e-05,
2.5875e-05, 3.8267e-05]], grad_fn=<SubBackward0>)
As you can see, autograd and finite difference agree well, but the FFT-IFFT approach shows a large discrepancy relative to the other two.