FD-based gradient calculation seems incorrect for Burgers (with code to verify)
The issue
I am trying to understand how gradients are computed for Burgers, implemented by `FDM_Burgers()` in `train_utils/losses.py`, as pasted below:
```python
def FDM_Burgers(u, v, D=1):
    batchsize = u.size(0)
    nt = u.size(1)
    nx = u.size(2)

    u = u.reshape(batchsize, nt, nx)
    dt = D / (nt - 1)
    dx = D / (nx)

    u_h = torch.fft.fft(u, dim=2)
    # Wavenumbers in y-direction
    k_max = nx // 2
    k_x = torch.cat((torch.arange(start=0, end=k_max, step=1, device=u.device),
                     torch.arange(start=-k_max, end=0, step=1, device=u.device)), 0).reshape(1, 1, nx)
    ux_h = 2j * np.pi * k_x * u_h
    uxx_h = 2j * np.pi * k_x * ux_h
    ux = torch.fft.irfft(ux_h[:, :, :k_max+1], dim=2, n=nx)
    uxx = torch.fft.irfft(uxx_h[:, :, :k_max+1], dim=2, n=nx)
    ut = (u[:, 2:, :] - u[:, :-2, :]) / (2 * dt)
    Du = ut + (ux*u - v*uxx)[:, 1:-1, :]
    return Du, ut, ux, uxx
```
It is clear that you are using finite difference (FD) to compute $u_t$: `ut = (u[:, 2:, :] - u[:, :-2, :]) / (2 * dt)`. For $u_x$, you compute it in Fourier space as described in the paper. However, it seems to me that what is done here (i.e., only one round of FFT, wavenumber multiplication, and IFFT) is insufficient; for example, the pointwise activation functions are not included at all. I do not understand why $u_x$ and $u_{xx}$ can be computed in such a simple way.
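To make explicit what that FFT-multiply-IFFT block computes on a sampled field, here is a minimal standalone sketch of my own (not from the repo; the function name `spectral_dx`, the grid size, and the sine test function are arbitrary choices of mine):

```python
import numpy as np
import torch

def spectral_dx(u):
    # u: (batch, nt, nx) samples of a 1-periodic field on x_j = j/nx;
    # returns du/dx on the same grid via FFT -> multiply by 2*pi*i*k -> inverse FFT
    nx = u.size(2)
    k_max = nx // 2
    u_h = torch.fft.fft(u, dim=2)
    k_x = torch.cat((torch.arange(0, k_max), torch.arange(-k_max, 0))).reshape(1, 1, nx).double()
    ux_h = 2j * np.pi * k_x * u_h
    return torch.fft.irfft(ux_h[:, :, :k_max + 1], dim=2, n=nx)

# sanity check on u(x) = sin(2*pi*x), whose exact derivative is 2*pi*cos(2*pi*x)
nx = 128
x = torch.arange(nx, dtype=torch.float64) / nx
u = torch.sin(2 * np.pi * x).reshape(1, 1, nx)
err = (spectral_dx(u) - 2 * np.pi * torch.cos(2 * np.pi * x)).abs().max()
print(err)  # expect an error near float64 round-off
```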
Benchmark with autograd
To investigate this question, I checked the results against autograd. These are the steps I took:

- extend the returns of `FDM_Burgers()` and `PINO_loss()` in `train_utils/losses.py` to expose the gradient outputs (see the sketch right after this list);
- compare the FD and autograd results in the training method `train_2d_burger()` in `train_utils/train_2d.py`;
- make a minor fix in `train_burgers.py` so the debug run works on CPU.
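For step 1, the change is only about plumbing the intermediate tensors out of the loss helpers. Since the `FDM_Burgers()` snippet above already returns `(Du, ut, ux, uxx)`, the idea is roughly the following (`PINO_loss_debug` is a hypothetical name for illustration; the attached files modify `PINO_loss()` in place and may differ in detail):

```python
# hypothetical wrapper illustrating step 1; the attached files modify
# PINO_loss() in place instead, but the idea is the same
def PINO_loss_debug(u, u0, v):
    loss_u, loss_f = PINO_loss(u, u0, v)   # original data/equation losses, unchanged
    Du, ut, ux, uxx = FDM_Burgers(u, v)    # expose the derivative fields for inspection
    return loss_u, loss_f, Du, ut, ux, uxx
```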
For quick reference, this is my code for the FD vs autograd comparison in step 2:
```python
for x, y in train_loader:
    # make x require grad
    x.requires_grad = True
    x, y = x.to(rank), y.to(rank)

    out = model(x).reshape(y.shape)
    data_loss = myloss(out, y)

    ####################
    # BENCHMARK ut, ux #
    ####################
    # results from FDM
    loss_u, loss_f, Du, ut, ux, uxx = PINO_loss(out, x[:, 0, :, 0], v)

    from torch.autograd import grad
    g_AD = grad(out.sum(), x, create_graph=True)[0]
    # from datasets.py
    #   Xs = torch.stack([Xs, gridx.repeat([n_sample, self.T, 1]),
    #                     gridt.repeat([n_sample, 1, self.s])], dim=3)
    ux_AD = g_AD[:, :, :, 1]  # x coordinates -> second dim
    ut_AD = g_AD[:, :, :, 2]  # t coordinates -> third dim

    print('Difference for ut')
    print(ut_AD[0, 1:-1] - ut[0])
    print('\n\nDifference for ux')
    print(ux_AD[0] - ux[0])
    assert False, 'Stop for debug'
```
If you replace the original source files with the attached three files and run

```
python3 train_burgers.py --config_path configs/pretrain/burgers-pretrain.yaml --mode train
```

you should get output similar to the following:
Difference for ut
tensor([[ 1.1273e-05, 7.0436e-06, 4.2944e-06, ..., 4.5442e-05,
3.7894e-05, 3.1451e-05],
[-1.2425e-05, -1.6239e-05, -1.7828e-05, ..., 9.3258e-06,
3.6445e-06, -5.4443e-07],
[-2.9822e-05, -3.1391e-05, -3.2599e-05, ..., -2.1410e-05,
-2.2994e-05, -2.5690e-05],
...,
[-3.7912e-05, -3.9732e-05, -4.0075e-05, ..., -2.7271e-05,
-3.0192e-05, -3.2881e-05],
[-3.5721e-05, -3.9391e-05, -4.0469e-05, ..., -2.0454e-06,
-7.5690e-06, -1.2858e-05],
[-2.4116e-05, -2.8111e-05, -3.0634e-05, ..., 3.3159e-05,
2.4668e-05, 1.7543e-05]], grad_fn=<SubBackward0>)
Difference for ux
tensor([[ 0.0319, -0.0136, 0.0091, ..., 0.0092, -0.0135, 0.0320],
[ 0.0319, -0.0136, 0.0091, ..., 0.0092, -0.0135, 0.0320],
[ 0.0319, -0.0136, 0.0091, ..., 0.0092, -0.0135, 0.0320],
...,
[ 0.0334, -0.0144, 0.0095, ..., 0.0097, -0.0142, 0.0336],
[ 0.0334, -0.0144, 0.0095, ..., 0.0097, -0.0142, 0.0336],
[ 0.0334, -0.0144, 0.0095, ..., 0.0096, -0.0142, 0.0335]],
grad_fn=<SubBackward0>)
As we can see, the differences between FD and autograd for $u_t$ are quite small, as expected, which also implies that I am using autograd correctly in `train_2d_burger()`. However, the differences for $u_x$ are exceedingly large, which seems to support my doubt that `FDM_Burgers()` is insufficient for $u_x$.
source (1).zip
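To put a single number on this instead of eyeballing the raw tensors, one can also report relative L2 errors using the variables from the loop above (`rel_l2` is my own helper, not part of the repo):

```python
def rel_l2(a, b):
    # relative L2 error between two tensors of the same shape
    return ((a - b).pow(2).sum().sqrt() / b.pow(2).sum().sqrt()).item()

# ut/ux from PINO_loss (FD / FFT-based), ut_AD/ux_AD from autograd, as in the loop above
print('rel. L2, ut (FD  vs autograd):', rel_l2(ut[0], ut_AD[0, 1:-1]))
print('rel. L2, ux (FFT vs autograd):', rel_l2(ux[0], ux_AD[0]))
```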
Further, I compared $u_x$ obtained by three methods: finite difference, autograd, and the FFT-based approach; the last one is what your original code uses. My modified code is attached.
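For reference, the finite-difference $u_x$ in this comparison is just a central difference in $x$ with periodic wrap-around, roughly as follows (a sketch only; `out` is the model output of shape `(batch, nt, nx)` as in the loop above, and the attached code may differ in detail):

```python
# sketch: central difference for u_x with periodic wrap-around in x (dim 2),
# assuming a domain of length 1 sampled at nx points without the right endpoint
nx = out.size(2)
dx = 1.0 / nx
ux_FD = (torch.roll(out, shifts=-1, dims=2) - torch.roll(out, shifts=1, dims=2)) / (2 * dx)
```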
Here is the output from my run:
Difference autograd vs FFT
tensor([[-0.0079, 0.0036, -0.0022, ..., -0.0022, 0.0035, -0.0080],
[-0.0079, 0.0036, -0.0022, ..., -0.0022, 0.0035, -0.0080],
[-0.0079, 0.0036, -0.0022, ..., -0.0022, 0.0035, -0.0080],
...,
[-0.0061, 0.0027, -0.0017, ..., -0.0017, 0.0027, -0.0062],
[-0.0061, 0.0027, -0.0017, ..., -0.0017, 0.0027, -0.0062],
[-0.0061, 0.0027, -0.0017, ..., -0.0017, 0.0027, -0.0062]],
grad_fn=<SubBackward0>)
Difference finite-diff vs FFT
tensor([[ 0.0035, -0.0022, 0.0016, ..., 0.0016, -0.0022, 0.0035],
[ 0.0035, -0.0022, 0.0016, ..., 0.0016, -0.0022, 0.0035],
[ 0.0035, -0.0022, 0.0016, ..., 0.0016, -0.0022, 0.0035],
...,
[ 0.0027, -0.0017, 0.0012, ..., 0.0012, -0.0017, 0.0027],
[ 0.0027, -0.0017, 0.0012, ..., 0.0012, -0.0017, 0.0027],
[ 0.0027, -0.0017, 0.0012, ..., 0.0012, -0.0017, 0.0027]],
grad_fn=<SubBackward0>)
Difference autograd vs finite-diff
tensor([[-4.8905e-05, -5.7416e-05, -6.7184e-05, ..., -1.7096e-05,
-1.1985e-05, -4.2657e-06],
[-4.8192e-05, -5.6226e-05, -6.6709e-05, ..., -1.6668e-05,
-1.1796e-05, -3.8399e-06],
[-4.7259e-05, -5.5532e-05, -6.5778e-05, ..., -1.6510e-05,
-1.1401e-05, -3.2073e-06],
...,
[ 7.5682e-06, 1.5970e-06, -6.3481e-06, ..., 1.7191e-05,
2.5546e-05, 3.8179e-05],
[ 8.3740e-06, 3.1168e-06, -5.0674e-06, ..., 1.7569e-05,
2.5923e-05, 3.7840e-05],
[ 8.7544e-06, 4.2115e-06, -3.4967e-06, ..., 1.7523e-05,
2.5875e-05, 3.8267e-05]], grad_fn=<SubBackward0>)
As you can see, autograd and finite difference agree well, but the FFT-IFFT approach shows a large discrepancy relative to the other two.