Solving an inverse problem with non-constant coefficients

Wolpes11 opened this issue 3 years ago · 24 comments

Dear Dr. Lu,

I have a question regarding DeepXDE. I would like to solve an inverse problem for a PDE whose coefficients are functions of space and time, A(x,t) and B(x,t). Is this possible? All the examples I have found use constant (scalar) coefficients.

Do you have an example for this situation, please? Thank you in advance for your time!

Wolpes11 avatar Feb 17 '22 18:02 Wolpes11

Yes, it is possible: you can use PFNN, as shown in the example "elliptic_inverse_field.py".
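For instance, a minimal sketch in the spirit of that example, adapted to a space-time problem (the layer sizes and the PDE below are placeholders, not your actual equation):

import deepxde as dde

# One PFNN subnetwork per output: output 0 = u, output 1 = A(x, t), output 2 = B(x, t).
def pde(x, y):
    u, A, B = y[:, 0:1], y[:, 1:2], y[:, 2:3]
    du_t = dde.grad.jacobian(y, x, i=0, j=1)               # du/dt (input 0 = x, input 1 = t)
    du_xx = dde.grad.hessian(y, x, component=0, i=0, j=0)  # d^2u/dx^2
    return du_t - A * du_xx - B * u                        # generic example PDE

net = dde.maps.PFNN([2, [20, 20, 20], [20, 20, 20], 3], "tanh", "Glorot uniform")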

LanPeng-94 avatar Feb 21 '22 01:02 LanPeng-94

Thank you so much! It should work for my problem.

Instead of starting a new thread, I have just one more quick question: how do I define boundary/initial conditions directly from data? I have a grid of measurements at different positions x and times t; I want to use the values at x = x_start and x = x_end as boundary conditions and the measurements at t = 0 as the initial condition.

Wolpes11 avatar Feb 21 '22 13:02 Wolpes11

Use PointSetBC: https://deepxde.readthedocs.io/en/latest/modules/deepxde.icbc.html#deepxde.icbc.boundary_conditions.PointSetBC
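For example, a minimal sketch, where xy_boundary/u_boundary and xy_initial/u_initial are the subsets of your measurement grid at x = x_start, x = x_end and at t = 0:

import deepxde as dde

# Measured values on the spatial boundary as the "boundary condition"
bc = dde.icbc.PointSetBC(xy_boundary, u_boundary, component=0)
# Measured values at t = 0 as the "initial condition"
ic = dde.icbc.PointSetBC(xy_initial, u_initial, component=0)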

lululxvi avatar Feb 24 '22 02:02 lululxvi

Thank you, Dr. Lu, for your reply, and for your really great work!

Just one last question to bother you with. I have 157,511,520 data points, which in single precision are ~630 MB, and I'm working with a GPU with 80 GB of memory. But the code crashes at epoch 0 with an out-of-memory (OOM) error. What could the problem be?

Wolpes11 avatar Feb 24 '22 16:02 Wolpes11

Is the CPU out of memory, or the GPU?

lululxvi avatar Mar 02 '22 18:03 lululxvi

I think it is the GPU. I'm working with 2 TB of CPU memory, and it loads the data points correctly. The model compiles and the code prints the loss at epoch 0, but then it crashes because it tries to allocate many additional tensors of size [2 x N_points, N_neurons_per_layer], if I'm reading it correctly.

Thanks again for your time.

Wolpes11 avatar Mar 02 '22 18:03 Wolpes11

Then the solutions I can think of are either using a smaller dataset or using mini-batches.

lululxvi avatar Mar 07 '22 17:03 lululxvi

I tried using PDEResidualResampler, but it doesn't work, and I'm not even sure I'm using it correctly. Here is the part of the code where I set up the model.

import deepxde as dde

# pde(x, y) is defined elsewhere in my script.
geom = dde.geometry.Interval(1, 2)
timedomain = dde.geometry.TimeDomain(0, 1)
geomtime = dde.geometry.GeometryXTime(geom, timedomain)

# gen_traindata() is my own function: observe_x holds the (x, t)
# measurement coordinates and y the measured values.
observe_x, y = gen_traindata()

# Boundary conditions from the measurements at x = 1 and x = 2
bc_mask = (observe_x[:, 0] == 1.0) | (observe_x[:, 0] == 2.0)
BC = dde.PointSetBC(observe_x[bc_mask], y[bc_mask], component=0)
# Initial condition from the measurements at t = 0
ic_mask = observe_x[:, 1] == 0.0
IC = dde.PointSetBC(observe_x[ic_mask], y[ic_mask], component=0)
# Interior training points: everything not on the spatial boundary
TP = observe_x[~bc_mask]

data = dde.data.TimePDE(
    geomtime,
    pde,
    [BC, IC],
    num_domain=0,
    num_boundary=0,
    num_initial=0,
    anchors=TP,
)
net = dde.maps.PFNN(
    [2, [30, 30, 30], [20, 20, 20], [20, 10, 10], [20, 1, 1], [20, 1, 1], 3],
    "tanh",
    "Glorot uniform",
)
model = dde.Model(data, net)
model.compile("adam", lr=0.002)
resampler = dde.callbacks.PDEResidualResampler(period=100)
losshistory1, train_state1 = model.train(epochs=10000, callbacks=[resampler])

Thank you!

Wolpes11 avatar Mar 08 '22 17:03 Wolpes11

PDEResidualResampler only works for the points sampled by DeepXDE, not for points provided by users via anchors. Your domain seems small, so maybe a small dataset is enough.

lululxvi avatar Mar 11 '22 22:03 lululxvi

Thank you so much for your help. I also tried batch_size instead of PDEResidualResampler, but the problem remains the same. The domain is so small because I normalized both space and time; otherwise my domain spans (500 m, 550 m) in space and more than 20 days in time with a resolution of 1 s.

Wolpes11 avatar Mar 14 '22 13:03 Wolpes11

Do you mean that you used a smaller training dataset and still get the OOM error? That seems strange.

lululxvi avatar Mar 18 '22 01:03 lululxvi

I tried setting a batch size when training the model: losshistory1, train_state1 = model.train(epochs=10000, batch_size=1). But it gives me the same OOM error.

The problem is that the largest training set I can fit in memory does not allow me to capture all the features of the data (such as long-period oscillations). Thank you!

Wolpes11 avatar Mar 18 '22 08:03 Wolpes11

@Wolpes11 I also ran into a similar problem. I have a lot of experimental data and I cannot fit it all into memory. For a normal ANN in tensorflow.keras, we can easily pass the batch_size argument to get mini-batching and deal with OOM. However, for PINNs in DeepXDE, I think mini-batching for PointSetBC has not been implemented yet; if you dig into the source code of train(), nothing is implemented for the batch_size argument for PINNs.

haison19952013 avatar Mar 23 '22 09:03 haison19952013

Thank you @haison19952013! Yes, I had a look at the source code. What puzzles me, however, is that the entire dataset actually fits in memory (it is roughly 600 MB), and the OOM error only arises at the beginning of the training phase. The architecture of the neural net should be fixed, independent of the number of samples: 2 inputs (space and time) and 1 output, right?!

Wolpes11 avatar Mar 23 '22 09:03 Wolpes11

Yes, the architecture of the neural net is fixed, and I think there is no problem with it. In my case the OOM also occurs after epoch 0. Your data may fit into your system, but after epoch 0 the model has to compute and store a lot of additional information, especially the gradient information for the PDE over all the data at once. I think this is the main reason for the OOM. What do you think?

P.S.: If you really want to use mini-batching right now, you could look at the paper "Hidden Fluid Mechanics", which comes with source code. The authors also use a lot of data and apply mini-batching during training.

haison19952013 avatar Mar 23 '22 15:03 haison19952013

batch_size in model.train doesn't work for PINNs, because in a PINN there are different types of training points, such as PDE points, BC points, and IC points. So it is not clear what batch_size really means here.

The current DeepXDE version supports mini-batching of the PDE residual points via PDEResidualResampler, but the training points provided by anchors will not be mini-batched. In order to also mini-batch the anchors, you need to modify the source code at the following line: https://github.com/lululxvi/deepxde/blob/4714a1f4268489c7d2e50302ddefd54a8aa5defb/deepxde/data/pde.py#L237 Instead of using all the points in self.anchors, you can simply pick a random subset of self.anchors. Then PDEResidualResampler will also work for the anchor points.
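Schematically, the change would look like this (a sketch; the subset size 1000 is just an example, and np is already imported in pde.py):

# Original behavior at that line: stack all user anchors onto the sampled points
#     X = np.vstack((self.anchors, X))
# Modified: draw a random subset of the anchors instead
idx = np.random.choice(len(self.anchors), size=1000, replace=False)
X = np.vstack((self.anchors[idx], X))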

lululxvi avatar Mar 23 '22 20:03 lululxvi

Yes, I agree. I think the main problem is the temporal gradient, which involves the entire dataset. Thank you for your suggestion; I will have a look at the "Hidden Fluid Mechanics" paper, even though I find the DeepXDE package easier to adapt to different PDEs and applications.

Wolpes11 avatar Mar 25 '22 13:03 Wolpes11

Dear @lululxvi, I modified the source code as you suggested.

# Random mini-batch of 1000 anchor points stacked onto the DeepXDE-sampled
# points (np.random.randint's upper bound is exclusive, so len(self.anchors)
# keeps the last index reachable).
idx = np.random.randint(0, len(self.anchors), 1000)
X = np.vstack((self.anchors[idx, :], X))

But I get the same OOM error. Am I doing something wrong? Thank you!

Wolpes11 avatar Mar 30 '22 13:03 Wolpes11

It looks OK. You may check the size of X.

lululxvi avatar Apr 04 '22 23:04 lululxvi

Yes, I have already checked it; it is correctly (1000, 2). It seems that the point where the code crashes due to the OOM error is not this one.

Thank you!

Wolpes11 avatar Apr 05 '22 12:04 Wolpes11

Hi Dr. @lululxvi, I tried different codes with the dataset I'm using, and the batch "trick" you suggested works fine there. Do you have any clue why the randomization of the training points does not work in this case?

Thanks in advance!

Wolpes11 avatar Jun 04 '22 15:06 Wolpes11

What do you mean by "the batch trick" and "randomization of the training points"?

lululxvi avatar Jun 06 '22 14:06 lululxvi

The following modification of your source code, as you previously suggested:

idx = np.random.randint(0, len(self.anchors), 1000)
X = np.vstack((self.anchors[idx, :], X))

Thank you!

Wolpes11 avatar Jun 06 '22 15:06 Wolpes11

You may directly check what is passed as the network input during training, and then figure out step by step what goes wrong in your code.

https://github.com/lululxvi/deepxde/blob/26e5e49987331420879e3cf7e70e3eb379593704/deepxde/model.py#L553
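For example, a quick debugging sketch, assuming you kept the anchor-subsampling change inside train_points:

idx = np.random.randint(0, len(self.anchors), 1000)
X = np.vstack((self.anchors[idx, :], X))
# Temporary debug print: with num_domain = num_boundary = num_initial = 0,
# this should be (1000, 2) right before it is fed to the network.
print("train points X:", X.shape)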

lululxvi avatar Jun 09 '22 23:06 lululxvi