
Calibration tests

Open fremk opened this issue 8 months ago • 5 comments

Hello, I hope you're doing well 😄

I was following your paper and experimenting, and I thought of an additional way to evaluate the calibration method; I wanted to see if you had any insight about it.

I take the building parameters found via the calibration step, plug them back into the initial synthetic data generator, and then compare the timeseries generated with these calibrated parameters against the timeseries generated with the ground-truth parameters. Although the calibration step fits the timeseries really well, when I plug the calibrated parameters into the simulator that originally generated the data, the resulting timeseries do not match nearly as well as the fit obtained during calibration.
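Concretely, the evaluation I have in mind looks roughly like this (a minimal Python sketch; `simulate`, `calibrate` and `weather` are placeholders for the synthetic data generator, the calibration step and the exogenous inputs, not actual code from the repo):

```python
import numpy as np

def evaluate_calibration(simulate, calibrate, theta_true, weather):
    """Compare simulator output under calibrated vs. ground-truth parameters."""
    # "Observed" timeseries generated from the ground-truth parameters.
    y_true = simulate(theta_true, weather)

    # Parameters recovered by calibrating against that timeseries.
    theta_cal = calibrate(y_true, weather)

    # Re-simulate with the calibrated parameters and measure the gap.
    y_cal = simulate(theta_cal, weather)
    return np.mean((np.asarray(y_cal) - np.asarray(y_true)) ** 2)
```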

Have you experimented with that methodology to evaluate the calibration? Do you have any insights concerning it? Thank you!

fremk avatar Mar 31 '25 09:03 fremk

Hi,

If I understand correctly, you used a metamodel to run the calibration step which yielded some physical parameters for the building $\theta_{\texttt{calibration}}$. With these parameters, the metamodel fits the observed data $y$ well. However, when you plug $\theta_{\texttt{calibration}}$ into the physical simulator (TRNSYS, EnergyPlus, etc.), you get a different simulation, which does not fit the observed data:

$$\mathcal{L}(f_{\texttt{metamodel}}(\theta_{\texttt{calibration}}), y) \ll \mathcal{L}(f_{\texttt{physical simulator}}(\theta_{\texttt{calibration}}), y)$$

If this is the case, I believe this is not so much a calibration issue as an issue of fitting the numerical simulator. The calibration process should be indifferent to the simulation model used, be it TRNSYS, EnergyPlus, or a metamodel of these physical simulators. However, if at some point the metamodel no longer fits the physical simulator, this means that either:

  • the synthetic dataset generated from the physical simulator does not cover the entire parameter space explored during calibration;
  • the metamodel is inaccurate in the parameter space explored during calibration.

The first option is easily checked (and fixed if necessary); otherwise, it's probably the second option. We did run into the same problem during our research, and simply solved it by improving the performance of the metamodel during the learning phase, either by adding more synthetic data to the training set or more parameters to the metamodel.
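For the first point, a quick sanity check could be as simple as comparing the calibrated parameters against the ranges covered by the synthetic training set. A minimal sketch (the array shapes are assumptions on my side):

```python
import numpy as np

def out_of_coverage(theta_calibrated, theta_train):
    """Return indices of calibrated parameters outside the synthetic training ranges.

    theta_calibrated: (n_params,) calibrated parameter vector.
    theta_train: (n_samples, n_params) parameters used to generate the training set.
    """
    lo, hi = theta_train.min(axis=0), theta_train.max(axis=0)
    outside = (theta_calibrated < lo) | (theta_calibrated > hi)
    return np.flatnonzero(outside)
```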

What do you think? Max

maxjcohen avatar Apr 05 '25 10:04 maxjcohen

Hello Max,
Yes, this is exactly it!
In some cases, the calibrated parameters work decently enough with the physical simulator; in others, not at all. But as you described, the loss of the metamodel is always far less than the loss with the physical simulator.

$L\left(f_{\text{metamodel}}\left(\theta_{\text{calibration}}\right), y\right) \ll L\left(f_{\text{physical simulator}}\left(\theta_{\text{calibration}}\right), y\right)$

I have already thought about both suggestions:

  • I limited the search space of the calibration process to exactly the space of the generated synthetic data.
  • Here's the tricky part. I thought that improving the performance of the metamodel would fix it too. But for some reason, a metamodel that performs better, simply by training for more epochs for example (with training and validation losses still decreasing), appears to perform worse in the calibration process described above! That is, if I take the best-performing metamodel on the validation set, launch a calibration process, and use the found parameters with the physical simulator, it performs worse than the metamodel with weights from early in the training process. I already have a decent amount of data, around 30 000 randomly generated synthetic houses, each with timeseries spanning an entire year. I'm training the LSTM by windowing the timeseries and splitting the train/val sets randomly.

The weights found at epoch 40 outperform those found later on, let's say at epoch 160, when it comes to this calibration process. So I guess it must be some kind of overfitting issue, somehow? [image]

The main and only difference between our two studies is the type of data. I am working with synthetic consumer dwelling data generated using OCHRE, instead of big commercial buildings, with around 17 house parameters, some of which are categorical variables, so with one-hot encoding it becomes around 27 parameters. In addition, I am training the model to predict only the inside temperature given the power, outside data (temperature, humidity, irradiances, etc.), and house parameters.
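Just to illustrate the encoding, it is roughly something like this (the column names are made up, not the actual OCHRE parameters):

```python
import pandas as pd

# Made-up house-parameter table mixing numeric and categorical columns.
houses = pd.DataFrame({
    "floor_area": [120.0, 85.0],
    "infiltration_rate": [0.4, 0.7],
    "hvac_type": ["heat_pump", "furnace"],   # categorical
    "foundation": ["slab", "crawlspace"],    # categorical
})

# One-hot encode the categorical columns; numeric ones pass through.
# This is how ~17 raw parameters end up as ~27 model inputs.
encoded = pd.get_dummies(houses, columns=["hvac_type", "foundation"])
print(list(encoded.columns))
```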

One last resort I was contemplating is a custom train/val split: instead of randomly sampling the windowed timeseries into a 75/25 split, I would choose 75% of the houses, before windowing the timeseries, as training and the remaining 25% as validation (roughly as sketched below). Let me know what you think. Thank you for your time! 😄 Karim
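Roughly what I mean by the house-level split, as a sketch (assuming I keep the house id of each windowed sample):

```python
import numpy as np

def split_by_house(window_house_ids, val_frac=0.25, seed=0):
    """Assign whole houses to train/val so no house contributes windows to both sets.

    window_house_ids: array with the house id of each windowed sample.
    """
    rng = np.random.default_rng(seed)
    houses = np.unique(window_house_ids)
    rng.shuffle(houses)
    n_val = int(len(houses) * val_frac)
    val_houses = houses[:n_val]
    val_mask = np.isin(window_house_ids, val_houses)
    return ~val_mask, val_mask  # boolean masks over the windowed samples
```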

fremk avatar Apr 07 '25 09:04 fremk

One additional, unrelated question while we're at it... How did you solve the initial temperature problem, i.e. predicting the starting temperature for the first couple of timesteps? For my part, I experimented with many approaches, one of which was adding an additional input variable containing the starting temperature value. It does not work perfectly, so I was wondering if you had a specific approach to that problem 😅

fremk avatar Apr 10 '25 14:04 fremk

Hi @fremk,

The metamodel that performs better [...] appears to perform worse when it comes to the calibration process as described.

This makes sense to me: the metamodel which has not converged on the physical simulator's distribution - i.e. which is not well trained yet - is able to generate simulations which may not be physically possible, and the optimizer takes advantage of this flaw. In other words, the optimizer finds a calibration solution that better fits the data but is not coherent with the physical simulator; when you train your metamodel better, this solution disappears as the model no longer generates unrealistic simulations, and the calibration performance (slightly) drops.

How did you solve the initial temperature problem ?

There are only two solutions to this problem that I am aware of:

  • Burn-in, i.e. generate a time series longer than what you need and remove the first few points. This leaves time for the model to forget about the initial conditions (a minimal sketch of this option follows after the list).
  • Directly set the initial state of the model if possible. For instance, if you use an autoregressive model that takes the prediction of the previous timestep to predict the next, simply set the initial temperature at the first timestep.
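A minimal sketch of the burn-in option, assuming the metamodel can be wrapped as a single-step autoregressive function (the `step` signature and input layout are assumptions, not our actual API):

```python
import numpy as np

def rollout_with_burn_in(step, exogenous_inputs, t0_guess, burn_in=48):
    """Roll out an autoregressive model and discard the first `burn_in` steps.

    step(prev_temp, x_t) -> next_temp is a stand-in for one metamodel timestep.
    exogenous_inputs should include the extra burn-in window prepended to the
    period of interest (weather, power, ...).
    """
    temps = []
    temp = t0_guess  # arbitrary guess; its influence fades during burn-in
    for x_t in exogenous_inputs:
        temp = step(temp, x_t)
        temps.append(temp)
    return np.asarray(temps)[burn_in:]  # keep only the post-burn-in trajectory
```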

Hope this helps, Max

maxjcohen avatar May 12 '25 07:05 maxjcohen

Hey @maxjcohen,

I may have explained it badly, but you're right that the actual calibration performance slightly drops, meaning $L\left(f_{\text{metamodel}}\left(\theta_{\text{calibration}}\right), y\right)$ is higher. BUT what I was actually referring to is the calibration loss computed with the physical model, i.e. $L\left(f_{\text{physical simulator}}\left(\theta_{\text{calibration}}\right), y\right)$; surprisingly, this performance also drops...

This is why it doesn't make sense to me. Normally, as you explained previously, the better-fitted metamodel - the one that has converged on the physical simulator's distribution - should yield a better calibration solution when it comes to re-simulating using the physical model. But that is not what I observed; here is the MSE distribution over 1000 validation samples for the calibration process, using $L\left(f_{\text{physical simulator}}\left(\theta_{\text{calibration}}\right), y\right)$, with the metamodel in early training that has NOT converged:

Compared to the distribution using the metamodel that has converged

And of course both of them are still 10x worse than the $L\left(f_{\text{metamodel}}\left(\theta_{\text{calibration}}\right), y\right)$ losses:

with the metamodel in early training that has NOT converged:

with the metamodel that has converged:

Long story short, the problem remains that $L\left(f_{\text{metamodel}}\left(\theta_{\text{calibration}}\right), y\right) \ll L\left(f_{\text{physical simulator}}\left(\theta_{\text{calibration}}\right), y\right)$; and the fact that increasing the performance of the metamodel on the training set gives the opposite of the expected result makes me wonder what the solution could be for my case. Anyway, I am just sharing my findings to keep you in the loop; I will keep you updated if I figure out how to solve the issue 😄

There are only two solutions to this [...] simply set the initial temperature at the first timestep.

Pretty much what I thought of too, sounds good👌

Thank you so much for your time, you have been a tremendous help! Regards, Karim

fremk avatar May 12 '25 09:05 fremk