Elynn Wu
> Why would we want to source `/global/cfs/cdirs/e3sm/eamxx-ml/python_venv/3.9.13/screamML/bin/activate`? Can this only be done for certain experiments? It makes me uneasy to change the environment

Great idea. We should only do...
But is there a reason there are multiple unique large values? Here are the values from a baseline ne30 run, no nudging or mlcorrection:
```
import numpy as np
import xarray as xr

f = xr.open_mfdataset("output.scream.AVERAGE.nhours_x3.2011-*.nc")
np.unique(np.sort(f.T_mid_at_850hPa.isel(time=10)))[-10:]
array([3.07124725e+02, 3.07140991e+02,...
```
Oh good point, this is time averaged.
Yes, the non-unique large values are due to time averaging.
Unclear if this is related, but there has been a cold temperature [issue](https://github.com/E3SM-Project/scream/issues/2061) with ne120 where we need to limit the temperature (`./atmchange vtheta_thresh=200`).
> I just tried ne120 with scream on pm-gpu and was able to run 5 days. I see on slack, Elynn noted that she sees the error after around 1...
Here's a plot of the difference between the current run that crashed and the previous run. Large differences are observed from the first restart files, which is one month into the simulation.
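For anyone who wants to repeat this kind of comparison without the plot, here is a minimal sketch of a per-variable difference check between two runs. The function name `max_abs_diff`, the variable names, and the small arrays are all hypothetical stand-ins for fields read from the actual restart files; this is not the script used to make the plot above.

```python
import numpy as np


def max_abs_diff(run_a, run_b):
    """Return {variable: max |a - b|} for variables present in both runs."""
    diffs = {}
    for name in sorted(set(run_a) & set(run_b)):
        a = np.asarray(run_a[name], dtype=float)
        b = np.asarray(run_b[name], dtype=float)
        # nanmax ignores NaN entries so masked points do not hide real differences
        diffs[name] = float(np.nanmax(np.abs(a - b)))
    return diffs


# Hypothetical stand-ins for fields loaded from two restart files.
previous = {
    "T_mid": np.array([280.0, 290.0, 300.0]),
    "qv": np.array([0.010, 0.012, 0.008]),
}
crashed = {
    "T_mid": np.array([280.0, 291.5, 300.0]),
    "qv": np.array([0.010, 0.012, 0.008]),
}

report = max_abs_diff(crashed, previous)
for name, value in report.items():
    print(f"{name}: max abs diff = {value:.3g}")
```

Variables with a nonzero entry in `report` are the ones worth plotting first.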
After reverting PR#2193 from E3SM-Project/oksanaguba/eamxx/wetdry, I was able to re-use the restart file on 06-01 to run one month without a crash; previously the crash happened on 06-02 19:45.
I think we still have some permission issues that prevent external users from downloading from the Google bucket. We will try to get that sorted out. As a workaround, we...
I opted to remove ML-specific settings from `config_machines`; we will include them in our launch script, as Noel suggested. A short doc has also been added.