graphcast
How to use this code?
I don't know how to run this code. The README says that graphcast.py can be used to get the results in one step. Is the result a prediction? What kind of data does it take as input? And if you want to predict in real time, what do you have to provide, and which function returns the output data?
After you run all the cells in graphcast_demo.ipynb, the predictions can be seen under "Run the model > Autoregressive rollout (loop in Python)".
The output is in the form of an xarray Dataset.
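For instance, a minimal way to inspect it (a hedged sketch: `predictions` is assumed to be the Dataset produced by the rollout cell, with variable names as in the demo):

```python
# Inspect the rollout output; dims are roughly (batch, time, lat, lon[, level]).
print(predictions)

# Pull one field at one lead time, e.g. 2m temperature at the first step:
t2m = predictions["2m_temperature"].isel(batch=0, time=0)
print(float(t2m.sel(lat=0.0, lon=0.0)))  # value at a single grid point
```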
You can get custom input data using the CDS API. Just make sure the data ends up in the same format as the one in the demo.
```python
import cdsapi

c = cdsapi.Client()

# Pressure-level (atmospheric) variables.
c.retrieve(
    'reanalysis-era5-pressure-levels',
    {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'variable': [
            'geopotential', 'specific_humidity', 'temperature',
            'u_component_of_wind', 'v_component_of_wind', 'vertical_velocity',
        ],
        'year': '2023',
        'month': '11',
        'day': '01',
        'time': ['00:00', '06:00'],
        'pressure_level': [
            '1', '2', '3', '5', '7', '10', '20', '30', '50', '70',
            '100', '125', '150', '175', '200', '225', '250', '300',
            '350', '400', '450', '500', '550', '600', '650', '700',
            '750', '775', '800', '825', '850', '875', '900', '925',
            '950', '975', '1000',
        ],
    },
    'atmospheric.nc')

# Single-level (surface) variables.
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'variable': [
            '10m_u_component_of_wind', '10m_v_component_of_wind', '2m_temperature',
            'geopotential', 'land_sea_mask', 'mean_sea_level_pressure',
            'toa_incident_solar_radiation', 'total_precipitation',
        ],
        'year': '2023',
        'month': '11',
        'day': '01',
        'time': ['00:00', '06:00'],
    },
    'single-level.nc')
```
The code above fetches custom data for graphcast.
```python
import datetime
import math

import pandas as pd
import xarray


def getTrigVals(dt: datetime.datetime):
    """Compute sin/cos of year progress and day progress for a datetime."""
    DAY_IN_SECONDS = 24 * 60 * 60
    YEAR_IN_SECONDS = 365.25 * DAY_IN_SECONDS
    year_start = datetime.datetime(year=dt.year, month=1, day=1)
    seconds_since_year_start = (dt - year_start).total_seconds()
    seconds_since_midnight = (dt - dt.replace(hour=0, minute=0, second=0, microsecond=0)).total_seconds()
    year_progress = 2 * math.pi * seconds_since_year_start / YEAR_IN_SECONDS
    day_progress = 2 * math.pi * seconds_since_midnight / DAY_IN_SECONDS
    return (math.sin(year_progress), math.cos(year_progress),
            math.sin(day_progress), math.cos(day_progress))


atmos = xarray.open_dataset('atmospheric.nc', engine='scipy').to_dataframe()
single = xarray.open_dataset('single-level.nc', engine='scipy').to_dataframe()

# Rename the short netCDF variable names to the names graphcast expects
# (this relies on the column order matching the order requested from CDS).
atmoscols = ['geopotential', 'specific_humidity', 'temperature',
             'u_component_of_wind', 'v_component_of_wind', 'vertical_velocity']
atmos = atmos.rename(columns={col: atmoscols[ind] for ind, col in enumerate(atmos.columns.values.tolist())})
singlecols = ['10m_u_component_of_wind', '10m_v_component_of_wind', '2m_temperature',
              'geopotential_at_surface', 'land_sea_mask', 'mean_sea_level_pressure',
              'toa_incident_solar_radiation', 'total_precipitation']
single = single.rename(columns={col: singlecols[ind] for ind, col in enumerate(single.columns.values.tolist())})

atmos = atmos.reset_index()
single = single.reset_index()

# Keep only the 13 pressure levels and the 1-degree grid used here.
atmos = atmos[atmos['level'].isin([50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000])]
atmos = atmos[atmos['latitude'].isin(range(-90, 91))]
atmos = atmos[atmos['longitude'].isin(range(360))]
single = single[single['latitude'].isin(range(-90, 91))]
single = single[single['longitude'].isin(range(360))]

# Add the day/year progress features for each timestamp.
single['year_progress_sin'], single['year_progress_cos'], single['day_progress_sin'], single['day_progress_cos'] = zip(*single['time'].map(getTrigVals))

df = pd.merge(atmos, single, how='left', on=['latitude', 'longitude', 'time'])
df = df.rename(columns={'latitude': 'lat', 'longitude': 'lon'})

# Express time as offsets relative to the later of the two input timestamps.
df['time'] = df['time'].replace({
    pd.Timestamp('2023-11-01 00:00:00'): pd.Timedelta('-1 days +18:00:00'),
    pd.Timestamp('2023-11-01 06:00:00'): pd.Timedelta('00:00:00')})
df['batch'] = 1
df = df.set_index(['lat', 'lon', 'time', 'level', 'batch'])
df['total_precipitation_6hr'] = df.pop('total_precipitation')

df = df.to_xarray()

# Drop the dimensions each variable should not depend on.
df['geopotential_at_surface'] = df['geopotential_at_surface'].isel(batch=0, time=0, level=0)
df['land_sea_mask'] = df['land_sea_mask'].isel(batch=0, time=0, level=0)
for var in ['2m_temperature', 'mean_sea_level_pressure', '10m_v_component_of_wind',
            '10m_u_component_of_wind', 'total_precipitation_6hr',
            'toa_incident_solar_radiation']:
    df[var] = df[var].isel(level=0)
df['year_progress_sin'] = df['year_progress_sin'].isel(level=0, lat=0, lon=0)
df['year_progress_cos'] = df['year_progress_cos'].isel(level=0, lat=0, lon=0)
df['day_progress_sin'] = df['day_progress_sin'].isel(level=0, lat=0)
df['day_progress_cos'] = df['day_progress_cos'].isel(level=0, lat=0)
df = df.drop_vars('batch')
```
Run this code after fetching the data from CDS; the result will act as the input values for prediction.
```python
predictions = run_forward_jitted(
    rng=jax.random.PRNGKey(0),
    inputs=df,
    targets_template=train_targets * np.nan,
    forcings=train_forcings)
```
> Run this code after fetching the data from CDS; the result will act as the input values for prediction.
> predictions = run_forward_jitted(rng=jax.random.PRNGKey(0), inputs=df, targets_template=train_targets * np.nan, forcings=train_forcings)
How could I get the "train_targets" and "train_forcings"?
Thanks
> How could I get the "train_targets" and "train_forcings"?
You can get both values from the ERA5 dataset. For targets you need to create a dataset of what the output should look like; it is just an empty dataset, which is why the colab demo code has "train_targets * np.nan".
As for forcings, you need the day progress and year progress for all the coordinates in your input. These can be calculated using data_utils from the graphcast library. For the TOA incident solar radiation, you can either use the pysolar library or the ERA5 data, if it is available for the dates you want.
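As a rough illustration of the targets part (a hedged sketch, not the demo's exact code: `inputs_ds` and the variable list here are assumptions):

```python
import numpy as np
import pandas as pd
import xarray

# Build an all-NaN targets template: copy the shape of one input step,
# blank the values, and repeat it for each 6-hour lead time.
target_vars = ['2m_temperature', 'mean_sea_level_pressure']  # hypothetical subset
template = inputs_ds[target_vars].isel(time=[-1]) * np.nan
steps = [template.assign_coords(time=[pd.Timedelta(hours=6 * (i + 1))])
         for i in range(4)]
targets_template = xarray.concat(steps, dim='time')
```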
As someone who is relatively new to xarray, could you ELI5 how to create the datetime coordinate and add it to the dataset? I seem to be struggling unnecessarily with it.
Have you solved the problem of running the source code?
@Shadowwomen
> As someone who is relatively new to xarray, could you ELI5 how to create the datetime coordinate and add it to the dataset? I seem to be struggling unnecessarily with it.
Basically, when you want to predict values using graphcast, you need three kinds of data: input, target and forcings. If you are calculating predictions for hour x, then the inputs need to be for hours x - 6 and x - 12.
Data for hours x - 6 and x - 12 can be fetched from CDS; check out their documentation for fetching data via their APIs, it's pretty simple. The values for these two hours will be under the datetime field, something like datetime.datetime(2023, 1, 1, 0, 0, 0) and datetime.datetime(2023, 1, 1, 6, 0, 0), for example. Similarly, you will have another datetime value for the target(s).
So after all the data is assembled (input + target): if you are using a pandas DataFrame, you will have to set the index to lat, lon, time, level and batch (you will have to add level manually). However, if you are using an xarray Dataset, you will have to rename the 'time' field to 'datetime'.
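For the datetime coordinate specifically, a minimal sketch (assuming `ds` is your Dataset, its 'time' values are timedelta offsets as in the demo's example_batch, and the origin timestamp here is hypothetical):

```python
import pandas as pd

# Attach a `datetime` coordinate alongside a timedelta-valued `time` coordinate.
origin = pd.Timestamp('2023-11-01 06:00')  # the later of the two input hours
ds = ds.assign_coords(
    datetime=('time', origin + pd.to_timedelta(ds['time'].values)))
```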
I have the CDS data, and running the code that @abhinavyesss kindly provided as-is gives me an xarray dataset that looks like this: [screenshot]
The dataset produced from the example data, however, looks like this, with the extra time dimension, the datetime coordinate, and without the four time-progress variables: [screenshot]
In my mind, if I can get the CDS data into the same format as the example data then I should be able to use that as example_batch, but the fact that @abhinavyesss also rewrote the predictions section suggests I am misunderstanding something. When I use the altered CDS data, it tells me it can't find the datetime variable when it attempts to remove it (because it breaks the autoregressive rollouts, apparently), which makes sense as there isn't one, so I'm trying to figure out how to add it. I'm guessing the forcing data needs datetime but the input doesn't, so maybe I need two datasets, one with datetime and one without? Or maybe if I can add datetime I can use that as example_batch, and the source code deals with dropping it for the input data? Many thanks for your help.
I should also note that once I get the CDS data working I intend to use GFS operational data to get it to produce actual forecasts; I'm just using the CDS data to try to understand how everything works.
Hello, I would like to know how to output the prediction results after inputting the data using graphcast's source code
> You can get both values from the ERA5 dataset... As for forcings, you need the day progress and year progress for all the coordinates in your input.
After inputting the data using the above method, how do I output the prediction results using graphcast's source code?
> Run this code after fetching the data from CDS; the result will act as the input values for prediction.
Hi, thanks for sharing the code. I am trying it, but I am getting the error below and am unable to resolve it:

```
/usr/local/lib/python3.10/dist-packages/jax/_src/lax/lax.py in _concatenate_shape_rule(*operands, **kwargs)
   3166               "dimension {} for shapes {}.")
   3167   shapes = [operand.shape for operand in operands]
-> 3168   raise TypeError(msg.format(dimension, ", ".join(map(str, shapes))))
   3169
   3170   concat_size = sum(o.shape[dimension] for o in operands)

TypeError: Cannot concatenate arrays with shapes that differ in dimensions other than the one being concatenated: concatenating along dimension 2 for shapes (1038240, 1, 183), (65160, 1, 3).
```

What should I do?
> After inputting the data using the above method, how do I output the prediction results using graphcast's source code?
If your input is correctly formatted, then the predictions will be stored in the 'predictions' variable and can be output by accessing it.
```python
predictions = run_forward_jitted(
    rng=jax.random.PRNGKey(0),
    inputs=df,
    targets_template=train_targets * np.nan,
    forcings=train_forcings)
```
> In my mind, if I can get the CDS data into the same format as the example data then I should be able to use that as example_batch... I'm guessing the forcing data needs datetime but the input doesn't, so maybe I need two datasets, one with datetime and one without?
The example_batch in the graphcast demo contains 3 datetime values: two for the inputs and one for the prediction. This example batch is then passed to 'data_utils.extract_inputs_targets_forcings', a graphcast function, in order to get the three sets of data. In my example I have skipped that function, because in nearly every real-time use case you won't have the prediction data, since it has not technically occurred yet.
I fetch the data from CDS using its APIs, create the target data manually, and get the forcing data from the pysolar (total solar radiation) and graphcast (day and year progress) libraries.
The data I created can then be used directly when fetching predictions, hence the datetime variable is not required. Also, from my implementation I have noticed the 'time' coordinate can be a datetime.datetime object too; a timedelta object is not necessary.
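For reference, the split in the demo notebook looks roughly like this (names come from graphcast_demo.ipynb; treat the exact lead-time slice as illustrative):

```python
import dataclasses
from graphcast import data_utils

# Split example_batch into inputs, targets and forcings, as the demo does.
train_inputs, train_targets, train_forcings = (
    data_utils.extract_inputs_targets_forcings(
        example_batch,
        target_lead_times=slice("6h", "12h"),  # illustrative lead times
        **dataclasses.asdict(task_config)))
```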
> If your input is correctly formatted, then the predictions will be stored in the 'predictions' variable and can be output by accessing it.
Thanks for your reply! I wonder where I should put the following code to get the result.

> ```
> predictions = run_forward_jitted(
>     rng=jax.random.PRNGKey(0),
>     inputs=df,
>     targets_template=train_targets * np.nan,
>     forcings=train_forcings)
> ```
Hello, I wonder how to create the target data manually and get the forcing data. Could you please show me your demo code? That would be a great help to me!!
> Thanks for your reply! I wonder where I should put the following code to get the result.
This code is placed after the input, target and forcing data are fetched.
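In other words, the overall order is roughly this (a hedged sketch; the helper names are placeholders standing in for the snippets elsewhere in this thread, not a real API):

```python
import datetime
import jax
import numpy as np

dt = datetime.datetime(2023, 11, 1, 6)  # hypothetical forecast origin
df = build_inputs_from_cds()            # placeholder: the CDS preprocessing above
train_targets = getTargets(dt, df)      # NaN targets template (snippet below)
train_forcings = getForcings(df)        # day/year progress + solar radiation

predictions = run_forward_jitted(
    rng=jax.random.PRNGKey(0),
    inputs=df,
    targets_template=train_targets * np.nan,
    forcings=train_forcings)
```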
> Hello, I wonder how to create the target data manually and get the forcing data. Could you please show me your demo code?
```python
@classmethod
def formatData(cls, data: uti.dataframe, includeLevel=False) -> uti.dataframe:
    data['batch'] = 0
    data = data.rename(columns={'latitude': 'lat', 'longitude': 'lon'})
    # Only the targets carry a pressure level; forcings are level-independent.
    index = ['lat', 'lon', 'time', 'batch'] + (['level'] if includeLevel else [])
    data = data.set_index(index)
    return data

@classmethod
def getTargets(cls, dt, data: uti.dataframe):
    dt, targetdf = uti.toDatetime(dt), uti.dataframe()
    data = data.reset_index()
    for days in range(6):  # Predicting only for 6 dates.
        # Copy the rows of one input timestamp as a template for each lead time.
        templatedf: uti.dataframe = data[data['time'] == data['time'].tolist()[0]]
        templatedf.loc[:, 'time'] = uti.deltaTime(dt, hours=days * 6)
        targetdf = pd.concat([targetdf, templatedf], ignore_index=True)
    targetdf = cls.formatData(targetdf, True)[Fields().getPredictionFields()]
    targetdf = targetdf * uti.nan()  # Blank the values; only the shape matters.
    return targetdf
```
What I do is take a subset of the input data, i.e. the rows for one of the two unique 'time' values. This contains all the fields that will be required for prediction; I then remove the fields that will not be predicted.
For forcing data, you have to calculate the solar radiation (use the pysolar package) and the day and year progress values for every coordinate and every datetime you want to predict for. Forcing data is not dependent on pressure level, so you will have to drop that variable.
```python
from graphcast import data_utils as du

@classmethod
def addYearProgress(cls, secs, data):
    # Year progress is the same for every coordinate at a given time.
    progress = du.get_year_progress(secs)
    data[Fields.EngineeredField.YEAR_PROGRESS_SIN.value] = muti.sin(progress)
    data[Fields.EngineeredField.YEAR_PROGRESS_COS.value] = muti.cos(progress)
    return data

@classmethod
def addDayProgress(cls, secs, lon: str, data: uti.dataframe):
    # Day progress varies with longitude, so compute it per unique longitude.
    progress: np.ndarray = du.get_day_progress(secs, uti.getNumpyArray(data[lon].unique()))
    prxlon = {l: prog for l, prog in zip(data[lon].unique().tolist(), progress.tolist())}
    data[Fields.EngineeredField.DAY_PROGRESS_SIN.value] = data[lon].map(lambda x: muti.sin(prxlon[x]))
    data[Fields.EngineeredField.DAY_PROGRESS_COS.value] = data[lon].map(lambda x: muti.cos(prxlon[x]))
    return data

@classmethod
def getForcings(cls, data: uti.dataframe):
    # Keep only the coordinate columns: drop the predicted fields and the
    # pressure level, then deduplicate to one row per (lon, lat, time).
    forcingdf = data.reset_index().drop(labels=Fields().getPredictionFields(), axis=1)
    forcingdf = forcingdf.drop(labels=['level'], axis=1).drop_duplicates(subset=['lon', 'lat', 'time'], keep='first')
    forcingdf = cls.integrateProgress(forcingdf)
    forcingdf = cls.integrateSolarRadiation(forcingdf)  # Uses pysolar.
    forcingdf = cls.formatData(forcingdf)
    return forcingdf
```
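The integrateSolarRadiation helper is not shown above; as a rough illustration of the pysolar part (a hypothetical grid point, and a crude one-hour accumulation, since ERA5's toa_incident_solar_radiation is an accumulated quantity in J/m²):

```python
from datetime import datetime, timezone
from pysolar.solar import get_altitude
from pysolar.radiation import get_radiation_direct

when = datetime(2023, 11, 1, 6, tzinfo=timezone.utc)
lat, lon = 40.0, 116.0  # hypothetical grid point

# pysolar gives clear-sky direct radiation in W/m^2 for a solar altitude.
altitude = get_altitude(lat, lon, when)
w_per_m2 = get_radiation_direct(when, altitude) if altitude > 0 else 0.0
j_per_m2 = w_per_m2 * 3600  # crude 1-hour accumulation, for illustration only
```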
Thank you very much for your help, I will try the code you provided
Hello, so the process of running graphcast locally is: get the data for t - 6 and t - 12, then create the target and forcing data, and then call graphcast for the output? However, there are still many details to deal with in this process, and as a beginner it is difficult. I know that you have a good understanding of graphcast, and I hope you can provide code that produces the prediction results in one complete run, or record a video for YouTube walking through the graphcast source code. Thank you again for your work; it is of great significance to many beginners.
Hi, I am sorry about not being able to provide the clearest answers to your queries. I do plan to write an article about my work on GraphCast, however that is still some time away. I am close to completing my experimentation with graphcast in a few days, after which I will upload the code; hopefully that will provide the answers to everything from data collection for inputs to getting predictions.
I am looking forward to your future work. It will be a great help to me, thanks a lot!!
> The code above fetches custom data for graphcast.
Dear teacher, I downloaded the nc data I needed according to your code. The data I downloaded is the temperature data for March 5, 2020, with pressure levels ranging from 10 to 1000, and I then uploaded the data to Google Drive. May I ask how I can use my data as the input data in the demo code on Google Colab to get the prediction result? Thank you very much. @abhinavyesss
Dear teacher, hello. I noticed that there are two nc files in your code. single-level.nc is the data obtained through CDS. May I ask what atmospheric.nc is and how to get it? @abhinavyesss
> How could I get the "train_targets" and "train_forcings"? Thanks
Dear teacher, hello. Did you fetch the data you were interested in through CDS and then complete the forecast? How do I input the data after I download it? Thanks @oubahe
> Dear teacher, hello. I noticed that there are two nc files in your code. single-level.nc is the data obtained through CDS. May I ask what atmospheric.nc is and how to get it?
The input data for graphcast requires a lot of fields. Some of those fields are independent of the pressure level; this data is obtained from the single-level collection on CDS and saved as single-level.nc.
Other fields are dependent on the pressure level; this data is fetched from a different CDS collection, 'ERA5 hourly data on pressure levels from 1940 to present'. I stored this data in atmospheric.nc.
The code for obtaining data from both collections is the one you quoted.
> Dear teacher, I downloaded the nc data I needed according to your code... May I ask how I can use my data as the input data in the demo code on Google Colab to get the prediction result?
You have quoted the solution to this in your most recent comment.
However, this is just the input data. In order to make predictions you will also have to get the forcings and target data: for every hour you will be predicting, you need the forcings and target values.
> In order to make predictions you will also have to get the forcings and target data.
Hi, I can understand that forcing values are needed for making predictions, but what is the purpose of the target values? Are they needed only for accuracy evaluation, or for making the predictions themselves? Thanks in advance. @abhinavyesss