graphcast
How to use this code?
I don't know how to run this code. The README says that graphcast.py can be used to get the results in one step. Is the result a prediction? What kind of data does it take as input? And if you want to predict in real time, what do you have to provide, and which function returns the output data?
After you run all the cells in graphcast_demo.ipynb, the predictions can be seen under "Run the model > Autoregressive rollout (loop in Python)".
The output is in the form of an xarray Dataset.
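For instance, a minimal way to inspect it (a hedged sketch: `predictions` is assumed to be the Dataset produced by the rollout cell, with variable names as in the demo):

```python
# Inspect the rollout output; dims are roughly (batch, time, lat, lon[, level]).
print(predictions)

# Pull one field at one lead time, e.g. 2m temperature at the first step:
t2m = predictions["2m_temperature"].isel(batch=0, time=0)
print(float(t2m.sel(lat=0.0, lon=0.0)))  # value at a single grid point
```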
You can get custom input data using the CDS API. Just make sure the data ends up in the same format as the one in the demo.
```python
import cdsapi

c = cdsapi.Client()

# Pressure-level (atmospheric) variables.
c.retrieve(
    'reanalysis-era5-pressure-levels',
    {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'variable': [
            'geopotential', 'specific_humidity', 'temperature',
            'u_component_of_wind', 'v_component_of_wind', 'vertical_velocity',
        ],
        'year': '2023',
        'month': '11',
        'day': '01',
        'time': ['00:00', '06:00'],
        'pressure_level': [
            '1', '2', '3', '5', '7', '10', '20', '30', '50', '70',
            '100', '125', '150', '175', '200', '225', '250', '300',
            '350', '400', '450', '500', '550', '600', '650', '700',
            '750', '775', '800', '825', '850', '875', '900', '925',
            '950', '975', '1000',
        ],
    },
    'atmospheric.nc')

# Single-level (surface) variables.
c.retrieve(
    'reanalysis-era5-single-levels',
    {
        'product_type': 'reanalysis',
        'format': 'netcdf',
        'variable': [
            '10m_u_component_of_wind', '10m_v_component_of_wind', '2m_temperature',
            'geopotential', 'land_sea_mask', 'mean_sea_level_pressure',
            'toa_incident_solar_radiation', 'total_precipitation',
        ],
        'year': '2023',
        'month': '11',
        'day': '01',
        'time': ['00:00', '06:00'],
    },
    'single-level.nc')
```
The code above fetches custom data for graphcast.
```python
import datetime
import math

import pandas as pd
import xarray


def getTrigVals(dt: datetime.datetime):
    """Compute sin/cos of year progress and day progress for a datetime."""
    DAY_IN_SECONDS = 24 * 60 * 60
    YEAR_IN_SECONDS = 365.25 * DAY_IN_SECONDS
    year_start = datetime.datetime(year=dt.year, month=1, day=1)
    seconds_since_year_start = (dt - year_start).total_seconds()
    seconds_since_midnight = (dt - dt.replace(hour=0, minute=0, second=0, microsecond=0)).total_seconds()
    year_progress = 2 * math.pi * seconds_since_year_start / YEAR_IN_SECONDS
    day_progress = 2 * math.pi * seconds_since_midnight / DAY_IN_SECONDS
    return (math.sin(year_progress), math.cos(year_progress),
            math.sin(day_progress), math.cos(day_progress))


atmos = xarray.open_dataset('atmospheric.nc', engine='scipy').to_dataframe()
single = xarray.open_dataset('single-level.nc', engine='scipy').to_dataframe()

# Rename the short netCDF variable names to the names graphcast expects
# (this relies on the column order matching the order requested from CDS).
atmoscols = ['geopotential', 'specific_humidity', 'temperature',
             'u_component_of_wind', 'v_component_of_wind', 'vertical_velocity']
atmos = atmos.rename(columns={col: atmoscols[ind] for ind, col in enumerate(atmos.columns.values.tolist())})
singlecols = ['10m_u_component_of_wind', '10m_v_component_of_wind', '2m_temperature',
              'geopotential_at_surface', 'land_sea_mask', 'mean_sea_level_pressure',
              'toa_incident_solar_radiation', 'total_precipitation']
single = single.rename(columns={col: singlecols[ind] for ind, col in enumerate(single.columns.values.tolist())})

atmos = atmos.reset_index()
single = single.reset_index()

# Keep only the 13 pressure levels and the 1-degree grid used here.
atmos = atmos[atmos['level'].isin([50, 100, 150, 200, 250, 300, 400, 500, 600, 700, 850, 925, 1000])]
atmos = atmos[atmos['latitude'].isin(range(-90, 91))]
atmos = atmos[atmos['longitude'].isin(range(360))]
single = single[single['latitude'].isin(range(-90, 91))]
single = single[single['longitude'].isin(range(360))]

# Add the day/year progress features for each timestamp.
single['year_progress_sin'], single['year_progress_cos'], single['day_progress_sin'], single['day_progress_cos'] = zip(*single['time'].map(getTrigVals))

df = pd.merge(atmos, single, how='left', on=['latitude', 'longitude', 'time'])
df = df.rename(columns={'latitude': 'lat', 'longitude': 'lon'})

# Express time as offsets relative to the later of the two input timestamps.
df['time'] = df['time'].replace({
    pd.Timestamp('2023-11-01 00:00:00'): pd.Timedelta('-1 days +18:00:00'),
    pd.Timestamp('2023-11-01 06:00:00'): pd.Timedelta('00:00:00')})
df['batch'] = 1
df = df.set_index(['lat', 'lon', 'time', 'level', 'batch'])
df['total_precipitation_6hr'] = df.pop('total_precipitation')

df = df.to_xarray()

# Drop the dimensions each variable should not depend on.
df['geopotential_at_surface'] = df['geopotential_at_surface'].isel(batch=0, time=0, level=0)
df['land_sea_mask'] = df['land_sea_mask'].isel(batch=0, time=0, level=0)
for var in ['2m_temperature', 'mean_sea_level_pressure', '10m_v_component_of_wind',
            '10m_u_component_of_wind', 'total_precipitation_6hr',
            'toa_incident_solar_radiation']:
    df[var] = df[var].isel(level=0)
df['year_progress_sin'] = df['year_progress_sin'].isel(level=0, lat=0, lon=0)
df['year_progress_cos'] = df['year_progress_cos'].isel(level=0, lat=0, lon=0)
df['day_progress_sin'] = df['day_progress_sin'].isel(level=0, lat=0)
df['day_progress_cos'] = df['day_progress_cos'].isel(level=0, lat=0)
df = df.drop_vars('batch')
```
Run this code after fetching the data from CDS; the result will act as the input values for prediction.
```python
predictions = run_forward_jitted(
    rng=jax.random.PRNGKey(0),
    inputs=df,
    targets_template=train_targets * np.nan,
    forcings=train_forcings)
```
> Run this code after fetching the data from CDS; the result will act as the input values for prediction.
> predictions = run_forward_jitted(rng=jax.random.PRNGKey(0), inputs=df, targets_template=train_targets * np.nan, forcings=train_forcings)
How could I get the "train_targets" and "train_forcings"?
Thanks
> How could I get the "train_targets" and "train_forcings"?
You can get both values from the ERA5 dataset. For targets you need to create a dataset of what the output should look like; it is just an empty dataset, which is why the colab demo code has "train_targets * np.nan".
As for forcings, you need the day progress and year progress for all the coordinates in your input. These can be calculated using data_utils from the graphcast library. For the TOA incident solar radiation, you can either use the pysolar library or the ERA5 data, if it is available for the dates you want.
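As a rough illustration of the targets part (a hedged sketch, not the demo's exact code: `inputs_ds` and the variable list here are assumptions):

```python
import numpy as np
import pandas as pd
import xarray

# Build an all-NaN targets template: copy the shape of one input step,
# blank the values, and repeat it for each 6-hour lead time.
target_vars = ['2m_temperature', 'mean_sea_level_pressure']  # hypothetical subset
template = inputs_ds[target_vars].isel(time=[-1]) * np.nan
steps = [template.assign_coords(time=[pd.Timedelta(hours=6 * (i + 1))])
         for i in range(4)]
targets_template = xarray.concat(steps, dim='time')
```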
As someone who is relatively new to xarray, could you ELI5 how to create the datetime coordinate and add it to the dataset? I seem to be struggling unnecessarily with it.
Have you solved the problem of running the source code?
@Shadowwomen
> As someone who is relatively new to xarray, could you ELI5 how to create the datetime coordinate and add it to the dataset? I seem to be struggling unnecessarily with it.
Basically, when you want to predict values using graphcast, you need three kinds of data: input, target and forcings. If you are calculating predictions for hour x, then the inputs need to be for hours x - 6 and x - 12.
Data for hours x - 6 and x - 12 can be fetched from CDS; check out their documentation for fetching data via their APIs, it's pretty simple. The values for these two hours will be under the datetime field, something like datetime.datetime(2023, 1, 1, 0, 0, 0) and datetime.datetime(2023, 1, 1, 6, 0, 0), for example. Similarly, you will have another datetime value for the target(s).
So after all the data is assembled (input + target): if you are using a pandas DataFrame, you will have to set the index to lat, lon, time, level and batch (you will have to add level manually). However, if you are using an xarray Dataset, you will have to rename the 'time' field to 'datetime'.
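For the datetime coordinate specifically, a minimal sketch (assuming `ds` is your Dataset, its 'time' values are timedelta offsets as in the demo's example_batch, and the origin timestamp here is hypothetical):

```python
import pandas as pd

# Attach a `datetime` coordinate alongside a timedelta-valued `time` coordinate.
origin = pd.Timestamp('2023-11-01 06:00')  # the later of the two input hours
ds = ds.assign_coords(
    datetime=('time', origin + pd.to_timedelta(ds['time'].values)))
```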
I have the CDS data, and running the code that @abhinavyesss kindly provided as-is gives me an xarray dataset that looks like this: [screenshot]
The dataset produced from the example data, however, looks like this, with the extra time dimension, the datetime coordinate, and without the four time-progress variables: [screenshot]
In my mind, if I can get the CDS data into the same format as the example data then I should be able to use that as example_batch, but the fact that @abhinavyesss also rewrote the predictions section suggests I am misunderstanding something. When I use the altered CDS data, it tells me it can't find the datetime variable when it attempts to remove it (because it breaks the autoregressive rollouts, apparently), which makes sense as there isn't one, so I'm trying to figure out how to add it. I'm guessing the forcing data needs datetime but the input doesn't, so maybe I need two datasets, one with datetime and one without? Or maybe if I can add datetime I can use that as example_batch, and the source code deals with dropping it for the input data? Many thanks for your help.
I should also note that once I get the CDS data working I intend to use GFS operational data to get it to produce actual forecasts; I'm just using the CDS data to try to understand how everything works.
Hello, I would like to know how to output the prediction results after inputting the data using graphcast's source code
> You can get both values from the ERA5 dataset... As for forcings, you need the day progress and year progress for all the coordinates in your input.
After inputting the data using the above method, how do I output the prediction results using graphcast's source code?
> Run this code after fetching the data from CDS; the result will act as the input values for prediction.
Hi, thanks for sharing the code. I am trying it, but I am getting the error below and am unable to resolve it:

```
/usr/local/lib/python3.10/dist-packages/jax/_src/lax/lax.py in _concatenate_shape_rule(*operands, **kwargs)
   3166               "dimension {} for shapes {}.")
   3167   shapes = [operand.shape for operand in operands]
-> 3168   raise TypeError(msg.format(dimension, ", ".join(map(str, shapes))))
   3169
   3170   concat_size = sum(o.shape[dimension] for o in operands)

TypeError: Cannot concatenate arrays with shapes that differ in dimensions other than the one being concatenated: concatenating along dimension 2 for shapes (1038240, 1, 183), (65160, 1, 3).
```

What should I do?
> After inputting the data using the above method, how do I output the prediction results using graphcast's source code?
If your input is correctly formatted, then the predictions will be stored in the 'predictions' variable and can be output by accessing it.
```python
predictions = run_forward_jitted(
    rng=jax.random.PRNGKey(0),
    inputs=df,
    targets_template=train_targets * np.nan,
    forcings=train_forcings)
```
> In my mind, if I can get the CDS data into the same format as the example data then I should be able to use that as example_batch... I'm guessing the forcing data needs datetime but the input doesn't, so maybe I need two datasets, one with datetime and one without?
The example_batch in the graphcast demo contains 3 datetime values: two for the inputs and one for the prediction. This example batch is then passed to 'data_utils.extract_inputs_targets_forcings', a graphcast function, in order to get the three sets of data. In my example I have skipped that function, because in nearly every real-time use case you won't have the prediction data, since it has not technically occurred yet.
I fetch the data from CDS using its APIs, create the target data manually, and get the forcing data from the pysolar (total solar radiation) and graphcast (day and year progress) libraries.
The data I created can then be used directly when fetching predictions, hence the datetime variable is not required. Also, from my implementation I have noticed the 'time' coordinate can be a datetime.datetime object too; a timedelta object is not necessary.
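For reference, the split in the demo notebook looks roughly like this (names come from graphcast_demo.ipynb; treat the exact lead-time slice as illustrative):

```python
import dataclasses
from graphcast import data_utils

# Split example_batch into inputs, targets and forcings, as the demo does.
train_inputs, train_targets, train_forcings = (
    data_utils.extract_inputs_targets_forcings(
        example_batch,
        target_lead_times=slice("6h", "12h"),  # illustrative lead times
        **dataclasses.asdict(task_config)))
```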
> If your input is correctly formatted, then the predictions will be stored in the 'predictions' variable and can be output by accessing it.
Thanks for your reply! I wonder where I should put the following code to get the result.

> ```
> predictions = run_forward_jitted(
>     rng=jax.random.PRNGKey(0),
>     inputs=df,
>     targets_template=train_targets * np.nan,
>     forcings=train_forcings)
> ```
Hello, I wonder how to create the target data manually and get the forcing data. Could you please show me your demo code? That would be a great help to me!!
> Thanks for your reply! I wonder where I should put the following code to get the result.
This code is placed after the input, target and forcing data are fetched.
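In other words, the overall order is roughly this (a hedged sketch; the helper names are placeholders standing in for the snippets elsewhere in this thread, not a real API):

```python
import datetime
import jax
import numpy as np

dt = datetime.datetime(2023, 11, 1, 6)  # hypothetical forecast origin
df = build_inputs_from_cds()            # placeholder: the CDS preprocessing above
train_targets = getTargets(dt, df)      # NaN targets template (snippet below)
train_forcings = getForcings(df)        # day/year progress + solar radiation

predictions = run_forward_jitted(
    rng=jax.random.PRNGKey(0),
    inputs=df,
    targets_template=train_targets * np.nan,
    forcings=train_forcings)
```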
> Hello, I wonder how to create the target data manually and get the forcing data. Could you please show me your demo code?
```python
@classmethod
def formatData(cls, data: uti.dataframe, includeLevel=False) -> uti.dataframe:
    data['batch'] = 0
    data = data.rename(columns={'latitude': 'lat', 'longitude': 'lon'})
    # Only the targets carry a pressure level; forcings are level-independent.
    index = ['lat', 'lon', 'time', 'batch'] + (['level'] if includeLevel else [])
    data = data.set_index(index)
    return data

@classmethod
def getTargets(cls, dt, data: uti.dataframe):
    dt, targetdf = uti.toDatetime(dt), uti.dataframe()
    data = data.reset_index()
    for days in range(6):  # Predicting only for 6 dates.
        # Copy the rows of one input timestamp as a template for each lead time.
        templatedf: uti.dataframe = data[data['time'] == data['time'].tolist()[0]]
        templatedf.loc[:, 'time'] = uti.deltaTime(dt, hours=days * 6)
        targetdf = pd.concat([targetdf, templatedf], ignore_index=True)
    targetdf = cls.formatData(targetdf, True)[Fields().getPredictionFields()]
    targetdf = targetdf * uti.nan()  # Blank the values; only the shape matters.
    return targetdf
```
What I do is take a subset of the input data, i.e. the rows for one of the two unique 'time' values. This contains all the fields that will be required for prediction; I then remove the fields that will not be predicted.
For forcing data, you have to calculate the solar radiation (use the pysolar package) and the day and year progress values for every coordinate and every datetime you want to predict for. Forcing data is not dependent on pressure level, so you will have to drop that variable.
```python
from graphcast import data_utils as du

@classmethod
def addYearProgress(cls, secs, data):
    # Year progress is the same for every coordinate at a given time.
    progress = du.get_year_progress(secs)
    data[Fields.EngineeredField.YEAR_PROGRESS_SIN.value] = muti.sin(progress)
    data[Fields.EngineeredField.YEAR_PROGRESS_COS.value] = muti.cos(progress)
    return data

@classmethod
def addDayProgress(cls, secs, lon: str, data: uti.dataframe):
    # Day progress varies with longitude, so compute it per unique longitude.
    progress: np.ndarray = du.get_day_progress(secs, uti.getNumpyArray(data[lon].unique()))
    prxlon = {l: prog for l, prog in zip(data[lon].unique().tolist(), progress.tolist())}
    data[Fields.EngineeredField.DAY_PROGRESS_SIN.value] = data[lon].map(lambda x: muti.sin(prxlon[x]))
    data[Fields.EngineeredField.DAY_PROGRESS_COS.value] = data[lon].map(lambda x: muti.cos(prxlon[x]))
    return data

@classmethod
def getForcings(cls, data: uti.dataframe):
    # Keep only the coordinate columns: drop the predicted fields and the
    # pressure level, then deduplicate to one row per (lon, lat, time).
    forcingdf = data.reset_index().drop(labels=Fields().getPredictionFields(), axis=1)
    forcingdf = forcingdf.drop(labels=['level'], axis=1).drop_duplicates(subset=['lon', 'lat', 'time'], keep='first')
    forcingdf = cls.integrateProgress(forcingdf)
    forcingdf = cls.integrateSolarRadiation(forcingdf)  # Uses pysolar.
    forcingdf = cls.formatData(forcingdf)
    return forcingdf
```
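The integrateSolarRadiation helper is not shown above; as a rough illustration of the pysolar part (a hypothetical grid point, and a crude one-hour accumulation, since ERA5's toa_incident_solar_radiation is an accumulated quantity in J/m²):

```python
from datetime import datetime, timezone
from pysolar.solar import get_altitude
from pysolar.radiation import get_radiation_direct

when = datetime(2023, 11, 1, 6, tzinfo=timezone.utc)
lat, lon = 40.0, 116.0  # hypothetical grid point

# pysolar gives clear-sky direct radiation in W/m^2 for a solar altitude.
altitude = get_altitude(lat, lon, when)
w_per_m2 = get_radiation_direct(when, altitude) if altitude > 0 else 0.0
j_per_m2 = w_per_m2 * 3600  # crude 1-hour accumulation, for illustration only
```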
Thank you very much for your help, I will try the code you provided
Hello, so the process of running graphcast locally is: get the data for t - 6 and t - 12, then create the target and forcing data, and then call graphcast for the output? However, there are still many details to deal with in this process, and as a beginner it is difficult. I know that you have a good understanding of graphcast, and I hope you can provide code that produces the prediction results in one complete run, or record a video for YouTube walking through the graphcast source code. Thank you again for your work; it is of great significance to many beginners.
Hi, I am sorry about not being able to provide the clearest answers to your queries. I do plan to write an article about my work on GraphCast, however that is still some time away. I am close to completing my experimentation with graphcast in a few days, after which I will upload the code; hopefully that will provide the answers to everything from data collection for inputs to getting predictions.
I am looking forward to your future work. It will be a great help to me, thanks a lot!!
> The code above fetches custom data for graphcast.
Dear teacher, I downloaded the nc data I needed according to your code. The data I downloaded is the temperature data for March 5, 2020, with pressure levels ranging from 10 to 1000, and I then uploaded the data to Google Drive. May I ask how I can use my data as the input data in the demo code on Google Colab to get the prediction result? Thank you very much. @abhinavyesss
Dear teacher, hello. I noticed that there are two nc files in your code. single-level.nc is the data obtained through CDS. May I ask what atmospheric.nc is and how to get it? @abhinavyesss
> How could I get the "train_targets" and "train_forcings"? Thanks
Dear teacher, hello. Did you fetch the data you were interested in through CDS and then complete the forecast? How do I input the data after I download it? Thanks @oubahe
> Dear teacher, hello. I noticed that there are two nc files in your code. single-level.nc is the data obtained through CDS. May I ask what atmospheric.nc is and how to get it?
The input data for graphcast requires a lot of fields. Some of those fields are independent of the pressure level; this data is obtained from the single-level collection on CDS and saved as single-level.nc.
Other fields are dependent on the pressure level; this data is fetched from a different CDS collection, 'ERA5 hourly data on pressure levels from 1940 to present'. I stored this data in atmospheric.nc.
The code for obtaining data from both collections is the one you quoted.
> Dear teacher, I downloaded the nc data I needed according to your code... May I ask how I can use my data as the input data in the demo code on Google Colab to get the prediction result?
You have quoted the solution to this in your most recent comment.
However, this is just the input data. In order to make predictions you will also have to get the forcings and target data: for every hour you will be predicting, you need the forcings and target values.
> In order to make predictions you will also have to get the forcings and target data.
Hi, I can understand that forcing values are needed for making predictions, but what is the purpose of the target values? Are they needed only for accuracy evaluation, or for making the predictions themselves? Thanks in advance. @abhinavyesss