
Python script for converting GRIB files to ARL

Msturroc opened this issue 1 year ago

Hi Yaqiang, excellent software! I tried posting this on Gitter, but I don't think you or anyone checks that anymore. Like others before me, I have come here because of your Python script for converting GRIB files to ARL for use with HYSPLIT. I seem to be able to run the script fine (after changing some variable names to match my particular ERA5 GRIB files), but when I try to view the output ARL files in MeteoInfoMap it hits me with the error: java.lang.NumberFormatException: For input string: "���"

If you have time to look, my code is:

# Convert ERA5 GRIB data to ARL data
# (run in MeteoInfoLab, where addfile() is available)
import os

#---- Set data folder
datadir = r'/home/marc/Downloads/hysplit.v5.3.0_UbuntuOS20.04.6LTS/examples/'
#---- Set output data file
outfn = os.path.join(datadir, 'test_era5_grib_diff.arl')
#---- Read the GRIB data files
infn3d = addfile(os.path.join(datadir, 'ERA5_1965.Jul15.3dpl.grib'))
infn2d = addfile(os.path.join(datadir, 'ERA5_1965.Jul15.2dpl.all.grib'))

print('GRIB data files have been opened...')
#---- Set output ARL data file
arlf = addfile(outfn, 'c', dtype='arl')

#---- Set variable and level list
#---- Variable names in ERA5 data file
gvar2d = ['Geopotential_surface','Surface_pressure_surface','Surface_latent_heat_flux_surface_1_Hour_Accumulation','2_metre_temperature_surface','10_metre_U_wind_component_surface',\
    '10_metre_V_wind_component_surface','Mean_sea_level_pressure_surface','Boundary_layer_height_surface','Convective_available_potential_energy_surface','Surface_solar_radiation_downwards_surface_1_Hour_Accumulation',\
    'Total_precipitation_surface_1_Hour_Accumulation','Instantaneous_surface_sensible_heat_flux_surface']

gvar3d = ['Geopotential_isobaric','Temperature_isobaric','Vertical_velocity_isobaric',\
    'U_component_of_wind_isobaric','V_component_of_wind_isobaric','Relative_humidity_isobaric']

#---- Corresponding variable names in ARL data file
avar2d = ['HGTS','PRSS', 'LTHF', 'T02M', 'U10M', 'V10M', 'MSLP', 'PBLH', 'CAPE', 'DSWF', 'TPP1', 'SHTF']
avar3d = ['HGTS','TEMP','WWND','UWND','VWND','RELH']
#--- Add DIFF fields - difference between the original data and the packed data
avar3d_diff = list(avar3d)
avar3d_diff.append('DIFW')

#---- Set parameters of ARL data file
gv = infn3d['Geopotential_isobaric']
nx = gv.dimlen(gv.ndim - 1)
ny = gv.dimlen(gv.ndim - 2)
levels = gv.dimvalue(gv.ndim - 3)[::-1]
nz = len(levels)
arlf.setlevels(levels)
arlf.set2dvar(avar2d)
for l in levels:
    arlf.set3dvar(avar3d_diff)
arlf.setx(gv.dimvalue(gv.ndim - 1))
arlf.sety(gv.dimvalue(gv.ndim - 2))

#---- Write ARL data file
tNum = infn3d.timenum()
fhour = 0
for t in range(0, tNum):
    print('Time index: ' + str(t))
    atime = infn3d.gettime(t)
    print(atime.strftime('%Y-%m-%d %H:00'))
    dhead = arlf.getdatahead(infn3d.proj, 'RSMC', 2, fhour)
    #Pre-write index record without checksum - will be overwritten later
    arlf.writeindexrec(atime, dhead)
    #Checksum list
    ksumlist = [[] for _ in range(nz + 1)]  # Initialize with correct structure
    # Write 2d variables
    for avname,gvname in zip(avar2d, gvar2d):
        gdata = infn2d[gvname][t,:,:]
        if avname == 'PRSS':
            gdata = gdata * 0.01
        ksum = arlf.writedatarec(atime, 0, avname, fhour, 99, gdata)
        ksumlist[0].append(ksum)
    # Write 3d variables
    for lidx in range(0, nz):
        llidx = nz - lidx - 1
        print(lidx,llidx)        
        for avname,gvname in zip(avar3d, gvar3d):
            gdata = infn3d[gvname][t,llidx,:,:]            
            if avname == 'WWND':
                gdata = gdata * 0.01
                difw = arlf.diff_origin_pack(gdata)
            elif avname == 'SPHU':
                gdata = gdata * 1000.
            elif avname == 'HGTS':
                gdata = gdata / 9.80665
            ksum = arlf.writedatarec(atime, lidx + 1, avname, fhour, 99, gdata)
            ksumlist[lidx + 1].append(ksum)
        ksum = arlf.writedatarec(atime, lidx + 1, 'DIFW', fhour, 99, difw)
        ksumlist[lidx + 1].append(ksum)
    #Re-write index record with checksum
    arlf.writeindexrec(atime, dhead, ksumlist)
    fhour += 1
arlf.close()
print('Finished!')

I would be interested in developing the Python script further to generalise it so it works with any ERA5 GRIB file. Any tips for getting this to work?
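
Something along these lines is what I have in mind, as a rough, untested sketch: a single table mapping GRIB variable names to ARL names and scale factors, writing only the fields a given file actually contains. The varnames() accessor on the opened dataset is an assumption on my part and may differ in MeteoInfoLab.

# Rough sketch (untested): one GRIB-name -> (ARL-name, scale factor) table,
# filtered against the variables actually present in the opened GRIB files.
# infn2d/infn3d are the datasets opened with addfile() above; varnames() is
# assumed to exist on the opened dataset.
var2d_map = {
    'Surface_pressure_surface': ('PRSS', 0.01),
    '2_metre_temperature_surface': ('T02M', 1.0),
    '10_metre_U_wind_component_surface': ('U10M', 1.0),
    '10_metre_V_wind_component_surface': ('V10M', 1.0),
    'Boundary_layer_height_surface': ('PBLH', 1.0),
}
var3d_map = {
    'Geopotential_isobaric': ('HGTS', 1.0 / 9.80665),
    'Temperature_isobaric': ('TEMP', 1.0),
    'U_component_of_wind_isobaric': ('UWND', 1.0),
    'V_component_of_wind_isobaric': ('VWND', 1.0),
    'Vertical_velocity_isobaric': ('WWND', 0.01),
    'Relative_humidity_isobaric': ('RELH', 1.0),
}

# Keep only what this particular ERA5 file provides
gvar2d = [g for g in var2d_map if g in infn2d.varnames()]
gvar3d = [g for g in var3d_map if g in infn3d.varnames()]
avar2d = [var2d_map[g][0] for g in gvar2d]
avar3d = [var3d_map[g][0] for g in gvar3d]
scale2d = dict((var2d_map[g][0], var2d_map[g][1]) for g in gvar2d)
scale3d = dict((var3d_map[g][0], var3d_map[g][1]) for g in gvar3d)
# ...then the write loops above could look the scale factor up instead of
# hard-coding the if/elif chain, e.g. gdata = gdata * scale2d[avname]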

Best wishes, Marc

Msturroc avatar Aug 28 '24 09:08 Msturroc

Is it possible for you to provide an example ERA5 GRIB data file and the conversion script, so I can reproduce the problem and have a chance to fix it?

Yaqiang avatar Aug 28 '24 14:08 Yaqiang

Thanks @Yaqiang! The files should be available here (let me know if you can access them): https://www.dropbox.com/scl/fo/hoeu9e758xf7c43c7qzpy/APYSc8Ef6Dt8-iPGJ3VSg9E?rlkey=7iggho1x8b76x8vd7x06uw63e&st=qkns1do8&dl=0

My code is below. It does convert but the resultant ARL file doesn't seem to be readable...


import os

#---- Set data folder
datadir = '../hysplit.v5.3.0_UbuntuOS20.04.6LTS/examples/'
#---- Set output data file
outfn = os.path.join(datadir, 'test_era5_grib_diff.arl')
#---- Read the GRIB data files
infn3d = addfile(os.path.join(datadir, 'ERA5_1965.Jul15.3dpl.grib'))
infn2d = addfile(os.path.join(datadir, 'ERA5_1965.Jul15.2dpl.all.grib'))

print('GRIB data files have been opened...')
#---- Set output ARL data file
arlf = addfile(outfn, 'c', dtype='arl')

#---- Set variable and level list
#---- Variable names in ERA5 data file
gvar2d = ['Geopotential_surface','Surface_pressure_surface','Surface_latent_heat_flux_surface_1_Hour_Accumulation','2_metre_temperature_surface','10_metre_U_wind_component_surface',\
    '10_metre_V_wind_component_surface','Mean_sea_level_pressure_surface','Boundary_layer_height_surface','Convective_available_potential_energy_surface','Surface_solar_radiation_downwards_surface_1_Hour_Accumulation',\
    'Total_precipitation_surface_1_Hour_Accumulation','Instantaneous_surface_sensible_heat_flux_surface']
#gvar2d = ['Surface_pressure_surface','2_metre_temperature_surface','10_metre_U_wind_component_surface',\
#    '10_metre_V_wind_component_surface','Boundary_layer_height_surface']
    
gvar3d = ['Geopotential_isobaric','Temperature_isobaric','U_component_of_wind_isobaric','V_component_of_wind_isobaric','Vertical_velocity_isobaric','Relative_humidity_isobaric']

#---- Corresponding variable names in ARL data file
avar2d = ['SHGT','PRSS', 'LTHF', 'T02M', 'U10M', 'V10M', 'MSLP', 'PBLH', 'CAPE', 'DSWF', 'TPP1', 'SHTF']
#avar2d = ['PRSS','T02M','U10M','V10M','PBLH']
avar3d = ['HGTS','TEMP','UWND','VWND','WWND','RELH']

#--- Add DIFF fields - difference between the original data and the packed data
avar3d_diff = list(avar3d)
avar3d_diff.append('DIFW')

#---- Set parameters of ARL data file
gv = infn3d['Geopotential_isobaric']
nx = gv.dimlen(gv.ndim - 1)
ny = gv.dimlen(gv.ndim - 2)
levels = gv.dimvalue(gv.ndim - 3)[::-1]
nz = len(levels)
arlf.setlevels(levels)
arlf.set2dvar(avar2d)
for l in levels:
    arlf.set3dvar(avar3d_diff)
arlf.setx(gv.dimvalue(gv.ndim - 1))
arlf.sety(gv.dimvalue(gv.ndim - 2))

#---- Write ARL data file
tNum = infn3d.timenum()
fhour = 0
for t in range(0, tNum):
    print('Time index: ' + str(t))
    atime = infn3d.gettime(t)
    print(atime.strftime('%Y-%m-%d %H:00'))
    dhead = arlf.getdatahead(infn3d.proj, 'RSMC', 2, fhour)
    #Pre-write index record without checksum - will be overwritten later
    arlf.writeindexrec(atime, dhead)
    #Checksum list
    ksumlist = [[] for _ in range(nz + 1)]  # Initialize with correct structure
    # Write 2d variables
    for avname,gvname in zip(avar2d, gvar2d):
        gdata = infn2d[gvname][t,:,:]
        if avname == 'PRSS':
            gdata = gdata * 0.01
        ksum = arlf.writedatarec(atime, 0, avname, fhour, 99, gdata)
        ksumlist[0].append(ksum)
    # Write 3d variables
    for lidx in range(0, nz):
        llidx = nz - lidx - 1
        print(lidx,llidx)        
        for avname,gvname in zip(avar3d, gvar3d):
            gdata = infn3d[gvname][t,llidx,:,:]            
            if avname == 'WWND':
                gdata = gdata * 0.01
                difw = arlf.diff_origin_pack(gdata)
            elif avname == 'SPHU':
                gdata = gdata * 1000.
            elif avname == 'HGTS':
                gdata = gdata / 9.80665
            ksum = arlf.writedatarec(atime, lidx + 1, avname, fhour, 99, gdata)
            ksumlist[lidx + 1].append(ksum)
        ksum = arlf.writedatarec(atime, lidx + 1, 'DIFW', fhour, 99, difw)
        ksumlist[lidx + 1].append(ksum)
    #Re-write index record with checksum
    arlf.writeindexrec(atime, dhead, ksumlist)
    fhour += 1
arlf.close()
print('Finished!')

Msturroc avatar Aug 29 '24 15:08 Msturroc

I have tried your script and data files. The ARL data can be converted and opened in MeteoInfoMap and MeteoInfoLab, so I cannot reproduce the problem on my side. You may try the newest version of MeteoInfo if the version you are using is out of date.

(screenshot attached)

Yaqiang avatar Aug 30 '24 05:08 Yaqiang

Interesting! Yes, I am using MeteoInfoLab 3.8.11. Will try updating and report back! On 3.8.11 I'm getting the following error when trying to add the generated file:

Traceback (most recent call last):
  File "/home/marc/Downloads/MeteoInfo/pylib/test_convert.py", line 88, in <module>
    f = addfile(fn)
  File "/home/marc/Downloads/MeteoInfo/pylib/mipylib/dataset/midata.py", line 120, in addfile
    meteodata.openData(fname, keepopen)
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.base/java.lang.Integer.parseInt(Integer.java:652)
	at java.base/java.lang.Integer.parseInt(Integer.java:770)
	at org.meteoinfo.data.meteodata.arl.ARLDataInfo.readDataHead(ARLDataInfo.java:594)
	at org.meteoinfo.data.meteodata.arl.ARLDataInfo.readDataInfo(ARLDataInfo.java:437)
	at org.meteoinfo.data.meteodata.MeteoDataInfo.openData(MeteoDataInfo.java:524)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
java.lang.NumberFormatException: java.lang.NumberFormatException: For input string: "???"

Msturroc avatar Aug 30 '24 11:08 Msturroc

Hmm, even with 3.9.3 I'm getting the same error (though the output ARL file size is different: 432.1 MB compared to ~330 MB before, so something has certainly changed).

Finished!
Traceback (most recent call last):
  File "/home/marc/Downloads/MeteoInfo/test_conversion.py", line 88, in <module>
    f = addfile(fn)
  File "/home/marc/Downloads/MeteoInfo/pylib/mipylib/dataset/midata.py", line 120, in addfile
    meteodata.openData(fname, keepopen)
	at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
	at java.base/java.lang.Integer.parseInt(Integer.java:652)
	at java.base/java.lang.Integer.parseInt(Integer.java:770)
	at org.meteoinfo.data.meteodata.arl.ARLDataInfo.readDataHead(ARLDataInfo.java:594)
	at org.meteoinfo.data.meteodata.arl.ARLDataInfo.readDataInfo(ARLDataInfo.java:437)
	at org.meteoinfo.data.meteodata.MeteoDataInfo.openData(MeteoDataInfo.java:524)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
java.lang.NumberFormatException: java.lang.NumberFormatException: For input string: "???"

I also tried loading the ARL file into HYSPLIT, and got this error:

*ERROR* metset: 2nd time period INDX record missing

I want to add that other ARL files load fine using addfile (just not this one I have converted), and I'm using Linux.

Msturroc avatar Aug 30 '24 12:08 Msturroc

It's quite interesting. What are your computer's operating system and language settings? I am thinking the source may be the comma decimal separator used in some locales. If that's the case, you can try changing the locale language to English and running the script again.
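
If you want to check this from MeteoInfoLab directly, something like the following sketch should work (it only uses the standard java.util.Locale API, which is reachable because MeteoInfoLab runs on Jython): it shows the JVM's default locale and forces English for the session.

# Sketch: inspect the JVM default locale and force English for this session
from java.util import Locale

print(Locale.getDefault())         # e.g. de_DE or fr_FR use a comma decimal separator

Locale.setDefault(Locale.ENGLISH)  # switch number formatting to the English convention
print(Locale.getDefault())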

Yaqiang avatar Aug 31 '24 01:08 Yaqiang

I am using a Windows 11 computer. The conversion script and the resulting ARL data file have been uploaded here: https://www.dropbox.com/scl/fi/r0qavj98blm7d9lmfcfk9/test_era5_grib_diff_1.arl?rlkey=30vxxcvxfydnfhj1jmivvvluj&st=shj4i63o&dl=0 , https://www.dropbox.com/scl/fi/9j6kd1vampis4fwqaiuto/grib2arl_test.py?rlkey=r629760wp0f62njlrbmgpp9xn&st=rqahr1sx&dl=0

Yaqiang avatar Aug 31 '24 01:08 Yaqiang

Here is some additional info about my computer system and language; I am using English.

(screenshot of system and language settings)

Msturroc avatar Aug 31 '24 12:08 Msturroc

I will try using Windows 11 also, thanks! I will report back.

Msturroc avatar Aug 31 '24 12:08 Msturroc

I encountered issues on my Linux machine when running scripts that worked fine on my Windows 11 machine. On investigation (checking with the java -version command), I found that the Linux machine was using an older Java version (OpenJDK 11.0.24). I updated to the latest Java version (Java 22.0.2) using SDKMAN!, and the scripts now run as expected; the discrepancies between the Linux and Windows environments have been resolved. Apologies for that, and thanks again for your software and your patience!
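
For anyone hitting the same thing, a quick way to confirm which JVM MeteoInfoLab itself is running on (it may differ from whichever java is first on your shell PATH) is to query the standard Java system properties from a MeteoInfoLab script, e.g.:

# Sketch: print the JVM that MeteoInfoLab is actually using
from java.lang import System

print(System.getProperty('java.version'))  # e.g. 11.0.24 vs 22.0.2
print(System.getProperty('java.vendor'))
print(System.getProperty('java.home'))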

Msturroc avatar Sep 02 '24 12:09 Msturroc