getvar() from many wrf output files?

Open Timothy-W-Hilton opened this issue 5 years ago • 2 comments

First and foremost, thanks for providing this fantastic tool.

I'm using wrf.getvar() to open a time series for several variables (HFX, LH, some of the diagnostic variables) that are stored in WRF-written netCDF files. Each file contains a single temporal value (for a 30-minute period). It's a 4-month WRF run, so there are many of these files (> 5000).

For shorter WRF runs with fewer files I've passed getvar() a list of netCDF4.Dataset objects.

Now I'm hitting a limit for number of open files (OSError: [Errno 24] Too many open files).

Is there a "best practice" for reading a single variable from lots and lots of netCDF files? It seems that xarray (1) isn't yet supported for getvar() and (2) may not work well anyway because xarray.open_mfdataset seems to want to read every variable from each WRF file and is thus very slow.

My WRF files are netCDF4 (not netCDF4-classic), which seems to rule out netCDF4.MFDataset().

Is my best bet to use something like ncrcat to make a temporary netCDF file containing only the variable I want? This could work but would, I guess, require some digging to supply all the WRF output variables needed for some of the wrf.getvar() diagnostic variables.

Timothy-W-Hilton avatar Jun 14 '19 00:06 Timothy-W-Hilton
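
One workaround for the Errno 24 error on Unix-like systems, independent of wrf-python, is to raise the process's soft limit on open files before opening the datasets. The following is a minimal sketch using only the standard-library resource module. The helper name raise_open_file_limit and the default of 8192 are illustrative assumptions, and the hard limit configured on the system still caps what can be requested.

```python
import resource


def raise_open_file_limit(wanted=8192):
    """Try to raise the soft open-file limit to `wanted` (hypothetical helper).

    Returns the soft limit in effect afterwards. The limit is never lowered,
    and the hard limit is left unchanged.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == resource.RLIM_INFINITY:
        return soft  # already unlimited, nothing to do

    # The soft limit cannot exceed the hard limit for an unprivileged process.
    target = wanted if hard == resource.RLIM_INFINITY else min(wanted, hard)
    if target > soft:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
        except (ValueError, OSError):
            # Some platforms impose an extra kernel cap; keep the old limit.
            return soft
    return resource.getrlimit(resource.RLIMIT_NOFILE)[0]
```

Calling raise_open_file_limit(8192) once at the start of the script may be enough for roughly 5000 files, provided the hard limit allows it. If it does not, the files have to be opened in smaller groups instead.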

This is the code I use. I haven't used it with >5000 files, but I think it should work.

import glob
from netCDF4 import Dataset
from wrf import getvar, ALL_TIMES

# glob returns paths in arbitrary order, so sort them by name
list_of_paths = sorted(glob.glob(r'../wrf/wrfout_d0*'))

# Open every file, including the last one (this keeps one handle open per file)
wrflist = [Dataset(path) for path in list_of_paths]

# Join the variable from all files
HFX = getvar(wrflist, "HFX", timeidx=ALL_TIMES, method="join")

cross85 avatar Dec 20 '19 16:12 cross85
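
The approach above holds every file open at the same time, so with thousands of files it can still run into the open-file limit. A possible variation is to process the files in batches, closing each batch before opening the next, and to concatenate the pieces afterwards. The sketch below assumes that getvar() returns an xarray.DataArray (its default when xarray is installed) and that method="cat" stacks the files along a dimension named "Time". The helper names chunked and getvar_in_batches and the batch size of 200 are illustrative, and this has not been tested on a 5000-file run.

```python
def chunked(items, size):
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def getvar_in_batches(paths, varname, batch_size=200):
    """Read `varname` from many WRF files, `batch_size` open files at a time."""
    # Imported here so the helper above can be used without these packages.
    import xarray as xr
    from netCDF4 import Dataset
    from wrf import getvar, ALL_TIMES

    pieces = []
    for batch in chunked(paths, batch_size):
        datasets = [Dataset(path) for path in batch]
        try:
            # method="cat" concatenates the files in this batch along time.
            pieces.append(getvar(datasets, varname,
                                 timeidx=ALL_TIMES, method="cat"))
        finally:
            # Release the file handles before the next batch is opened.
            for ds in datasets:
                ds.close()

    # Assumption: the time dimension of each piece is called "Time".
    return xr.concat(pieces, dim="Time")
```

For example, getvar_in_batches(sorted(glob.glob('../wrf/wrfout_d0*')), "HFX") would keep at most 200 files open at once. Diagnostic variables should work the same way, since getvar() reads whatever inputs it needs from each batch.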

Thank you for sharing this code. Could you also add the code for writing the data (HFX) as a time series to a new netCDF file?

rajkumar8581 avatar Nov 26 '21 18:11 rajkumar8581
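
A possible way to write the result to a new file, assuming HFX is the xarray.DataArray returned by getvar(), is xarray's to_netcdf(). One known obstacle is that wrf-python attaches a "projection" attribute holding a Python object, which the netCDF writer cannot serialize, so the sketch below converts such attributes to strings first. The helper names netcdf_safe_attrs and write_timeseries are illustrative, and coordinate variables may need the same treatment if they carry similar attributes.

```python
def netcdf_safe_attrs(attrs):
    """Return a copy of `attrs` with non-serializable values turned into strings."""
    safe = {}
    for key, value in attrs.items():
        # Plain Python values and NumPy values (anything with a dtype) are kept.
        if isinstance(value, (str, int, float, list, tuple)) or hasattr(value, "dtype"):
            safe[key] = value
        else:
            # For example the wrf-python projection object.
            safe[key] = str(value)
    return safe


def write_timeseries(data_array, out_path):
    """Write an xarray.DataArray (e.g. HFX from getvar) to a netCDF file."""
    # assign_attrs returns a new object, so the original attributes are untouched.
    cleaned = data_array.assign_attrs(netcdf_safe_attrs(data_array.attrs))
    cleaned.to_netcdf(out_path)
```

With the variable from the earlier comment, this would be called as write_timeseries(HFX, "hfx_timeseries.nc"), where the output file name is only an example.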