
Python SubX download_data produces an error before scripts can be created

Open · kdl0013 opened this issue 3 years ago · 3 comments

In the file SubX/Python/download_data/generate_full_py_ens_files.ksh, the code produces a list of all files, but it immediately fails at fen='python tmp.py' and produces the following error:

RuntimeError: NetCDF: Access failure oc_open: server error retrieving url: code=6 message="request too large"

The link provided, https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.RSMAS/.CCSM4/.hindcast/.tas/dods, does not appear to have any files to open, which may be the problem. IRIDL may have updated the site so that the DODS information no longer populates.
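For context, the error appears to come from requesting the whole hindcast dataset through the OPeNDAP/DODS endpoint in one go. Below is a minimal sketch of the kind of call that fails, assuming the generated tmp.py opens the DODS URL with xarray (as the decode_times=False workaround mentioned below suggests); the exact call in tmp.py may differ:

import xarray as xr

# Hypothetical reproduction: opening the full CCSM4 tas hindcast over OPeNDAP
url = ('https://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/'
       '.RSMAS/.CCSM4/.hindcast/.tas/dods')

# Fails with: RuntimeError: NetCDF: Access failure
# oc_open: server error retrieving url: code=6 message="request too large"
ds = xr.open_dataset(url, decode_times=False)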

kdl0013 · Dec 06 '21 16:12

I face the same issue: oc_open: server error retrieving url: code=6 message="request too large". I added decode_times=False, but the error still persists.

raghu330 · Feb 14 '22 15:02


@raghu330 You can use this Python script I created to download SubX data. The script creates a wget list; running the resulting shell script downloads the files. It requests one initialization date at a time, which keeps each request small enough to avoid the "request too large" error.

#!/usr/bin/env python3

'''Create a wget script for each SubX model to download files in parallel.

Inputs: IRIDL NCAR username and password.
        You can change the model and variables to be downloaded below.

Outputs: A shell script containing the download commands. When run, files
         download 20 at a time.
'''
import os
import datetime as dt
import numpy as np


username_IRIDL = "usrname"
password_IRIDL = "psswd"

models = ['GMAO']
sources = ['GEOS_V2p1']
#vars = ['huss', 'dswrf','mrso','tas', 'uas', 'vas','tdps','pr','cape']
vars = ['tasmax','tasmin']

# All possible SubX initialization dates; start at year 2000 because the other SubX models have all started by then
start_date = dt.date(2000, 1, 5)
end_date = dt.date(2015, 12, 26)

dates = [start_date + dt.timedelta(days=d) for d in range(0, end_date.toordinal() - start_date.toordinal() + 1)]
# GMAO GEOS specifically skips leap days
dates_out = [d for d in dates if not (d.month == 2 and d.day == 29)]

# Keep only every 5th date (GMAO GEOS initializations are every 5 days)
dates = dates_out[::5]

### GMAO GEOS_V2p1 model

new_dir = f'/glade/scratch/klesinger/SubX/{models[0]}'
os.makedirs(new_dir, exist_ok=True)

count = 0
output = []
output.append('#!/bin/bash')
for m_i, model in enumerate(models):

    for d_i, date in enumerate(dates):

        for v_i, var in enumerate(vars):

            date_str = '{}-{}-{}'.format(str(date.year), str(date.month).rjust(2, '0'), str(date.day).rjust(2, '0'))

            # Request a single initialization date from IRIDL and save it under a
            # descriptive local name (-O); -nc skips files that already exist.
            command = f"wget -nc --user {username_IRIDL} --password {password_IRIDL} 'http://iridl.ldeo.columbia.edu/SOURCES/.Models/.SubX/.{model}/.{sources[m_i]}/.hindcast/.{var}/S/(1200%20{str(date.day)}%20{date.strftime('%b')}%20{str(date.year)})/VALUES/data.nc' -O {new_dir}/{var}_{model}_{date_str}.nc &"

            # Insert a 'wait' every 20 commands so only 20 downloads run at a time
            if count % 20 == 0:
                output.append('wait')

            count += 1
            output.append(command)

np.savetxt('wget_GMAO.sh', output, fmt="%s")

After running the script, start the downloads from the command line with: bash wget_GMAO.sh
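Once the downloads finish, each per-date file can be opened locally to verify it; for example (a quick sketch, and the path is hypothetical, simply following the script's output naming scheme):

import xarray as xr

# Hypothetical path following the script's {var}_{model}_{date} output naming
ds = xr.open_dataset('/glade/scratch/klesinger/SubX/GMAO/tasmax_GMAO_2000-01-05.nc')
print(ds)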

kdl0013 · Feb 16 '22 11:02

Hello @kdl0013, thanks very much for the response and for an excellent wget solution. My apologies for the delay in responding. I was able to help the person who brought this issue to me using a similar but more convoluted approach; yours is better optimized. Since then I have not heard further from him, so my involvement has reverted to my official duties. Thank you very much! Cheers, Raghu

raghu330 · Feb 25 '22 19:02