vcs.download_sample_data_files() keeps on downloading th_yr.nc

Open jypeter opened this issue 6 years ago • 4 comments

@doutriaux1 I wonder if there is a problem with a) vcs.download_sample_data_files() or with b) th_yr.nc

When I execute a) in a CDAT installation where it has already been run, it apparently sees that the other files are already there, but it downloads b) 3 times. The same thing happens every time I re-execute a).

-rw-r--r-- 1 jypeter lsce   332776 Mar 12 14:44 th_yr.nc

(cdatm18_py2) jypeter@obelix2 - ...jypeter - 61 >python -c 'import vcs; vcs.download_sample_data_files(); print "\nFinished downloading sample data to", vcs.sample_data'
Downloading: 'th_yr.nc' from 'https://cdat.llnl.gov/cdat/sample_data/' in: /home/share/unix_files/cdat/miniconda3/envs/cdatm18_py2/share/cdat/sample_data/th_yr.nc
Downloading: 'th_yr.nc' from 'https://cdat.llnl.gov/cdat/sample_data/' in: /home/share/unix_files/cdat/miniconda3/envs/cdatm18_py2/share/cdat/sample_data/th_yr.nc
Downloading: 'th_yr.nc' from 'https://cdat.llnl.gov/cdat/sample_data/' in: /home/share/unix_files/cdat/miniconda3/envs/cdatm18_py2/share/cdat/sample_data/th_yr.nc

Finished downloading sample data to /home/share/unix_files/cdat/miniconda3/envs/cdatm18_py2/share/cdat/sample_data

(cdatm18_py2) jypeter@obelix2 - ...jypeter - 62 >ls -ltr /home/share/unix_files/cdat/miniconda3/envs/cdatm18_py2/share/cdat/sample_data | tail
-rw-r--r-- 1 jypeter lsce     3216 Mar  7 16:27 tas_gavg_rnl_ecm.nc
-rw-r--r-- 1 jypeter lsce   510144 Mar  7 16:27 tas_ecm_1979.nc
-rw-r--r-- 1 jypeter lsce   128360 Mar  7 16:27 tas_cru_1979.nc
-rw-r--r-- 1 jypeter lsce  2107996 Mar  7 16:27 psl_6h.nc
-rw-r--r-- 1 jypeter lsce   366116 Mar  7 16:27 ts_da.nc
-rw-r--r-- 1 jypeter lsce  2678584 Mar  7 16:27 tas_mo.nc
-rw-r--r-- 1 jypeter lsce   159468 Mar  7 16:27 tas_mo_clim.nc
-rw-r--r-- 1 jypeter lsce  6280312 Mar  7 16:27 tas_6h.nc
-rw-r--r-- 1 jypeter lsce 34487602 Mar  7 16:27 geos5-sample.nc
-rw-r--r-- 1 jypeter lsce   332776 Mar 12 14:46 th_yr.nc

(cdatm18_py2) jypeter@obelix2 - ...jypeter - 63 >python -c 'import vcs; vcs.download_sample_data_files(); print "\nFinished downloading sample data to", vcs.sample_data'
Downloading: 'th_yr.nc' from 'https://cdat.llnl.gov/cdat/sample_data/' in: /home/share/unix_files/cdat/miniconda3/envs/cdatm18_py2/share/cdat/sample_data/th_yr.nc
Downloading: 'th_yr.nc' from 'https://cdat.llnl.gov/cdat/sample_data/' in: /home/share/unix_files/cdat/miniconda3/envs/cdatm18_py2/share/cdat/sample_data/th_yr.nc
Downloading: 'th_yr.nc' from 'https://cdat.llnl.gov/cdat/sample_data/' in: /home/share/unix_files/cdat/miniconda3/envs/cdatm18_py2/share/cdat/sample_data/th_yr.nc

Finished downloading sample data to /home/share/unix_files/cdat/miniconda3/envs/cdatm18_py2/share/cdat/sample_data

(cdatm18_py2) jypeter@obelix2 - ...jypeter - 64 >ls -ltr /home/share/unix_files/cdat/miniconda3/envs/cdatm18_py2/share/cdat/sample_data | tail
-rw-r--r-- 1 jypeter lsce     3216 Mar  7 16:27 tas_gavg_rnl_ecm.nc
-rw-r--r-- 1 jypeter lsce   510144 Mar  7 16:27 tas_ecm_1979.nc
-rw-r--r-- 1 jypeter lsce   128360 Mar  7 16:27 tas_cru_1979.nc
-rw-r--r-- 1 jypeter lsce  2107996 Mar  7 16:27 psl_6h.nc
-rw-r--r-- 1 jypeter lsce   366116 Mar  7 16:27 ts_da.nc
-rw-r--r-- 1 jypeter lsce  2678584 Mar  7 16:27 tas_mo.nc
-rw-r--r-- 1 jypeter lsce   159468 Mar  7 16:27 tas_mo_clim.nc
-rw-r--r-- 1 jypeter lsce  6280312 Mar  7 16:27 tas_6h.nc
-rw-r--r-- 1 jypeter lsce 34487602 Mar  7 16:27 geos5-sample.nc
-rw-r--r-- 1 jypeter lsce   332776 Mar 12 14:56 th_yr.nc

jypeter avatar Mar 12 '19 13:03 jypeter

You're right, it looks like these files are either corrupted or missing. I'll take a look. Thanks for reporting.



doutriaux1 avatar Mar 12 '19 16:03 doutriaux1

The file I got seems valid enough. You can compare my md5sum below with that of the source file.

jypeter@obelix4 - ...sample_data - 54 >ls -l th_yr.nc
-rw-r--r-- 1 jypeter lsce 332776 Mar 12 14:56 th_yr.nc

jypeter@obelix4 - ...sample_data - 55 >md5sum th_yr.nc
00f26c388be3a13fecc1d7583e234353  th_yr.nc

jypeter@obelix4 - ...sample_data - 56 >ncdump th_yr.nc | head -30
netcdf th_yr {
dimensions:
        Time_th = UNLIMITED ; // (10 currently)
        Latitude = 64 ;
        bound = 2 ;
        Longitude_th = 128 ;
variables:
        float Time_th(Time_th) ;
                Time_th:units = "years since 1" ;
                Time_th:calendar = "proleptic_gregorian" ;
                Time_th:axis = "T" ;
        float Latitude(Latitude) ;
                Latitude:bounds = "bounds_Latitude" ;
                Latitude:units = "Degrees" ;
                Latitude:title = "" ;
                Latitude:time = "11:47:39" ;
                Latitude:source = "" ;
                Latitude:date = "25/04/02" ;
                Latitude:axis = "Y" ;
        double bounds_Latitude(Latitude, bound) ;
        float Longitude_th(Longitude_th) ;
                Longitude_th:bounds = "bounds_Longitude_th" ;
                Longitude_th:axis = "X" ;
                Longitude_th:units = "degrees_east" ;
                Longitude_th:modulo = 360. ;
                Longitude_th:topology = "circular" ;
        double bounds_Longitude_th(Longitude_th, bound) ;
        float th(Time_th, Latitude, Longitude_th) ;
                th:missing_value = 1.e+20f ;
                th:date = "25/04/02" ;

jypeter@obelix4 - ...sample_data - 57 >ncdump th_yr.nc | tail -30
    249.6738, 249.7106, 249.7536, 249.8022, 249.8556, 249.9131, 249.9741,
    250.0379, 250.1039, 250.1716, 250.2402, 250.3094, 250.3787, 250.4476,
    250.5156, 250.5823, 250.6474, 250.7104, 250.771, 250.8291, 250.8843,
    250.9366, 250.9859, 251.0325, 251.0763, 251.1176, 251.1569, 251.1942,
    251.23, 251.2647, 251.2986, 251.3321, 251.3653, 251.3986, 251.4322,
    251.4664, 251.5014, 251.5375, 251.5748, 251.6137, 251.6545, 251.6972,
    251.7423, 251.7899, 251.8402, 251.8932, 251.949, 252.0076, 252.0686,
    252.1319, 252.1971, 252.2638, 252.3314, 252.3996, 252.4674, 252.5345,
    252.6001, 252.6638, 252.725, 252.7832, 252.838, 252.8891, 252.9364,
    252.9797, 253.019,
  252.6102, 252.6246, 252.6373, 252.6483, 252.6574, 252.6647, 252.6702,
    252.6738, 252.6755, 252.6754, 252.6733, 252.6694, 252.6637, 252.6559,
    252.6463, 252.6347, 252.6212, 252.6057, 252.5882, 252.5687, 252.5471,
    252.5236, 252.4979, 252.4702, 252.4405, 252.4087, 252.3749, 252.3391,
    252.3015, 252.2621, 252.2209, 252.1782, 252.134, 252.0885, 252.0419,
    251.9945, 251.9464, 251.8978, 251.8491, 251.8004, 251.7521, 251.7045,
    251.6576, 251.6119, 251.5676, 251.5249, 251.4841, 251.4452, 251.4086,
    251.3745, 251.3445, 251.3172, 251.2925, 251.2707, 251.2515, 251.2354,
    251.2219, 251.2113, 251.2034, 251.1982, 251.1957, 251.1955, 251.1978,
    251.2023, 251.209, 251.2176, 251.228, 251.2401, 251.2538, 251.2688,
    251.2851, 251.3024, 251.3206, 251.3396, 251.3592, 251.3794, 251.4,
    251.4209, 251.442, 251.4633, 251.4846, 251.5059, 251.5271, 251.5483,
    251.5694, 251.5903, 251.6112, 251.6319, 251.6526, 251.6733, 251.694,
    251.7147, 251.7355, 251.7564, 251.7776, 251.799, 251.8208, 251.8428,
    251.8654, 251.8883, 251.9118, 251.9357, 251.9603, 251.9853, 252.0109,
    252.0369, 252.0636, 252.0906, 252.1181, 252.1458, 252.1738, 252.202,
    252.2303, 252.2586, 252.2866, 252.3145, 252.3419, 252.3688, 252.3951,
    252.4206, 252.4453, 252.4689, 252.4914, 252.5138, 252.5361, 252.5569,
    252.5763, 252.5941 ;
}

jypeter avatar Mar 14 '19 13:03 jypeter

@jypeter It's possible the md5 is wrong in our checklist. I'll double-check.
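
To illustrate why a wrong reference md5 would produce exactly this symptom, here is a minimal sketch of a checksum-gated download with a retry limit. It is not the actual vcs code: `ensure_file`, `md5_of`, the `fetch` callback and the limit of 3 attempts are all assumptions made for the illustration.

```python
import hashlib
import os


def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of the file at `path`."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        # Read in chunks so large sample files are not loaded in one go.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def ensure_file(path, expected_md5, fetch, max_attempts=3):
    """Call `fetch(path)` until the file's MD5 matches `expected_md5`.

    `fetch` is a hypothetical callable that writes the file to `path`.
    Returns the number of download attempts made (0 if the file on disk
    already matched).
    """
    attempts = 0
    while attempts < max_attempts:
        if os.path.exists(path) and md5_of(path) == expected_md5:
            break  # file is present and verified, nothing more to do
        fetch(path)
        attempts += 1
    return attempts
```

With a correct reference md5, the first call downloads once and later calls download nothing. With a wrong reference md5, the check can never succeed, so every call downloads the (perfectly valid) file `max_attempts` times, which would match the three "Downloading" lines seen on each run.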

doutriaux1 avatar Mar 14 '19 13:03 doutriaux1

@downiec @jasonb5 I have just noticed the problem again and almost created a new issue!

I'm using the latest stable vcs (and cdms)

 >conda list | egrep '(vcs|cdms)'
cdms2                     3.1.5                    pypi_0    pypi
libcdms                   3.1.2              h981a4fd_113    conda-forge
vcs                       8.2.1              pyh9f0ad1d_0    cdat/label/v8.2.1
vcsaddons                 8.2.1            py38h1e0a361_0    cdat/label/v8.2.1

I have downloaded the full sample data to a new installation

 >ls -ltr /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm19_nompi_py3/share/cdat/sample_data | tail -3
-rw-r--r-- 1 jypeter lsce  6280312 May 17 16:39 tas_6h.nc
-rw-r--r-- 1 jypeter lsce 34487602 May 17 16:39 geos5-sample.nc
-rw-r--r-- 1 jypeter lsce   332776 May 17 16:49 th_yr.nc

 >md5sum /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm19_nompi_py3/share/cdat/sample_data/th_yr.nc
00f26c388be3a13fecc1d7583e234353  /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm19_nompi_py3/share/cdat/sample_data/th_yr.nc

But each time I execute vcs.download_sample_data_files(), it downloads th_yr.nc again, three times!

>>> vcs.download_sample_data_files()
Downloading: 'th_yr.nc' from 'https://cdat.llnl.gov/cdat/sample_data/' in: /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm19_nompi_py3/share/cdat/sample_data/th_yr.nc
Downloading: 'th_yr.nc' from 'https://cdat.llnl.gov/cdat/sample_data/' in: /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm19_nompi_py3/share/cdat/sample_data/th_yr.nc
Downloading: 'th_yr.nc' from 'https://cdat.llnl.gov/cdat/sample_data/' in: /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm19_nompi_py3/share/cdat/sample_data/th_yr.nc

The file has not changed (according to md5sum), of course.

 >ls -ltr /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm19_nompi_py3/share/cdat/sample_data | tail -3
-rw-r--r-- 1 jypeter lsce  6280312 May 17 16:39 tas_6h.nc
-rw-r--r-- 1 jypeter lsce 34487602 May 17 16:39 geos5-sample.nc
-rw-r--r-- 1 jypeter lsce   332776 May 17 17:06 th_yr.nc

 >md5sum /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm19_nompi_py3/share/cdat/sample_data/th_yr.nc
00f26c388be3a13fecc1d7583e234353  /home/share/unix_files/cdat/miniconda3_21-02/envs/cdatm19_nompi_py3/share/cdat/sample_data/th_yr.nc

Well, you can even see this in the output of the tutorials, e.g. https://cdat.llnl.gov/Jupyter-notebooks/vcs/VCS_Example/VCS_Example.html

I'm afraid that if people run the notebooks on a server where CDAT and the data files were installed by somebody else, the notebook may fail when the download tries to write the data file into a directory where they don't have write access.
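
As a possible workaround on the notebook side, one could check whether the sample data directory is writable before triggering the download. This is only a sketch: `can_write_sample_data` is a hypothetical helper, not part of vcs, and `os.access` only reports what the permissions allow at the time of the call.

```python
import os


def can_write_sample_data(directory):
    """Return True if the current user may create files in `directory`."""
    # W_OK: permission to write; X_OK: permission to enter the directory.
    return os.path.isdir(directory) and os.access(directory, os.W_OK | os.X_OK)
```

A notebook could then call `vcs.download_sample_data_files()` only when `can_write_sample_data(vcs.sample_data)` is true, and otherwise use the files that are already installed.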

jypeter avatar May 17 '21 15:05 jypeter