ufs-weather-model

Accessing UFS input data through web

Open uturuncoglu opened this issue 3 years ago • 11 comments

I wonder whether the UFS C96 input data (already staged on the supported platforms) is accessible through the web (FTP, HTTP, or any cloud service). Here is the list of files that I need to access:

total 86228
drwxr-xr-x 2 root root     4096 Jul  1 12:19 .
drwxr-xr-x 5 root root    20480 Jul 21 14:06 ..
-rw-r--r-- 1 root root   119490 Jul  1 12:19 C96.maximum_snow_albedo.tile1.nc
-rw-r--r-- 1 root root   119490 Jul  1 12:19 C96.maximum_snow_albedo.tile2.nc
-rw-r--r-- 1 root root   119490 Jul  1 12:19 C96.maximum_snow_albedo.tile3.nc
-rw-r--r-- 1 root root   119490 Jul  1 12:18 C96.maximum_snow_albedo.tile4.nc
-rw-r--r-- 1 root root   119490 Jul  1 12:19 C96.maximum_snow_albedo.tile5.nc
-rw-r--r-- 1 root root   119490 Jul  1 12:19 C96.maximum_snow_albedo.tile6.nc
-rw-r--r-- 1 root root   119550 Jul  1 12:19 C96.slope_type.tile1.nc
-rw-r--r-- 1 root root   119550 Jul  1 12:19 C96.slope_type.tile2.nc
-rw-r--r-- 1 root root   119550 Jul  1 12:19 C96.slope_type.tile3.nc
-rw-r--r-- 1 root root   119550 Jul  1 12:19 C96.slope_type.tile4.nc
-rw-r--r-- 1 root root   119550 Jul  1 12:18 C96.slope_type.tile5.nc
-rw-r--r-- 1 root root   119550 Jul  1 12:19 C96.slope_type.tile6.nc
-rw-r--r-- 1 root root   119480 Jul  1 12:18 C96.soil_type.tile1.nc
-rw-r--r-- 1 root root   119480 Jul  1 12:19 C96.soil_type.tile2.nc
-rw-r--r-- 1 root root   119480 Jul  1 12:18 C96.soil_type.tile3.nc
-rw-r--r-- 1 root root   119480 Jul  1 12:19 C96.soil_type.tile4.nc
-rw-r--r-- 1 root root   119480 Jul  1 12:18 C96.soil_type.tile5.nc
-rw-r--r-- 1 root root   119480 Jul  1 12:19 C96.soil_type.tile6.nc
-rw-r--r-- 1 root root   119595 Jul  1 12:19 C96.substrate_temperature.tile1.nc
-rw-r--r-- 1 root root   119595 Jul  1 12:19 C96.substrate_temperature.tile2.nc
-rw-r--r-- 1 root root   119595 Jul  1 12:19 C96.substrate_temperature.tile3.nc
-rw-r--r-- 1 root root   119595 Jul  1 12:19 C96.substrate_temperature.tile4.nc
-rw-r--r-- 1 root root   119595 Jul  1 12:18 C96.substrate_temperature.tile5.nc
-rw-r--r-- 1 root root   119595 Jul  1 12:18 C96.substrate_temperature.tile6.nc
-rw-r--r-- 1 root root   525039 Jul  1 12:19 C96.vegetation_greenness.tile1.nc
-rw-r--r-- 1 root root   525039 Jul  1 12:19 C96.vegetation_greenness.tile2.nc
-rw-r--r-- 1 root root   525039 Jul  1 12:18 C96.vegetation_greenness.tile3.nc
-rw-r--r-- 1 root root   525039 Jul  1 12:19 C96.vegetation_greenness.tile4.nc
-rw-r--r-- 1 root root   525039 Jul  1 12:18 C96.vegetation_greenness.tile5.nc
-rw-r--r-- 1 root root   525039 Jul  1 12:19 C96.vegetation_greenness.tile6.nc
-rw-r--r-- 1 root root   119589 Jul  1 12:18 C96.vegetation_type.tile1.nc
-rw-r--r-- 1 root root   119589 Jul  1 12:19 C96.vegetation_type.tile2.nc
-rw-r--r-- 1 root root   119589 Jul  1 12:19 C96.vegetation_type.tile3.nc
-rw-r--r-- 1 root root   119589 Jul  1 12:19 C96.vegetation_type.tile4.nc
-rw-r--r-- 1 root root   119589 Jul  1 12:18 C96.vegetation_type.tile5.nc
-rw-r--r-- 1 root root   119589 Jul  1 12:19 C96.vegetation_type.tile6.nc
-rw-r--r-- 1 root root  2094935 Jul  1 12:19 C96_grid.tile1.nc
-rw-r--r-- 1 root root  2094935 Jul  1 12:19 C96_grid.tile2.nc
-rw-r--r-- 1 root root  2094935 Jul  1 12:18 C96_grid.tile3.nc
-rw-r--r-- 1 root root  2094935 Jul  1 12:19 C96_grid.tile4.nc
-rw-r--r-- 1 root root  2094935 Jul  1 12:18 C96_grid.tile5.nc
-rw-r--r-- 1 root root  2094935 Jul  1 12:18 C96_grid.tile6.nc
-rw-r--r-- 1 root root    21966 Jul  1 12:19 C96_mosaic.nc
-rw-r--r-- 1 root root   834883 Jul  1 12:19 oro_data.tile1.nc
-rw-r--r-- 1 root root   834883 Jul  1 12:19 oro_data.tile2.nc
-rw-r--r-- 1 root root   834883 Jul  1 12:18 oro_data.tile3.nc
-rw-r--r-- 1 root root   834883 Jul  1 12:19 oro_data.tile4.nc
-rw-r--r-- 1 root root   834883 Jul  1 12:19 oro_data.tile5.nc
-rw-r--r-- 1 root root   834883 Jul  1 12:18 oro_data.tile6.nc

uturuncoglu avatar Jul 21 '22 20:07 uturuncoglu

@uturuncoglu EPIC maintains the input files in an S3 bucket. I will follow up with more information.

jkbk2004 avatar Jul 21 '22 20:07 jkbk2004

I have just found this https://aws.amazon.com/marketplace/pp/prodview-az2fsalpokx7k#resources

and got the list of files with the following command:

aws s3 ls --no-sign-request s3://noaa-ufs-regtests-pds/

I am not sure whether it is up to date with the files found on the platforms. Any idea?

uturuncoglu avatar Jul 21 '22 20:07 uturuncoglu

Yes, can you check? https://noaa-ufs-regtests-pds.s3.amazonaws.com/index.html

jkbk2004 avatar Jul 21 '22 20:07 jkbk2004

Yes, can you check? https://noaa-ufs-regtests-pds.s3.amazonaws.com/index.html

This should be up-to-date.

jkbk2004 avatar Jul 21 '22 20:07 jkbk2004

@jkbk2004 That is perfect! I'll try to get the required data and test it with my case. Thanks for your help.

uturuncoglu avatar Jul 21 '22 20:07 uturuncoglu

@jkbk2004 I was able to download the data from S3 without any issue, but it turns out that the vegetation category data are inconsistent with the land mask found in the orography data. There are land points whose vegetation type is -2147483647, and that is what causes the simulation to fail. Is it possible to fix the orography file in the S3 bucket? BTW, here is the final list of files that I am using. There might be another folder that already has the fixed files.

        - input-data-20220414/FV3_fix_tiled/C96/C96.maximum_snow_albedo.tile1.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.maximum_snow_albedo.tile2.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.maximum_snow_albedo.tile3.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.maximum_snow_albedo.tile4.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.maximum_snow_albedo.tile5.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.maximum_snow_albedo.tile6.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.slope_type.tile1.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.slope_type.tile2.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.slope_type.tile3.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.slope_type.tile4.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.slope_type.tile5.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.slope_type.tile6.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.soil_type.tile1.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.soil_type.tile2.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.soil_type.tile3.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.soil_type.tile4.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.soil_type.tile5.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.soil_type.tile6.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.substrate_temperature.tile1.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.substrate_temperature.tile2.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.substrate_temperature.tile3.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.substrate_temperature.tile4.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.substrate_temperature.tile5.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.substrate_temperature.tile6.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_greenness.tile1.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_greenness.tile2.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_greenness.tile3.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_greenness.tile4.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_greenness.tile5.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_greenness.tile6.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_type.tile1.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_type.tile2.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_type.tile3.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_type.tile4.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_type.tile5.nc
        - input-data-20220414/FV3_fix_tiled/C96/C96.vegetation_type.tile6.nc
        - input-data-20220414/FV3_input_data/INPUT/C96_grid.tile1.nc
        - input-data-20220414/FV3_input_data/INPUT/C96_grid.tile2.nc
        - input-data-20220414/FV3_input_data/INPUT/C96_grid.tile3.nc
        - input-data-20220414/FV3_input_data/INPUT/C96_grid.tile4.nc
        - input-data-20220414/FV3_input_data/INPUT/C96_grid.tile5.nc
        - input-data-20220414/FV3_input_data/INPUT/C96_grid.tile6.nc
        - input-data-20220414/FV3_input_data/INPUT/grid_spec.nc
        - input-data-20220414/FV3_input_data/INPUT/oro_data.tile1.nc
        - input-data-20220414/FV3_input_data/INPUT/oro_data.tile2.nc
        - input-data-20220414/FV3_input_data/INPUT/oro_data.tile3.nc
        - input-data-20220414/FV3_input_data/INPUT/oro_data.tile4.nc
        - input-data-20220414/FV3_input_data/INPUT/oro_data.tile5.nc
        - input-data-20220414/FV3_input_data/INPUT/oro_data.tile6.nc 
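
The inconsistency described above (land points whose vegetation type equals the fill value) can be checked with a small helper. This is only a sketch on synthetic data: the function name is hypothetical, the land mask is assumed to use 1 for land and 0 for water, and reading the real arrays from the vegetation_type and oro_data files (whose variable names are not given in this thread) is left out.

```python
# Fill value reported in this thread for the problematic vegetation points.
FILL_VALUE = -2147483647


def find_unfilled_land_points(vegetation_type, land_mask, fill_value=FILL_VALUE):
    """Return (row, column) indices of land points whose vegetation type is the fill value.

    Both arguments are 2-D sequences (lists of rows) with the same shape.
    Assumption: land_mask is 1 over land and 0 over water.
    """
    bad_points = []
    for row, (veg_row, mask_row) in enumerate(zip(vegetation_type, land_mask)):
        for col, (veg, mask) in enumerate(zip(veg_row, mask_row)):
            if mask > 0.5 and veg == fill_value:
                bad_points.append((row, col))
    return bad_points


# Tiny synthetic tile: one land point (row 0, column 1) has no vegetation type.
veg = [[7, FILL_VALUE], [FILL_VALUE, 10]]
mask = [[1, 1], [0, 1]]
bad_points = find_unfilled_land_points(veg, mask)
print(bad_points)
```

A fill value over water (row 1, column 0 in the example) is not reported, since only land points need a valid vegetation type.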

uturuncoglu avatar Jul 25 '22 18:07 uturuncoglu

@uturuncoglu let me check if we need to sync the bucket again. A PR or two may have been missed on the bucket.

jkbk2004 avatar Jul 25 '22 18:07 jkbk2004

@uturuncoglu we just synced input-data-20220414. Can you check again?

jkbk2004 avatar Jul 25 '22 19:07 jkbk2004

@jkbk2004 Sure, let me try. Thanks again for your help. Do I need to wait until the S3 bucket is updated?

uturuncoglu avatar Jul 25 '22 20:07 uturuncoglu

@uturuncoglu the bucket should be up to date now.

jkbk2004 avatar Jul 25 '22 20:07 jkbk2004

@jkbk2004 Thanks for your help. I have not tried to run the model yet, but the checksums of the new files are the same as those of the old ones downloaded from S3. Do we know which files changed in the S3 bucket?
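
For reference, a comparison like this can be scripted. The sketch below is an assumption about how one might do it, not the exact procedure used here: it hashes the `.nc` files in two local directories with SHA-256 and reports the names whose contents differ. The function names and directory layout are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256sum(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks to limit memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def changed_files(old_dir, new_dir, pattern="*.nc"):
    """Return names of files present in both directories whose checksums differ."""
    old_dir, new_dir = Path(old_dir), Path(new_dir)
    changed = []
    for old_file in sorted(old_dir.glob(pattern)):
        new_file = new_dir / old_file.name
        if new_file.is_file() and sha256sum(old_file) != sha256sum(new_file):
            changed.append(old_file.name)
    return changed
```

An empty list from `changed_files("old_download", "new_download")` would mean that the re-synced files are byte-for-byte identical to the earlier download.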

uturuncoglu avatar Jul 25 '22 20:07 uturuncoglu

@jkbk2004 I have just checked, and the S3 bucket is out of sync again (I think). At this point I cannot find the FV3_fix_tiled and FV3_input_data folders under input-data-20221101.

https://noaa-ufs-regtests-pds.s3.amazonaws.com/index.html#input-data-20221101/

The component testing GitHub action depends on those files. At this point, I am using cached files but once the cache is gone the action will start to fail. Any suggestions?

uturuncoglu avatar Feb 23 '23 21:02 uturuncoglu

@jkbk2004 Also, I could not find those directories under input-data-20220414 either.

uturuncoglu avatar Feb 23 '23 22:02 uturuncoglu

@uturuncoglu sure! I will check with the data manager.

jkbk2004 avatar Feb 23 '23 22:02 jkbk2004

@jkbk2004 Thanks. Sorry for extra work.

uturuncoglu avatar Feb 23 '23 22:02 uturuncoglu

Hey @uturuncoglu FYI:

  • Automation is set up for data maintenance. The data maintenance requirement is to remove from the cloud any input and baseline data that do not support the latest two months of development code. Since input-data-20220414 falls outside that window, it will be cleaned out. So, other than 'INPUTDATA_ROOT_WW3': ['20220624'], which falls within input-data-20220414 and was just merged into the latest two months of development code, those files are not expected to be in the cloud.

CSyl avatar Feb 28 '23 15:02 CSyl

@uturuncoglu

  • As for input-data-20221101/: the on-prem RDHPCS system is the source for the migration, so lengthy IT maintenance on the RDHPCS disks can disrupt and delay it whenever the server does not allow connections. Such maintenance occurred last week for a few days. After IT completed it, the automation was able to connect to the server and auto-migrated the latest datasets into the cloud (yesterday), but I will go ahead and re-sync them in case any of the file permissions for the latest data were altered.

CSyl avatar Feb 28 '23 15:02 CSyl

@CSyl May I ask if you have any instructions on how to download the data to a platform? Thanks!

junwang-noaa avatar Feb 28 '23 15:02 junwang-noaa

@junwang-noaa Hi Jun, I do not have a documented, detailed procedure for downloading the data, but there are a few ways to download the data to a platform:

1) If you are using a bash shell, the simplest way is to use the AWS CLI, which does not require cloud credentials for reading. It is built on botocore.

2) If you are developing a customizable framework for integration, you could use boto3, which is an AWS SDK for Python. This allows more flexibility and capability in communicating with the cloud buckets. It is also built on botocore.

3) If you are using a bash shell, you could also use wget if desired. For example, suppose you want to get the following cloud object via wget:

  • s3://noaa-ufs-regtests-pds/develop-20230126/INTEL/atmaero_control_p8/sfcf024.nc

Then, use

  • wget http://noaa-ufs-regtests-pds.s3.amazonaws.com/develop-20230126/INTEL/atmaero_control_p8/sfcf024.nc

Format should be:

  • http://{bucket-name}.s3.amazonaws.com/{path-to-file}
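
The URL format above can be applied programmatically. The following is a minimal sketch with a hypothetical function name; it assumes the `s3://` URI includes the bucket name as its first component, and it emits `https` rather than `http`, which the bucket index links in this thread also use.

```python
from urllib.parse import quote, urlsplit


def s3_uri_to_https(uri):
    """Convert s3://{bucket-name}/{path-to-file} into the matching public HTTPS URL."""
    parts = urlsplit(uri)
    if parts.scheme != "s3" or not parts.netloc:
        raise ValueError(f"not an s3://bucket/key URI: {uri!r}")
    key = parts.path.lstrip("/")
    # quote() percent-encodes unusual characters but leaves "/" separators intact.
    return f"https://{parts.netloc}.s3.amazonaws.com/{quote(key)}"


url = s3_uri_to_https(
    "s3://noaa-ufs-regtests-pds/develop-20230126/INTEL/atmaero_control_p8/sfcf024.nc"
)
print(url)
```

The resulting URL can be passed to wget or to any other HTTP client.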


CSyl avatar Feb 28 '23 16:02 CSyl

Great, thanks @CSyl

junwang-noaa avatar Feb 28 '23 20:02 junwang-noaa

@CSyl Sorry for the late response. I was sick all last week. I have just checked https://noaa-ufs-regtests-pds.s3.amazonaws.com and I still cannot see the FV3_fix_tiled/C96 directory, so the automated testing is failing. Is it possible to also add that directory to the regular syncing script, if there is one? The NoahMP testing depends on that folder. Thanks for your help.

uturuncoglu avatar Mar 06 '23 04:03 uturuncoglu

@uturuncoglu https://noaa-ufs-regtests-pds.s3.amazonaws.com/index.html#input-data-20221101/FV3_fix_tiled/C96/ is there. If all is OK, can I close this issue?

jkbk2004 avatar Mar 14 '23 00:03 jkbk2004

@jkbk2004 or others, do you know whether it is possible to download, say, all of https://noaa-ufs-regtests-pds.s3.amazonaws.com/index.html#input-data-20221101/ without specifying the path of each file in each subfolder? I've been looking online at wget commands and have had no luck so far, so I thought I'd ask here to see if anyone has had success.

JessicaMeixner-NOAA avatar Mar 17 '23 11:03 JessicaMeixner-NOAA

@JessicaMeixner-NOAA I am not sure about downloading an entire folder, but the S3 interfaces may offer a way. The first option is the CLI (https://docs.aws.amazon.com/cli/latest/reference/s3/) and the other is to write some scripts. I have some Python scripts that can be used for this:

https://github.com/esmf-org/nuopc-comp-testing/blob/141bcf6797d8acfdc1cef6a1f4adb6d84725830e/scripts/get_input.py#L137 https://github.com/esmf-org/nuopc-comp-testing/blob/141bcf6797d8acfdc1cef6a1f4adb6d84725830e/scripts/get_input.py#L161

So, you could check those functions. I think they could be used in other applications, since they take a list of files as an argument. I hope this helps.
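
As a rough illustration of the scripting route (this is not the code in the linked get_input.py), the sketch below lists every key under a prefix with the S3 ListObjectsV2 REST call and downloads each one, using only the Python standard library. The function names are hypothetical, and it assumes the bucket permits anonymous listing, as the `aws s3 ls --no-sign-request` command earlier in this thread suggests.

```python
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by S3 listing responses.
S3_NS = "{http://s3.amazonaws.com/doc/2006-03-01/}"


def parse_listing(xml_text):
    """Return (keys, continuation_token) from one ListObjectsV2 response body."""
    root = ET.fromstring(xml_text)
    keys = [item.findtext(f"{S3_NS}Key") for item in root.findall(f"{S3_NS}Contents")]
    # The token is only present when the listing was truncated.
    return keys, root.findtext(f"{S3_NS}NextContinuationToken")


def list_keys(bucket, prefix):
    """Yield every object key under a prefix, following pagination."""
    token = None
    while True:
        query = {"list-type": "2", "prefix": prefix}
        if token:
            query["continuation-token"] = token
        url = f"https://{bucket}.s3.amazonaws.com/?{urllib.parse.urlencode(query)}"
        with urllib.request.urlopen(url) as response:
            keys, token = parse_listing(response.read())
        yield from keys
        if not token:
            break


def download_prefix(bucket, prefix, dest):
    """Download all objects under a prefix into dest, keeping the folder layout."""
    for key in list_keys(bucket, prefix):
        if key.endswith("/"):  # skip zero-byte "folder" placeholder objects
            continue
        target = Path(dest) / key
        target.parent.mkdir(parents=True, exist_ok=True)
        url = f"https://{bucket}.s3.amazonaws.com/{urllib.parse.quote(key)}"
        urllib.request.urlretrieve(url, target)


# Example call (not executed here, since it would download the whole folder):
# download_prefix("noaa-ufs-regtests-pds", "input-data-20221101/", "ufs_input")
```

Listing first and downloading second also makes it easy to filter the keys, for example to fetch only the C96 files.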

uturuncoglu avatar Mar 17 '23 18:03 uturuncoglu

@jkbk2004 Yes, you can close this. I am not sure how you are syncing the S3 bucket, but if you are doing it with a script, it would be nice to have those directories defined there. Otherwise, I might run into issues with those files again and have to open another ticket. Thanks for your help.

uturuncoglu avatar Mar 17 '23 18:03 uturuncoglu

Unless there are any objections, I will go ahead and close this, due to the existence of the current UFS s3 input-data bucket. https://noaa-ufs-regtests-pds.s3.amazonaws.com/index.html

zach1221 avatar May 22 '23 21:05 zach1221

@zach1221 Sure. If I have any issues in the future, I can create a new issue. Thanks to all for your help.

uturuncoglu avatar May 22 '23 21:05 uturuncoglu