staged-recipes
Proposed Recipes for Himawari-8 Level 3 SST
Dataset Name
Himawari-8
Dataset URL
https://registry.opendata.aws/noaa-himawari/
Description
@pbranson has prototyped some initial kerchunk index generation for the Himawari-8 Level 3 SST data as part of a project for OceanHackWeek in this repo. Using his example, I'll try to put together an initial recipe. Ref: https://github.com/oceanhackweek/ohw22-proj-kerchunk/issues/2
License
Open Data
Data Format
NetCDF
Data Format (other)
No response
Access protocol
S3
Source File Organization
s3://noaa-himawari8/{dataset}/{year}/{month}/{day}/{hour}/YYYYMMDDHHMMSS-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc
From 7 April 2021 forward, this changes to:
s3://noaa-himawari8/{dataset}/{year}/{month}/{day}/{hour}/YYYYMMDDHHMMSS-NCCF-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc
Example URLs
s3://noaa-himawari8/AHI-L2-FLDK-SST/2022/01/13/0000/20220113000000-NCCF-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc
s3://noaa-himawari8/AHI-L2-FLDK-SST/2020/01/13/0000/20200113000000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc
Authorization
No response
Transformation / Processing
NA
Target Format
Reference Filesystem (Kerchunk)
Comments
No response
@sharkinsspatial, per our "jam" session yesterday with @wildintellect, Anthony Lukach, and Aimee Barciauskas, I'm posting details on our work to identify gaps in the available files in AWS S3.
We may want to tidy things up a bit within a single script, but here are all the parts.
Produce List of Actual L3C Files
To produce a list of the relevant L3C files in lexicographical order (which is also chronological order, given the naming convention):
aws s3 ls --recursive s3://noaa-himawari8/AHI-L2-FLDK-SST/ | grep '[-]L3C' | cut -c 32- > l3c-actual.txt
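For anyone who would rather do the filtering step in Python than with grep and cut, here is a minimal sketch. It assumes the usual `aws s3 ls --recursive` line layout of date, time, size, and key separated by whitespace, and it splits on whitespace instead of cutting at a fixed column. The function name `l3c_keys`, the sample timestamps and sizes, and the non-L3C key are made up for illustration.

```python
def l3c_keys(listing_lines):
    """Yield the object key of every L3C file found in `aws s3 ls --recursive` output."""
    for line in listing_lines:
        # Assumed layout: "<date> <time> <size> <key>"; split at most 3 times so
        # the key is kept whole in the last field.
        parts = line.rstrip("\n").split(None, 3)
        if len(parts) == 4 and "-L3C" in parts[3]:
            yield parts[3]


# Illustrative input only: the dates, times, and sizes are invented, and the
# second key is a hypothetical stand-in for any non-L3C object under the prefix.
sample_listing = [
    "2020-01-16 16:02:11   12345678 AHI-L2-FLDK-SST/2020/01/16/1500/"
    "20200116150000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.60-v02.0-fv01.0.nc\n",
    "2020-01-16 16:02:12   87654321 AHI-L2-FLDK-SST/2020/01/16/1500/"
    "example-other-product.nc\n",
]

keys = list(l3c_keys(sample_listing))
```

In practice the lines would come from the saved output of the `aws s3 ls` command above, for example by iterating over an open file.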
Example file:
AHI-L2-FLDK-SST/2020/01/16/1500/20200116150000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.60-v02.0-fv01.0.nc
Upon visual inspection, we identified 3 places where the file pattern shifts:
- Change in version from V2.60 to V2.71, occurring at this point in the list (notice the transition occurs between the 2 middle items, but that both of those items share the same date/time value, so we have an overlap):
  AHI-L2-FLDK-SST/2020/07/02/1300/20200702130000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.60-v02.0-fv01.0.nc
  AHI-L2-FLDK-SST/2020/07/02/1400/20200702140000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.60-v02.0-fv01.0.nc
  AHI-L2-FLDK-SST/2020/07/02/1400/20200702140000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.71-v02.0-fv01.0.nc
  AHI-L2-FLDK-SST/2020/07/06/1300/20200706130000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.71-v02.0-fv01.0.nc
- Change in version from V2.71 to V2.80 (notice that there are several missing hourly files between these 2, so we cannot tell specifically at which hourly interval the version change occurs):
  AHI-L2-FLDK-SST/2021/03/22/1600/20210322160000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.71-v02.0-fv01.0.nc
  AHI-L2-FLDK-SST/2021/03/23/0100/20210323010000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc
- Change in format from STAR to NCCF (again, there are several hourly files missing between these 2):
  AHI-L2-FLDK-SST/2021/04/05/1500/20210405150000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc
  AHI-L2-FLDK-SST/2021/04/06/1200/20210406120000-NCCF-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.80-v02.0-fv01.0.nc
Further, the actual transition from STAR to NCCF (shown above) does not appear to jibe with the documented time of that change, which seems to indicate that the transition should appear starting 2021/05/03. Perhaps that document only indicates when NCCF files will first become available, and not necessarily the earliest date of NCCF files produced (sometime during 2021/04/05 or 2021/04/06, based upon the 2 files shown above). [Thanks to @wildintellect for locating this reference.]
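The three shifts above were found by eye. As a rough sketch of how the same check could be automated, the snippet below walks a sorted key list and reports each adjacent pair where the producer tag (STAR or NCCF) or the ACSPO version differs. The regular expression and the name `find_transitions` are my own assumptions based on the file names shown in this thread, not anything taken from NOAA documentation.

```python
import re

# Captures the 14-digit timestamp, the producer tag (STAR/NCCF), and the ACSPO
# version from an L3C key. The pattern is inferred from the keys listed above.
KEY_RE = re.compile(r"(\d{14})-(\w+)-L3C_GHRSST-.*-ACSPO_(V[\d.]+)-v")


def find_transitions(keys):
    """Return (previous_key, next_key) pairs where the tag or version changes."""
    transitions = []
    prev_key, prev_sig = None, None
    for key in keys:
        match = KEY_RE.search(key)
        if match is None:
            continue  # ignore keys that do not follow the L3C naming convention
        sig = (match.group(2), match.group(3))
        if prev_sig is not None and sig != prev_sig:
            transitions.append((prev_key, key))
        prev_key, prev_sig = key, sig
    return transitions


# The four consecutive keys from the V2.60 to V2.71 change listed above.
keys = [
    "AHI-L2-FLDK-SST/2020/07/02/1300/20200702130000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.60-v02.0-fv01.0.nc",
    "AHI-L2-FLDK-SST/2020/07/02/1400/20200702140000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.60-v02.0-fv01.0.nc",
    "AHI-L2-FLDK-SST/2020/07/02/1400/20200702140000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.71-v02.0-fv01.0.nc",
    "AHI-L2-FLDK-SST/2020/07/06/1300/20200706130000-STAR-L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_V2.71-v02.0-fv01.0.nc",
]
transitions = find_transitions(keys)
```

Run against the full l3c-actual.txt, this should surface the same three places, plus any that the visual inspection missed.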
Produce List of Expected L3C Files
Given that our visual inspection of the S3 file list makes it apparent that there are numerous gaps, we want to produce a list of expected hourly files so that we can identify all of the gaps.
This logic is similar to what we'll need for our FilePattern, and uses the pattern changes identified above (and perhaps writing something to automatically identify such pattern changes would be helpful, to avoid the manual, error-prone visual inspection):
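# list-expected-files.py
import pandas as pd


def print_filenames(start, end, format, version):
    # Print one expected S3 key per hour from start to end (both inclusive).
    # A Timedelta is used for the frequency because newer pandas releases
    # deprecate the "H" alias.
    for date in pd.date_range(start, end, freq=pd.Timedelta(hours=1)):
        print(
            "AHI-L2-FLDK-SST/{time:%Y/%m/%d/%H}00/{time:%Y%m%d%H}0000-{format}-"
            "L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_{version}-v02.0-fv01.0.nc".format(
                time=date,
                format=format,
                version=version,
            )
        )


# One call per naming segment identified above. 2020-07-02 14:00 appears in the
# first two segments on purpose, because both versions exist for that hour.
print_filenames("2019-12-10 16:00:00", "2020-07-02 14:00:00", "STAR", "V2.60")
print_filenames("2020-07-02 14:00:00", "2021-03-22 16:00:00", "STAR", "V2.71")
print_filenames("2021-03-23 01:00:00", "2021-04-05 15:00:00", "STAR", "V2.80")
print_filenames("2021-04-06 12:00:00", "2022-08-17 23:00:00", "NCCF", "V2.80")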
To produce the list of expected files:
python list-expected-files.py > l3c-expected.txt
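Since the recipe will eventually need a function that maps a single timestamp to a single URL, here is a hedged sketch of that mapping using only the standard library. The segment boundaries are the ones from the script above; the names `SEGMENTS` and `make_url` are hypothetical. Two choices here are assumptions on my part: for the overlapping hour 2020-07-02 14:00 the newer V2.71 file is chosen, and hours that fall in the unexplained gaps between segments are assigned to the earlier segment's naming.

```python
from bisect import bisect_right
from datetime import datetime

# (first timestamp, producer tag, ACSPO version) for each naming segment,
# taken from the transitions identified in the S3 listing.
SEGMENTS = [
    (datetime(2019, 12, 10, 16), "STAR", "V2.60"),
    (datetime(2020, 7, 2, 14), "STAR", "V2.71"),
    (datetime(2021, 3, 23, 1), "STAR", "V2.80"),
    (datetime(2021, 4, 6, 12), "NCCF", "V2.80"),
]
STARTS = [start for start, _, _ in SEGMENTS]


def make_url(time):
    """Return the expected S3 URL of the hourly L3C file for `time`."""
    # Index of the last segment whose start is at or before `time`.
    idx = bisect_right(STARTS, time) - 1
    if idx < 0:
        raise ValueError(f"{time} is earlier than the first known file")
    _, producer, version = SEGMENTS[idx]
    return (
        "s3://noaa-himawari8/AHI-L2-FLDK-SST/"
        f"{time:%Y/%m/%d/%H}00/{time:%Y%m%d%H}0000-{producer}-"
        f"L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_{version}-v02.0-fv01.0.nc"
    )


url = make_url(datetime(2022, 1, 13, 0))
```

A function of this shape could presumably be adapted into the format function for the FilePattern, with the list of hourly timestamps (minus the known-missing ones) supplied as the keys of the time dimension.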
Identify Missing L3C Files
We can now produce a list of files that are missing from S3:
diff l3c-actual.txt l3c-expected.txt | grep "^>" | sed -E 's/^> (.*)/\1/' > l3c-missing.txt
For reference, I've attached a list of missing files, through 2022-08-17: l3c-missing.txt
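The missing-file list is long, so it may be easier to reason about as contiguous gaps. The sketch below is one possible way to do the comparison in Python and then collapse the missing keys into ranges of consecutive hours. The helper names (`missing_keys`, `gap_ranges`, `key`) are hypothetical, and the example covers only the stretch between the 2020-07-02 14:00 and 2020-07-06 13:00 V2.71 files, which are adjacent in the listing shown earlier.

```python
import re
from datetime import datetime, timedelta


def missing_keys(actual, expected):
    """Return the expected keys that are absent from actual, in expected order."""
    present = set(actual)
    return [k for k in expected if k not in present]


def gap_ranges(missing):
    """Collapse missing keys into (first, last) timestamps of consecutive hours."""
    times = sorted(
        {datetime.strptime(re.search(r"(\d{14})-", k).group(1), "%Y%m%d%H%M%S")
         for k in missing}
    )
    ranges = []
    for t in times:
        if ranges and t - ranges[-1][1] == timedelta(hours=1):
            ranges[-1][1] = t  # extend the current gap by one hour
        else:
            ranges.append([t, t])  # start a new gap
    return [tuple(r) for r in ranges]


def key(time, producer="STAR", version="V2.71"):
    """Build an L3C key using the naming convention from the listing."""
    return (
        f"AHI-L2-FLDK-SST/{time:%Y/%m/%d/%H}00/{time:%Y%m%d%H}0000-{producer}-"
        f"L3C_GHRSST-SSTsubskin-AHI_H08-ACSPO_{version}-v02.0-fv01.0.nc"
    )


start, end = datetime(2020, 7, 2, 14), datetime(2020, 7, 6, 13)
hours = int((end - start) / timedelta(hours=1))
expected = [key(start + timedelta(hours=h)) for h in range(hours + 1)]
actual = [key(start), key(end)]  # the only two files present in this stretch

missing = missing_keys(actual, expected)
gaps = gap_ranges(missing)
```

For the real lists, `actual` and `expected` would be read from l3c-actual.txt and l3c-expected.txt, one key per line.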
@Patrick-Keown In preparation for generating a kerchunk index for the Himawari-8 Level 3 SST data, we have identified several missing hourly time steps and a potential inconsistency in the timestamps for changes in product version ID. There is an email contact associated with the NOAA BDP (https://registry.opendata.aws/noaa-himawari/), but in the spirit of tracking the conversation in an open repository, I was hoping that you might be able to provide some feedback on these anomalies. If there is a more appropriate point of contact for these discussions, please let me know.
- Are these missing hourly timestamps expected?
- Is there documentation describing the key naming structure for the version and the STAR to NCCF transition?
Thank you in advance for any input you can provide.
Hi Patrick, Sean,
I'm definitely not the best person to talk about L3 products. STAR transitioned all Himawari 8 processing to OSPO about a year and a half ago and I was only tangentially related to that process.
As for the times missing, could you please provide some examples? Are you only using the L2 SST data?
Thanks, Matt.
Matthew Jochum NESDIS / STAR / ITT System Owner - NOAA5018 Network Lead
O: 1-301-683-3506
I don't always test my code, but when I do, I do it in production.
On 8/22/22 09:59, Patrick Keown - NOAA Federal wrote:
Hi Sean,
I am CC'ing Matt Jochum who I believe will be able to answer your data specific questions on Himawari data.
Matt,
Are you able to assist?
Thanks,
Patrick Keown
Program Manager, NOAA Open Data Dissemination (NODD)
Office of the Chief Information Officer (OCIO)
National Oceanic & Atmospheric Administration
(615) 319-5906 | @.***
"Be sure when you step, step with care and great tact" - Dr. Seuss
@Patrick-Keown I don't see an email address in the chain for Matthew Jochum. Is there a good contact point for him? (You can message me directly with it so as not to publish his email on an open channel.) For a list of the missing L3 SST time steps, you can review this list compiled by @chuckwondo: https://github.com/pangeo-forge/staged-recipes/files/9377046/l3c-missing.txt
It looks like Matt mentioned to me that he may not be the best contact. With that said, I would actually reach out to @.*** and they would be able to direct you to the appropriate contact.
Patrick Keown