NDBC data
https://www.ndbc.noaa.gov/
See an example here: https://github.com/saeed-moghimi-noaa/prep_obs_ca
# Line 250
#coops_ndbc_obs_collector.py
#################
@retry(stop_max_attempt_number=5, wait_fixed=3000)
def get_ndbc(start, end, bbox, sos_name='waves', datum='MSL', verbose=True):
"""
Function to read NDBC data
###################
sos_name = waves
all_col = (['station_id', 'sensor_id', 'latitude (degree)', 'longitude (degree)',
'date_time', 'sea_surface_wave_significant_height (m)',
'sea_surface_wave_peak_period (s)', 'sea_surface_wave_mean_period (s)',
'sea_surface_swell_wave_significant_height (m)',
'sea_surface_swell_wave_period (s)',
'sea_surface_wind_wave_significant_height (m)',
'sea_surface_wind_wave_period (s)', 'sea_water_temperature (c)',
'sea_surface_wave_to_direction (degree)',
'sea_surface_swell_wave_to_direction (degree)',
'sea_surface_wind_wave_to_direction (degree)',
'number_of_frequencies (count)', 'center_frequencies (Hz)',
'bandwidths (Hz)', 'spectral_energy (m**2/Hz)',
'mean_wave_direction (degree)', 'principal_wave_direction (degree)',
'polar_coordinate_r1 (1)', 'polar_coordinate_r2 (1)',
'calculation_method', 'sampling_rate (Hz)', 'name'])
sos_name = winds
all_col = (['station_id', 'sensor_id', 'latitude (degree)', 'longitude (degree)',
'date_time', 'depth (m)', 'wind_from_direction (degree)',
'wind_speed (m/s)', 'wind_speed_of_gust (m/s)',
'upward_air_velocity (m/s)', 'name'])
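Since the wave and wind retrievals return different column sets, a collector usually wants every station's dataframe to share an identical layout. A minimal sketch of one way to do that, assuming the column names from the snippet above (the `normalize_columns` helper itself is hypothetical, not part of the collector):

```python
import pandas as pd

# Expected columns for sos_name='winds', from the list above (subset for brevity).
WINDS_COLS = [
    "station_id", "date_time", "wind_from_direction (degree)",
    "wind_speed (m/s)", "wind_speed_of_gust (m/s)",
]

def normalize_columns(df: pd.DataFrame, all_col: list[str]) -> pd.DataFrame:
    """Hypothetical helper: reindex to the expected column set so every
    station's dataframe has the same columns, filling missing ones with NaN."""
    return df.reindex(columns=all_col)

# Toy retrieval result that is missing some of the expected columns.
raw = pd.DataFrame({
    "station_id": ["46042"],
    "date_time": ["2024-05-08T10:00:00Z"],
    "wind_speed (m/s)": [7.2],
})
fixed = normalize_columns(raw, WINDS_COLS)
print(list(fixed.columns))
```

This trades some NaN padding for a predictable schema; whether that trade is worth it is part of the merge-vs-dict discussion further down.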
See also: https://pypi.org/project/ndbc-api/ https://github.com/cdjellen/ndbc-api
@AliS-Noaa @aliabdolali
What is your preferred web API for downloading NDBC data?
Thanks
Hello Saeed,
Here are two ways I usually get the NDBC data:
https://github.com/NOAA-EMC/WW3-tools/blob/develop/ww3tools/downloadobs/wfetchbuoy.py
This is a good tool as well:
https://pypi.org/project/NDBC/
Cheers,

Ali Salimi-Tarazouj, Ph.D.
Physical Scientist, Coastal Engineer
Lynker at NOAA/NWS/NCEP/EMC
From a correspondence with one of our colleagues:

Near-real-time observations from NWS Fixed Buoys, NWS C-MAN stations, and many ROOA-operated buoys and coastal stations are available on the ndbc.noaa.gov web site. I don't know if NDBC has an API yet, but one can obtain their obs via HTTPS or DODS/OPeNDAP: https://www.ndbc.noaa.gov/docs/ndbc_web_data_guide.pdf

However, I found that someone has written `ndbc-api` to "parse whitespace-delimited oceanographic and atmospheric data distributed as text files for available time ranges, on a station-by-station basis" (https://pypi.org/project/ndbc-api/). I also found ndbc.py at https://pypi.org/project/NDBC/. I imagine there are many others out there.
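The whitespace-delimited text files mentioned above can be parsed directly with pandas. A minimal sketch of the stdmet layout (two header rows: names, then units; `MM` marks missing values); the sample payload below is illustrative, not real observations:

```python
import io

import pandas as pd

# Illustrative payload in the realtime2 stdmet text layout described in the
# NDBC web data guide (values are made up, not real observations).
SAMPLE = """\
#YY  MM DD hh mm WDIR WSPD GST  WVHT   DPD   APD MWD   PRES  ATMP  WTMP  DEWP  VIS PTDY  TIDE
#yr  mo dy hr mn degT m/s  m/s     m   sec   sec degT   hPa  degC  degC  degC  nmi  hPa    ft
2024 05 08 10 50 200  7.0  9.0   1.2     8   5.5 210 1015.2  14.1  13.5  12.0 99.0 +0.0 99.00
2024 05 08 10 40 210  6.5  8.1   1.1     8   5.4 215 1015.4  14.0  13.5  12.1 99.0 +0.0 99.00
"""

def parse_stdmet(text: str) -> pd.DataFrame:
    """Parse a whitespace-delimited stdmet file: the first header row carries
    the column names, the second carries the units, the rest is data."""
    names = text.splitlines()[0].lstrip("#").split()
    return pd.read_csv(
        io.StringIO(text),
        sep=r"\s+",
        skiprows=2,          # skip both header rows
        names=names,
        na_values=["MM"],    # NDBC's missing-value marker
    )

df = parse_stdmet(SAMPLE)
print(df[["YY", "MM", "DD", "hh", "mm", "WSPD"]])
```

For live data one would fetch the station's text file over HTTPS (the per-station URL pattern is documented in the web data guide linked above) and pass the response body to the same parser.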
During our meeting on June 5th we discussed the following items/tasks related to NDBC data:
- Using the `ndbc-api` package, an alternative package, or writing from scratch
  - For now let's continue with a third-party package and resolve the following issues
- Possible issues with the third-party package license
  - `ndbc-api` is MIT-licensed, if we end up using it
- Is the third-party package already on conda-forge, or is there a plan for it to be?
  - If not, explore other NDBC packages
  - If we help add the Conda package, will the original developer maintain it?
  - Should we go ahead and just create and maintain a conda package ourselves? (ideally not)
- Is raw data available? (NDBC itself seems to apply QC)
- The issue of fetching station data one by one
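On the last point: since the per-station calls are independent, one common workaround is to parallelize them with a thread pool. A minimal sketch, where `fetch_station` is a stand-in for whatever single-station call the chosen package provides (it fabricates a tiny dataframe here so the sketch runs offline):

```python
from concurrent.futures import ThreadPoolExecutor

import pandas as pd

def fetch_station(station_id: str) -> pd.DataFrame:
    """Stand-in for a single-station fetch (e.g. one HTTP request per station)."""
    return pd.DataFrame({"station_id": [station_id], "wvht": [1.0]})

def fetch_many(station_ids: list[str], max_workers: int = 8) -> dict[str, pd.DataFrame]:
    """Fetch several stations concurrently, keeping one dataframe per station."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        frames = pool.map(fetch_station, station_ids)
    return dict(zip(station_ids, frames))

results = fetch_many(["41001", "46042", "tplm2"])
print(sorted(results))
```

Threads are a reasonable fit here because the real work is I/O-bound; this does not change the upstream package's one-station-at-a-time API, it just overlaps the waiting.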
Todo:
- [ ] @abdu558 to start a PR when the code is ready
- [x] @abdu558 to contact the third-party package developer and ask about the "conda"-related questions. Ans: they are open to creating and maintaining a conda package
- [x] @SorooshMani-NOAA to contact NDBC about raw data [email protected]
- [x] @abdu558 to check whether the web API provides multi-station retrieval. Ans: the upstream package uses a plain for loop for multiple stations
Hi @pmav99, today we discussed @abdu558's NDBC implementation. I suggested that he implement everything based on the "new" API (as in #125), but instead of using `_ndbc_api.py` as the file name, just use `ndbc.py`. What do you think?
Also we discussed whether to combine all data into a single dataframe or not and whether to keep the missing value columns, etc. I suggested discussing those in the next group meeting next week.
@abdu558, can you please summarize your questions here as well so that we can discuss them more constructively next week?
@abdu558, I forgot to ask: what is the state of the conda package for ndbc-api? You said they are open to creating the conda package themselves, right?
Yes, they did create it and said it would take a few days or so for it to show up.
Response from NDBC:
[...] We do not have an API though we are hopeful to develop one in the future.
Our FAQs might be a good place to start with your quality control questions: https://www.ndbc.noaa.gov/faq/
Thanks Soroosh. More on QC here: https://www.ndbc.noaa.gov/faq/qc.shtml. There is an exhaustive guide on the QC methodology (2009 version), and all the QC flags are summarized in Appendix E.
You answered most of them, but the one I'm not 100% sure about is what happens with multiple stations:

- an extra column called station id is added and the data of the different stations are combined into a single dataframe, or
- the output is a dictionary which maps each id to a dataframe of that station's data (this is the one I'm not 100% sure of)
@abdu558 different providers return different data. For example, when you try to retrieve data from a bunch of IOC stations you will end up with dataframes with different numbers of columns and different column names. E.g.:
https://www.ioc-sealevelmonitoring.org/bgraph.php?code=aden&output=tab&period=0.5&endtime=2018-06-07 https://www.ioc-sealevelmonitoring.org/bgraph.php?code=abed&output=tab&period=0.5&endtime=2018-06-07
Merging these will result in a bunch of columns full of NaNs. This is problematic because NaNs are floats and consume quite a bit of RAM; if you are retrieving hundreds or thousands of stations for many years, this can quickly add up.

Furthermore, since you can't really know in advance which column will have data for each station, you will end up calling .dropna() for every station id you want to process. That can also be problematic, because the provider might return NaNs anyhow and you might want to differentiate between those cases.
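The effect is easy to demonstrate with toy stand-ins for two stations reporting different sensor columns (made-up values, not real IOC output): concatenating them fills the gaps with NaN, which silently upcasts the affected columns to float64.

```python
import pandas as pd

# Two stations with disjoint column sets, as with the IOC examples above.
a = pd.DataFrame({"rad": [1, 2]})           # integer readings from one sensor
b = pd.DataFrame({"prs": [3], "enc": [4]})  # entirely different columns

combined = pd.concat({"aden": a, "abed": b}, names=["station", "i"])
print(combined)
# Missing cells become NaN, so every affected column is upcast to float64 --
# this is the RAM cost mentioned above.
print(combined.dtypes)
```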
Alternatively, you can just avoid merging in the first place. If somebody wants to merge the dictionary it is trivial to do so. E.g.:
```python
import pandas as pd

data = {
    "st1": pd.DataFrame(index=["2020", "2021"], data={"var1": [111, 222]}),
    "st2": pd.DataFrame(
        index=["2021", "2022", "2023"],
        data={"var2": [1, 2, 3], "var3": [0, float("nan"), float("nan")]},
    ),
}
merged = pd.concat(data, names=["station_id", "time"]).reset_index(level=0)
print(data)
print(merged)
```
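Going the other way is just as trivial, which is another argument for keeping the dictionary as the canonical form. A sketch of the round trip on toy data (same shape as the example above), recovering per-station frames with `groupby` and dropping the all-NaN columns each station never reported:

```python
import pandas as pd

data = {
    "st1": pd.DataFrame(index=["2020", "2021"], data={"var1": [111, 222]}),
    "st2": pd.DataFrame(index=["2021"], data={"var2": [1]}),
}
merged = pd.concat(data, names=["station_id", "time"]).reset_index(level=0)

# Round trip: recover the per-station dict from the merged frame.
split = {
    sid: grp.drop(columns="station_id").dropna(axis="columns", how="all")
    for sid, grp in merged.groupby("station_id")
}
print(split["st1"])
```

Note that the `dropna(how="all")` step is exactly the per-station cleanup discussed above; starting from the dictionary avoids it entirely.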