
NDBC Decommissioning

Open tomsail opened this issue 8 months ago • 3 comments

Changes Overview

NDBC buoy data services will be decommissioned on May 30, 2025, affecting domains:

  • https://www.ndbc.noaa.gov/
  • https://dods.ndbc.noaa.gov/thredds/

Data will transition to NCEI systems (a quick availability check is sketched after this list):

  • HTTPS: https://www.ncei.noaa.gov/data/oceans/ndbc/
  • THREDDS: https://www.ncei.noaa.gov/thredds-ocean/catalog/ndbc/cmanwx/catalog.html

Key Impacts

  1. Station List Retrieval

    • The station list is currently sourced from https://www.ndbc.noaa.gov/wstat.shtml, which sits on a decommissioned domain
  2. Data Retrieval

    • All ndbc_api calls must be redirected to NCEI endpoints
    • Potential differences in data structure and access patterns

Possible Actions

  1. Update Code

    • Modify endpoint URLs in ndbc_api
    • Test before cutoff date
  2. Address Station List

    • Develop new method to obtain station metadata from NCEI
    • Create appropriate fallbacks
  3. Verify Data Equivalence

    • Run parallel retrievals from both sources
    • Document any discrepancies (a comparison sketch follows this list)
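
For the equivalence check in item 3, here is a minimal comparison sketch. It assumes the same station and period have already been retrieved from both sources into two DataFrames; df_ndbc and df_ncei are hypothetical names, and how to fetch the NCEI copy is exactly the open question in this issue:

import pandas as pd


def compare_station_data(df_ndbc: pd.DataFrame, df_ncei: pd.DataFrame) -> None:
    # columns present in one source only
    only_ndbc = sorted(set(df_ndbc.columns) - set(df_ncei.columns))
    only_ncei = sorted(set(df_ncei.columns) - set(df_ndbc.columns))
    if only_ndbc or only_ncei:
        print(f"columns only in NDBC: {only_ndbc}")
        print(f"columns only in NCEI: {only_ncei}")

    # compare values on the shared columns/timestamps (assumes unique timestamps)
    common_cols = sorted(set(df_ndbc.columns) & set(df_ncei.columns))
    common_idx = df_ndbc.index.intersection(df_ncei.index)
    diff = df_ndbc.loc[common_idx, common_cols].compare(df_ncei.loc[common_idx, common_cols])
    print(f"{len(common_idx)} shared timestamps, {len(diff)} rows differ")


# usage, once both retrievals exist:
# compare_station_data(df_ndbc, df_ncei)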

Critical Questions for Experts

  1. Is there a direct replacement for https://www.ndbc.noaa.gov/wstat.shtml?
  2. Are there structural differences between NDBC and NCEI data?
  3. Are there different access requirements for NCEI endpoints?
  4. Is there a test environment available before cutover?

tomsail avatar Apr 23 '25 09:04 tomsail

pinging @aliabdolali @AliS-Noaa @CDJellen

tomsail avatar Apr 23 '25 09:04 tomsail

FYI, I also tried to download all available data using this script:

# pip install searvey fastparquet

import os
import searvey
import ndbc_api
import pandas as pd

os.makedirs("data", exist_ok=True)

# station metadata scraped from the NDBC website
ndbc_stations = searvey.get_ndbc_stations()

api = ndbc_api.NdbcApi()
start = pd.Timestamp(1900)
end = pd.Timestamp.now()

# skip the first 11 stations
for i_s, s in ndbc_stations.iloc[11:].iterrows():
    for mode in api.get_modes():
        try:
            df = searvey.fetch_ndbc_station(s.Station, mode=mode, start_date=start, end_date=end)
            if not df.empty:
                # keep the station metadata alongside the data
                df.attrs = {key: str(value) for key, value in s.to_dict().items()}
                df.to_parquet(f"data/{s.Station}_{mode}.parquet")
        except Exception as e:
            print(f"could not download {s.Station}, mode:{mode} ({e})")
This downloaded data for ~100 stations (amounting to 3.1 GB) in 4 to 5 hours.

The exception is the C-MAN stations, maybe because they are referenced in uppercase on https://www.ndbc.noaa.gov/wstat.shtml but stored in lowercase on THREDDS: https://dods.ndbc.noaa.gov/thredds/catalog/data/swden/catalog.html?

Maybe I did something wrong; in that case let me know.
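
If the case mismatch is indeed the problem, one possible (untested) workaround would be to retry with a lowercased station ID, reusing the same fetch call as in the script above:

import searvey


def fetch_with_case_fallback(station_id, mode, start, end):
    # same call as in the script above; retry with the lowercase ID if the first attempt comes back empty
    df = searvey.fetch_ndbc_station(station_id, mode=mode, start_date=start, end_date=end)
    if df.empty and station_id != station_id.lower():
        df = searvey.fetch_ndbc_station(station_id.lower(), mode=mode, start_date=start, end_date=end)
    return df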

tomsail avatar Apr 23 '25 09:04 tomsail

Thank you for the heads up @tomsail; I will look into migrating the ndbc_api package without changing the API surface. Hopefully this will also help determine whether there is a good replacement for wstat.shtml.

CDJellen avatar Apr 23 '25 15:04 CDJellen