get_basins issue with gage 06899700
What happened?
I don't know whether this is a HyRiver issue, but I have gotten basin geometry in the past using pynhd's NLDI().get_basins for gage 06899700 with fsource='nwissite'. When I try today, it says the service returned no features. I can still get geometries for other gages.
When I run the same query using Swagger here, I also get an error: The comid for feature source 'nwissite' and feature ID 'USGS-06899700' does not exist.
I tried running on Swagger with WQP and I do get a geometry returned. But when I call get_basins(fsource='WQP'), I still get the same error.
I have a workaround to get the geometry for this particular gage, but the change in behavior and inconsistency with the Swagger results seem odd.
What did you expect to happen?
- I expected to get a basin geometry from nwissite for this gage because I have done so before.
- I expected to get a basin geometry from WQP after doing so successfully with the Swagger tool.
Minimal Complete Verifiable Example
from pynhd import NLDI
station_id = '06899700'
basin1 = NLDI().get_basins(station_id, fsource='WQP')
basin2 = NLDI().get_basins(station_id, fsource='nwissite')
MVCE confirmation
- [x] Minimal example — the example is as focused as reasonably possible to demonstrate the underlying issue.
- [x] Complete example — the example is self-contained, including all data and the text of any traceback.
- [x] New issue — a search of GitHub Issues suggests this is not a duplicate.
Relevant log output
Anything else we need to know?
No response
Environment
SYS INFO
commit: None
python: 3.12.8 | packaged by conda-forge | (main, Dec 5 2024, 14:06:27) [MSC v.1942 64 bit (AMD64)]
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 186 Stepping 2, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: ('English_United States', '1252')
PACKAGE VERSION
async-retriever 0.19.1
pygeoogc 0.19.0
pygeoutils 0.19.0
py3dep 0.19.0
pynhd 0.19.0
pygridmet N/A
pydaymet N/A
hydrosignatures 0.19.0
pynldas2 N/A
pygeohydro 0.19.0
aiohttp 3.11.11
aiohttp-client-cache 0.12.4
aiosqlite 0.20.0
cytoolz 1.0.1
ujson 5.10.0
defusedxml 0.7.1
joblib 1.4.2
multidict 6.1.0
owslib 0.32.1
pyproj 3.7.0
requests 2.32.3
requests-cache 1.2.1
shapely 2.0.6
url-normalize 1.4.3
urllib3 2.3.0
yarl 1.18.3
geopandas 1.0.1
netcdf4 1.7.2
numpy 2.2.2
rasterio 1.4.3
rioxarray 0.18.2
scipy 1.15.1
xarray 2025.1.1
click 8.1.8
networkx 3.4.2
pyarrow 19.0.0
folium 0.19.4
h5netcdf 1.4.1
matplotlib 3.10.0
pandas 2.2.2
numba N/A
py7zr N/A
pyogrio 0.10.0
For WQP, you need to provide the agency code too to make it work:
from pynhd import NLDI
station_id = 'USGS-06899700'
basin1 = NLDI().get_basins(station_id, fsource='WQP')
Under the hood, when fsource="nwissite", the get_basins function adds the USGS- prefix if it is missing, but for other sources you need to provide it yourself. It seems that NLDI does not index this particular NWIS site, since the same call works for other gages, for example 01031500.
You can get the basin for this station using WaterData though:
from pynhd import WaterData
basin = WaterData("gagesii_basins").byid("gage_id", "06899700")
I know it's a bit confusing with the USGS prefix, but unfortunately, since they are from different datasets, it's difficult to keep them consistent. My recommendation is to always pass the agency code when working with NLDI's basins endpoint.
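To make the recommendation above concrete, here is a minimal sketch of a helper that ensures the agency prefix is present before calling NLDI. The helper name `with_agency` is not part of pynhd; it's just an illustration of the prefixing that get_basins does internally for nwissite:

```python
def with_agency(station_id: str, agency: str = "USGS") -> str:
    """Prepend the agency code (e.g., 'USGS-') if it's not already there."""
    prefix = f"{agency}-"
    return station_id if station_id.startswith(prefix) else f"{prefix}{station_id}"

print(with_agency("06899700"))       # USGS-06899700
print(with_agency("USGS-06899700"))  # USGS-06899700 (unchanged)
```

You could then pass the result to get_basins for any feature source (WQP included) without worrying about whether the ID already carries the agency code.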
In conversation with @cheginit, it appears this traces back to this root issue: https://github.com/internetofwater/ref_gages/issues/41
There are two sites at the same location, which broke the site indexing.
We have some other edits to the reference gages coming in soon and I'll try and get this issue addressed as that happens. Thanks for the report!
Great, glad I could help!
@cheginit thanks for the clarification on the agency code. I have exclusively used nwissite to date, so I didn't realize that. Thanks for pointing out the WaterData option as well. I'm pretty new to this world and do get lost sometimes as to where data is coming from and what data is available from which sources. But I've become a HyRiver evangelist because I usually don't need to understand the details to be able to use the tools. So thanks again for a very helpful resource!
I will open an issue for another observation I had this week. Again, I don't think it's a HyRiver bug, but it did help to point out a difference in behavior from what I was expecting.
Thanks for being an active user and helping to improve the software stack.
@cheginit May I ask: in the latest pynhd package, WaterData does not have gagesii_basins. Is there another function replacing this one? Thanks!
WaterData GeoServer is still having some technical difficulties, so some endpoints including the basins are still not back up. They are working on fixing it, so hopefully it will be resolved soon. In the meantime, you can get the whole dataset from here.
@cheginit Thank you so much for providing the data! It is very helpful! Does this dataset include more gauge stations than NLDI and WaterData?
The original gageii dataset was released in 2011 and was based on NHD v1. WaterData's gagesii layer provides access to this original version, so it's the same. NLDI, on the other hand, is based on NHD v2, so it's different and the basins that you get from NLDI might not even match the ones that you get from gagesii due to the difference in the underlying data that are used to derive the basins.
@cheginit Got it! Thank you so much for the clarification.
@cheginit Hi, I have another question. I used the NLDI function (code below) to get gauge station features. Previously, it worked well. But when I ran the same code again today, many sites had no features returned from the service. How should I solve this issue?
sites_feature_list = []
for i in range(len(sites_left_df)):
    try:
        sites_feature = nldi.getfeature_byid(
            fsource="nwissite",
            fids=sites_left_df["site_no"][i],
        )
        sites_feature_list.append(sites_feature)
        sites_feature_list_df = pd.concat(sites_feature_list, ignore_index=True)
    except Exception as e:
        print(f"{e}")
You don't need to use a loop, it's more efficient to just pass the site IDs as a list to getfeature_byid. Also, you need to be more specific about site IDs that used to work, but now they don't. Give me a couple of such stations so I can check what's the issue.
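The loop-free pattern described above can be sketched like this. The DataFrame contents here are placeholders standing in for the sites_left_df from the thread, and the actual NLDI call is shown commented out since it requires network access:

```python
import pandas as pd

# Hypothetical stand-in for the thread's sites_left_df.
sites_left_df = pd.DataFrame({"site_no": ["01031500", "06899700"]})

# Collect all site IDs into one list instead of iterating row by row.
fids = sites_left_df["site_no"].tolist()

# A single call then replaces the whole loop and returns one GeoDataFrame:
# from pynhd import NLDI
# sites_feature_df = NLDI().getfeature_byid(fsource="nwissite", fids=fids)
print(fids)
```

Passing the list in one call lets pynhd batch the lookups for you and saves building the result frame incrementally with pd.concat inside the loop.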
Thank you for reminding me that I don't need the loop. It is much faster. I was trying to reproduce the site IDs that were not returned last night, but it works now. Last night I could only get 99 sites returned, and now I can get 355. Is it something with the API connection?
The NLDI web service recently added a rate limit to their service. After 3600 requests per hour, per IP, it will return an error due to exceeding the rate limit. It's possible that you exceeded the rate limit, since requests get accumulated across all NLDI endpoints that you request. There's nothing that I can do from the HyRiver side, so you have to be more mindful of your queries to NLDI.
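Pacing requests under that cap is up to the user, not pynhd. A minimal sketch of batching IDs under an hourly budget follows; the batch size and the pause are assumptions for illustration, not pynhd features:

```python
from itertools import islice

def chunked(ids, size):
    """Yield successive batches of at most `size` IDs."""
    it = iter(ids)
    while batch := list(islice(it, size)):
        yield batch

# With a 3600-requests/hour cap and one request per ID, one could process
# e.g. 3500 IDs per batch and pause an hour between batches (hypothetical).
site_ids = [f"{i:08d}" for i in range(10)]
for i, batch in enumerate(chunked(site_ids, 4)):
    # Each batch would go to NLDI().getfeature_byid(fsource="nwissite", fids=batch),
    # followed by time.sleep(3600) to wait out the hourly window.
    print(i, batch)
```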
Thank you for the information! It is very helpful! For example, if I use the following code:

sites_feature = nldi.getfeature_byid(
    fsource="nwissite",
    fids=sites_left_df["site_no"],
)

If there are 1000 sites in sites_left_df, will it make 1000 requests or one?
It will be 1000 requests.
Makes sense! Thanks!