Troubleshoot 429 errors for functions relying on ATTAINS geospatial web services
Discuss the prevalence of this error with the ATTAINS team and determine whether adding a pause between requests in the impacted functions would be a viable solution. If so, make this change in the functions (see the sketch below).
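If a simple pause is viable, a minimal sketch might look like this (the loop, URLs, and delay value are all assumptions for illustration, not current EPATADA code):

urls <- c("https://example.gov/query1", "https://example.gov/query2")  # placeholder queries
results <- lapply(urls, function(u) {
  resp <- httr2::req_perform(httr2::request(u))
  Sys.sleep(0.5)  # assumed half-second pause between requests to stay under the rate limit
  httr2::resp_body_string(resp)
})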
Is there any advantage to using the arcgislayers package (https://github.com/R-ArcGIS/arcgislayers)?
From what I can tell right now, the loop just keeps going until an empty response comes back (so at least one more request than it needs). It might be worth getting a count first and then issuing exactly that many requests. The record limit I'm seeing is 10k rather than 1k; I thought the default was 2k. We could also read the limit from the service first in case it ever changes.
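For reference, a hedged httr2 sketch of the count-first approach (the layer URL and where clause are placeholders, not the actual fetchATTAINS query):

library(httr2)

layer_url <- "https://gispub.epa.gov/arcgis/rest/services/OW/ATTAINS_Assessment/MapServer/0"  # placeholder layer

# Read the paging limit from the service metadata instead of hard-coding it
meta <- request(layer_url) |>
  req_url_query(f = "json") |>
  req_perform() |>
  resp_body_json()
page_size <- meta$maxRecordCount

# Get the total feature count for the query up front
count_resp <- request(paste0(layer_url, "/query")) |>
  req_url_query(where = "1=1", returnCountOnly = "true", f = "json") |>
  req_perform() |>
  resp_body_json()
total <- count_resp$count

# Issue exactly as many paged requests as the count requires
offsets <- seq(0, max(total - 1, 0), by = page_size)
pages <- lapply(offsets, function(off) {
  request(paste0(layer_url, "/query")) |>
    req_url_query(where = "1=1", outFields = "*", f = "geojson",
                  resultOffset = off, resultRecordCount = page_size) |>
    req_perform() |>
    resp_body_string()
})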
The query is relatively static/simple (just a long list of params). There are definitely pros to using a package designed for hitting Esri services; we just have to make sure it meets your needs and doesn't broaden dependencies too much.
Looks like the arcgislayers package is already being used in fetchNHD()...
@mhweber - I used to use nhdplusTools or something like that to get NHD, but I feel like there have been changes since then - is there a good package for getting NHD HR? (I see flowlines, waterbodies, catchments)
Looks like httr, not httr2, is being used - is that consistent throughout TADA?
Ah, good catch. USGS dataRetrieval switched to httr2 recently. I will update this so we are consistently using httr2 as well.
Looks like httr2 includes built-in support for features like rate-limiting (req_throttle()) if this ends up having to go that route.
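A minimal sketch of that route (the ATTAINS domains endpoint here is an assumed stand-in for whichever geospatial service ends up needing it):

library(httr2)

req <- request("https://attains.epa.gov/attains-public/api/domains") |>
  req_throttle(rate = 30 / 60) |>  # cap at roughly 30 requests per minute
  req_retry(max_tries = 3)         # 429/503 responses are retried by default

resp <- req_perform(req)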
I used httr2 for altUSEPA/rExpertQuery (some functions are used in EPATADA), so it would be great to have both packages be consistent.
@jbousquin There is an NHDPlus HR service available, but I think nhdplusTools is still just using a downloader for NHDPlusHR and only leveraging services for NHDPlus Medium Res through the NLDI (worth asking Dave - I can ask).
@hillarymarler, @jbousquin I switched everything in StreamCatTools over to httr2 recently and am successfully using req_throttle() and req_retry() in my sc_get_data() function in the package. Also, you may already be doing something similar, but I split requests if I exceed a certain number of entries and pass a list of requests to a create_post_request() function using purrr::map_dfr, roughly like the sketch below.
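An illustrative sketch of that splitting pattern (make_post_request() is a stand-in, not the StreamCatTools function, and the endpoint, ids, and chunk size are placeholders):

library(httr2)
library(purrr)

service_url <- "https://example.epa.gov/service"  # placeholder endpoint
ids <- 1:1000                                     # placeholder entries
chunk_size <- 200                                 # assumed split threshold

make_post_request <- function(id_chunk) {
  request(service_url) |>
    req_body_form(comid = paste(id_chunk, collapse = ",")) |>
    req_throttle(rate = 30 / 60) |>
    req_retry(max_tries = 3)
}

# Split the id list into chunks, send one POST per chunk, row-bind the results
chunks <- split(ids, ceiling(seq_along(ids) / chunk_size))
results <- map_dfr(chunks, function(ch) {
  make_post_request(ch) |>
    req_perform() |>
    resp_body_json(simplifyVector = TRUE) |>
    as.data.frame()
})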
Utilities::TADA_CheckColumns() seems like it could be leveraged for the spatial cols checks in these functions - everyone OK with me updating those (just going to add an arg for a non-default message so the default is current behavior)? Wanted to check before expanding the scope of the branch.
Heads up: I was getting a consistent 500 Internal Server Error from dataRetrieval::getWebServiceData(baseURL) when trying to use the fetchATTAINS() example data:
tada_data <- TADA_DataRetrieval(
  startDate = "1990-01-01",
  endDate = "1990-12-30",
  characteristicName = "pH",
  statecode = "NV",
  applyautoclean = TRUE,
  ask = FALSE
)
Grabbed the query info from the last test run to have something local to work through it with:
tada_data <- TADA_DataRetrieval(
  startDate = "2022-06-07",
  endDate = "2022-06-08",
  characteristicName = "pH",
  statecode = "NY",
  applyautoclean = TRUE,
  ask = FALSE
)
The examples in geospatialFunctions.R are \dontrun{} - but it may be worth including them in build tests (not familiar with what testthat does or doesn't do in that respect).
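A minimal testthat sketch for a service-dependent example; the skip helpers keep it from failing CRAN or offline builds (the fetchATTAINS() call shape and expected return class are assumptions, not confirmed):

test_that("fetchATTAINS returns spatial features for a small query", {
  testthat::skip_on_cran()                      # don't hit live services on CRAN
  testthat::skip_if_offline("gispub.epa.gov")   # skip when the service is unreachable
  result <- fetchATTAINS(tada_data)             # hypothetical call shape; tada_data built as in the example above
  testthat::expect_s3_class(result, "sf")       # assumed return class
})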
As I go, I'm realizing the query exit condition is nested, meaning it's rarely just one extra query to get an empty response; rather, the loop exits only when the responses from all layers are empty, and for a large area there is added splitting into clustered bboxes. These presumably produce a lot of very small or empty queries sent in fast succession, causing the error. Working on a batching refactor that should avoid that. I'm not convinced there is a huge efficiency gain from the bbox clustering - has any profiling/performance testing been done on that?
@kathryn-willi Hope you are doing well. Justin is helping us troubleshoot some issues with the ATTAINS queries. I don't recall if any profiling/performance testing had been done on the bbox clustering approach in fetchATTAINS. Do you recall why we went with this approach? Thanks!
Was the bbox clustering related to the long run time and general speed improvement for the large spatial data pull in #589, and to the tribal example data and errors in #583?
Hi team, yes, the bbox clustering approach was used to speed up long run times. It essentially clusters WQP points into groups so that a massive bbox containing all WQP points isn't used (i.e., reducing the likelihood of returning a bunch of ATTAINS features unrelated to the WQP points). There were some speed tests performed and this approach was much faster when the bbox was above a certain size (I believe that size is mentioned in a commented line above the splitting function!)
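For context, an illustrative sketch of that clustering idea (not the actual fetchATTAINS code; wqp_points and the cluster count are assumptions):

library(sf)

# wqp_points: an sf POINT object of WQP monitoring locations (assumed input)
coords <- st_coordinates(wqp_points)
groups <- stats::kmeans(coords, centers = 5)$cluster  # assumed cluster count

# One bbox per cluster instead of a single massive bbox around all points,
# so each ATTAINS spatial query covers less unrelated area
bboxes <- lapply(split(seq_len(nrow(coords)), groups), function(idx) {
  st_bbox(wqp_points[idx, ])
})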