
WQX v3.0 Testing, Development and Updates

Open wokenny13 opened this issue 1 year ago • 5 comments

Is your feature request related to a problem? Please describe: The WQP 3.0 beta services are still under development and testing. There is interest in hearing from us about any issues we notice with TADA workflows when we start our updates, especially if we have example workflows that might be useful for testing.

Describe the solution you'd like: Pushing the WQP 3.0 format changes through in TADA will likely be held back for at least half a year (until around March 2025). Before this push, we need to validate any impacts on TADA workflows and processes while the USGS dataRetrieval package is being updated, so that the downstream transition is smooth, quick, and efficient. There is a stable CRAN release of dataRetrieval as well as a development version on GitHub, and testing should be done with the GitHub version.

Additional context:

Please see the GitHub links from USGS on updates, plans, and documentation for dataRetrieval:

dataRetrieval status updates: https://doi-usgs.github.io/dataRetrieval/articles/Status.html

dataRetrieval development plan: https://doi-usgs.github.io/dataRetrieval/articles/wqx3_development_plan.html

Lee Stanish said the dev version is using the 3.0 profiles but is still very much in development.

wokenny13 • Sep 27 '24 18:09

@cefergus I added some details here from our emails for tracking:

If you run:

schema <- readr::read_csv("https://www.epa.gov/system/files/other-files/2024-07/schema_outbound_wqx3.0.csv")

you get a table that can be used to find what needs replacing. You could use that to create a function that converts the new data to the old names so you don't need to update your code. BUT, we're generally thinking over here that it would usually be better to update the code to work with the "modern" outputs.

Let's start by just testing whether all our functions can still run if we use the new beta services/profiles but the old names. Eventually we’ll need to switch the code base and function outputs to use/reference the new names too. For this, it is still going to be helpful to have a function that can easily convert the columns back and forth between the “legacy” and the new names.
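A minimal sketch of what such a two-way rename helper could look like, in base R. The function name `rename_wqp_columns`, the crosswalk column names `wqx3_name` and `legacy_name`, and the field names in the toy data are all assumptions made for illustration; the real schema CSV has its own column headers, which would need to be mapped onto this shape first.

```r
# Hypothetical helper: rename data frame columns using a crosswalk table.
# `crosswalk` needs character columns `wqx3_name` and `legacy_name`
# (illustrative names, not the headers of the real schema CSV).
rename_wqp_columns <- function(df, crosswalk, to = c("legacy", "wqx3")) {
  to <- match.arg(to)
  from_col <- if (to == "legacy") "wqx3_name" else "legacy_name"
  to_col <- if (to == "legacy") "legacy_name" else "wqx3_name"

  # Keep only crosswalk rows that have a name on both sides
  ok <- !is.na(crosswalk[[from_col]]) & !is.na(crosswalk[[to_col]])
  lookup <- setNames(crosswalk[[to_col]][ok], crosswalk[[from_col]][ok])

  # Rename matched columns; leave unmatched columns untouched
  hits <- names(df) %in% names(lookup)
  names(df)[hits] <- unname(lookup[names(df)[hits]])
  df
}

# Toy crosswalk and data; these field names are made up for illustration
crosswalk <- data.frame(
  wqx3_name = c("Example_NewSiteId", "Example_NewValue"),
  legacy_name = c("ExampleLegacySiteId", "ExampleLegacyValue"),
  stringsAsFactors = FALSE
)
new_df <- data.frame(Example_NewSiteId = "A-1", Example_NewValue = 7.2, Extra = TRUE)

legacy_df <- rename_wqp_columns(new_df, crosswalk, to = "legacy")
round_trip <- rename_wqp_columns(legacy_df, crosswalk, to = "wqx3")
```

Columns that are not in the crosswalk pass through unchanged, so the same helper can be run on partially renamed data while the code base is being migrated.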

Note on performance: The services have been working really well lately, but folks doing big queries (like our group) still might need to rejigger the expectations of what can come back from a single query. Let's wait to update our big data retrieval functions (automatic chunking of pulls for users if needed) after we switch to the new services.
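For reference when we get to that point, here is one possible shape for automatic chunking: split the requested date range into yearly pieces, run one query per piece, and bind the results. This is only a sketch under assumptions. `chunk_date_range`, `fetch_in_chunks`, and `fake_fetch` are hypothetical names, and `fake_fetch` stands in for a real WQP call so the example runs offline.

```r
# Hypothetical helper: split a date range into consecutive chunks of at most
# `by_years` years so each chunk can be sent as its own query.
chunk_date_range <- function(start, end, by_years = 1) {
  start <- as.Date(start)
  end <- as.Date(end)
  stopifnot(start <= end, by_years >= 1)

  # Chunk start dates: start, start + by_years, ...
  starts <- seq(start, end, by = paste(by_years, "years"))
  # Each chunk ends the day before the next one starts; the last ends at `end`
  ends <- c(starts[-1] - 1, end)
  data.frame(chunk_start = starts, chunk_end = ends)
}

# Hypothetical wrapper: call `fetch_fun(lo, hi)` once per chunk and row-bind
fetch_in_chunks <- function(start, end, fetch_fun, by_years = 1) {
  chunks <- chunk_date_range(start, end, by_years)
  pieces <- lapply(seq_len(nrow(chunks)), function(i) {
    fetch_fun(chunks$chunk_start[i], chunks$chunk_end[i])
  })
  do.call(rbind, pieces)
}

# Stand-in for a real WQP call: returns one row describing the chunk
fake_fetch <- function(lo, hi) {
  data.frame(lo = lo, hi = hi, n_days = as.integer(hi - lo) + 1L)
}
combined <- fetch_in_chunks("2020-01-01", "2022-06-30", fake_fetch)
```

Chunking by date is only one option; splitting by site or by geographic unit would follow the same pattern with a different chunk table.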

cristinamullin • Dec 06 '24 18:12

Here are the lines in TADA that we will need to update to use the new profiles: See lines 282-305 in DataDiscoveryRetrieval.R https://github.com/USEPA/EPATADA/blob/develop/R/DataDiscoveryRetrieval.R

# Retrieve all 3 profiles
  print("Downloading WQP query results. This may take some time depending upon the query size.")
  print(WQPquery)
  results.DR <- dataRetrieval::readWQPdata(WQPquery,
    dataProfile = "resultPhysChem",
    ignore_attributes = TRUE
  )
  # check if any results are available
  if ((nrow(results.DR) > 0) == FALSE) {
    print("Returning empty results dataframe: Your WQP query returned no results (no data available). Try a different query. Removing some of your query filters OR broadening your search area may help.")
    TADAprofile.clean <- results.DR
  } else {
    sites.DR <- dataRetrieval::whatWQPsites(WQPquery)
 
    projects.DR <- dataRetrieval::readWQPdata(WQPquery,
      ignore_attributes = TRUE,
      service = "Project"
    )
 
    TADAprofile <- TADA_JoinWQPProfiles(
      FullPhysChem = results.DR,
      Sites = sites.DR,
      Projects = projects.DR
    )

More from Laura D on USGS dataRetrieval:

Just to let you all know on my personal vocabulary - I usually now refer to our classic WQP calls as "legacy". At the moment, that's probably not the best term since it is still the production version of the Portal (WQP). But, we're working towards a release where the beta services on WQP become production. When that happens, the current system will be considered legacy and the new system will just be the default. I have NO idea when that might be.

I'll check on what's going on with the missing columns. Most of my tests for "legacy" don't usually specify the resultPhysChem profile (they are even more legacy-y dating back to when WQP didn't even have profiles). I'll let you know if I push up a fix on GitHub.

If you want to see how the new profiles look, you can try this:

  results.DR <- dataRetrieval::readWQPdata(WQPquery,
    service = "ResultWQX3",
    dataProfile = "basicPhysChem",
    ignore_attributes = TRUE
  )
 
  sites.DR <- dataRetrieval::whatWQPsites(WQPquery,
    legacy = TRUE
  )

At the moment, I ran some tests and simple queries and they do seem to be completing today. There's currently no WQX3 version of the "Project" dataProfile, so that's something that you would need to wait for anyway.
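One way to make the results pull easy to flip between the legacy and WQX3 profiles during testing is to keep the argument sets in one place. This is a hedged sketch, not how TADA does it: `wqp_result_args` and `fetch_wqp_results` are hypothetical helpers, and the argument values are simply copied from the snippets in this thread. The sites and projects pulls would stay as they are until a WQX3 "Project" profile exists.

```r
# Hypothetical helper: pick readWQPdata() arguments for the results pull.
# Values are taken from the legacy and WQX3 snippets quoted in this thread.
wqp_result_args <- function(use_wqx3 = TRUE) {
  if (use_wqx3) {
    list(
      service = "ResultWQX3",
      dataProfile = "basicPhysChem",
      ignore_attributes = TRUE
    )
  } else {
    list(
      dataProfile = "resultPhysChem",
      ignore_attributes = TRUE
    )
  }
}

# Hypothetical wrapper: requires the dataRetrieval package and network access.
# `WQPquery` is the same query list that is passed to readWQPdata() above.
fetch_wqp_results <- function(WQPquery, use_wqx3 = TRUE) {
  do.call(
    dataRetrieval::readWQPdata,
    c(list(WQPquery), wqp_result_args(use_wqx3))
  )
}
```

Calling `fetch_wqp_results(WQPquery, use_wqx3 = FALSE)` would reproduce the current legacy call, which makes side-by-side comparisons of the two outputs straightforward.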

cristinamullin • Dec 06 '24 18:12

@cefergus I created a branch for these edits. See: https://github.com/USEPA/EPATADA/tree/WQX3.0betatesting

I started editing the TADA_DataRetrieval function to use the new 3.0 full phys chem profile.

cristinamullin • Dec 11 '24 23:12

Pushed a function to rename WQX3.0 column names back to WQX2.0 legacy names referencing the online schema. However, we noticed that there are differences in special characters between the 2.0 names and names used in TADA. Next steps are to identify which column names need to be changed to match TADA. Once this is fixed we can test how well TADA autoclean works with the data uploaded using the new service.
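A small sketch of how the mismatched names could be listed, assuming the differences are limited to characters that are not valid in R names. `compare_column_names` is a hypothetical helper and the column names in the example are made up. It relies on base R's `make.names()`, which replaces invalid characters with ".".

```r
# Hypothetical helper: for each expected name, report whether it is absent
# only because of special characters, or absent altogether.
compare_column_names <- function(actual, expected) {
  exact <- expected %in% actual
  # make.names() replaces characters that are invalid in R names with "."
  normalised <- make.names(expected) %in% make.names(actual)
  list(
    special_char_mismatch = expected[!exact & normalised],
    missing = expected[!normalised]
  )
}

# Illustrative names only: `actual` mimics renamed columns, `expected` mimics
# the names the downstream code looks for
actual <- c("Example/Measure Value", "ExampleSiteId")
expected <- c("Example.Measure.Value", "ExampleSiteId", "ExampleUnit")
report <- compare_column_names(actual, expected)
```

Names listed under `special_char_mismatch` could be fixed by normalising the renamed columns, while names under `missing` would need a manual entry in the crosswalk.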

cefergus • Dec 23 '24 19:12

Updated TADA_RenameColumn function to rename WQX3.0 columns to legacy version and/or names used in TADA_AutoClean. Calling TADA_DataRetrieval with applyautoclean = TRUE is working.

Will test it on other queries to see if there are other misalignment issues to work through.

cefergus • Jan 02 '25 15:01

Note from Laura Decicco:

The WQP "beta" has been working pretty well lately, so I would recommend switching over to the WQX3 results (so you would no longer need to say 'doesn't have USGS older than March 2024'). How to do that? If you are using readWQPqw, add legacy=FALSE:

rawPcode <- readWQPqw("USGS-01594440", "01075", "", "", legacy = FALSE)

# If you are using readWQPdata, add service = "ResultWQX3" like this:

pHData_wqx3 <- readWQPdata(siteid = "USGS-04024315", 
                           characteristicName = "pH",
                           service = "ResultWQX3",
                           dataProfile = "basicPhysChem")

cristinamullin • Jul 07 '25 16:07

I created a new issue with next steps: https://github.com/USEPA/EPATADA/issues/644. This issue will close after the new TADA_RenametoLegacy function is merged in.

cristinamullin • Aug 22 '25 20:08