
pH harmonization issues

Open hillarymarler opened this issue 1 year ago • 6 comments

After TADA_UnitConversion and TADA_HarmonizeSynonyms, pH results from some data sets are being grouped into multiple TADA.ComparableDataIdentifiers.

A data set that can be used to see this is:

data <- TADA_DataRetrieval(statecode = "IL", startDate = "2010-01-01", endDate = "2020-12-31", huc = c("0714010505", "0714010504", "0714010508", "0714010501", "0714010503"), characteristicType = "Physical", applyautoclean = TRUE)

image

Ideally, all of these pH results would be identified with the same TADA.ComparableDataIdentifier.
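One way to confirm the split is to count distinct identifiers per characteristic. The following is a minimal base R sketch on a toy data frame. The rows and identifier values are made up for illustration and are not taken from the IL data set. The two column names are the ones discussed in this issue.

```r
# Toy stand-in for a harmonized TADA data frame (values are illustrative only).
toy <- data.frame(
  TADA.CharacteristicName = c("PH", "PH", "PH", "TEMPERATURE"),
  TADA.ComparableDataIdentifier = c(
    "PH_NA_NA_NONE", "PH_TOTAL_NA_NONE", "PH_SUSPENDED_NA_NONE",
    "TEMPERATURE_NA_NA_DEG C"
  ),
  stringsAsFactors = FALSE
)

# Count the distinct identifiers found for each characteristic.
id_counts <- tapply(
  toy$TADA.ComparableDataIdentifier,
  toy$TADA.CharacteristicName,
  function(x) length(unique(x))
)

# Characteristics whose results are split across more than one identifier.
split_chars <- names(id_counts)[id_counts > 1]
print(split_chars)
```

Running the same two steps on the real data frame should list every characteristic affected, not just pH.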

To solve this we could edit the metadata for pH entries using the harmonization table (we can specify that fraction is not needed for PH in the assumptions/notes column). All of these should harmonize to PH_NA_NA_NONE.
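A rough sketch of what that edit would do, using a hypothetical miniature table. The column names `characteristic`, `fraction`, and `target_fraction` and the helper `make_id` are assumptions made for illustration, not the package's actual reference table or code. The identifier is assumed to follow the characteristic_fraction_speciation_unit pattern seen in PH_NA_NA_NONE.

```r
# Hypothetical miniature harmonization table (column names are assumptions).
ref <- data.frame(
  characteristic = c("PH", "PH", "PH"),
  fraction = c("TOTAL", "DISSOLVED", "SUSPENDED"),
  target_fraction = c(NA, NA, "SUSPENDED"),
  stringsAsFactors = FALSE
)

# Proposed edit: pH does not need a fraction, so clear the target for every PH row.
ref$target_fraction[ref$characteristic == "PH"] <- NA

# Illustrative identifier builder; paste() renders NA as the string "NA".
make_id <- function(char, frac, spec, unit) {
  paste(char, frac, spec, unit, sep = "_")
}

ids <- make_id(ref$characteristic, ref$target_fraction, NA, "NONE")
print(unique(ids))
```

With the target fraction cleared, all three toy rows collapse to a single identifier.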

hillarymarler avatar May 09 '24 19:05 hillarymarler

Potential similar issues observed with total nitrogen in same example data set.

hillarymarler avatar May 10 '24 19:05 hillarymarler

I think other characteristics in the "Physical" characteristic type group should be reviewed as well.

hillarymarler avatar Jun 26 '24 16:06 hillarymarler

@cefergus and @wokenny13 - should suspended fraction pH results be harmonized with all other pH results?

This is the current harmonization table, where the included instances for pH all assume no fraction.

image

hillarymarler avatar Jul 29 '24 18:07 hillarymarler

I am not too familiar with the fraction components for pH in terms of what is commonly used by other organizations/states for analysis.

Was "suspended" recently added as an allowable fraction text for "pH"?

Does the current harmonization table get updated regularly? How does that update process work?

Do we know what percentage of pH results have the fraction "suspended" in it?

What is the logic behind assuming the fraction is NA for dissolved and total pH? If dissolved and total are being harmonized to NA, perhaps it makes sense to harmonize suspended to NA as well. But I am not sure what is best.

wokenny13 avatar Jul 29 '24 19:07 wokenny13

I found the "suspended" pH results in a test data set I downloaded from the WQP (see first post in this issue). There are quite a few combinations of characteristic/fraction/speciation that are not included in the current harmonization reference table. The ref table was created by pulling the most common combinations of characteristic/fraction/speciation from WQX (see this related issue for more details: https://github.com/USEPA/EPATADA/issues/319).

I am also working on addressing issue 319 to update/add more combinations to the reference table.

I am not sure what percentage of pH results have the fraction suspended, and I can't come up with an easy/efficient way to determine that. I'm currently running a modified version of the new combo script from issue 319 and looping it over 100 random datasets to generate a list of combinations that are not currently in the harmonization reference table. I can let you know how many data sets the pH/suspended combination pops up in.
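The general shape of that loop can be sketched in base R as follows. This is not the script from issue 319. The reference table, the three stand-in data sets, and the `combo_key` helper are all hypothetical, and speciation is left out to keep the example short.

```r
# Hypothetical reference table of combinations that are already covered.
ref <- data.frame(
  characteristic = c("PH", "PH"),
  fraction = c("TOTAL", "DISSOLVED"),
  stringsAsFactors = FALSE
)

# Stand-ins for randomly downloaded data sets (values are illustrative only).
datasets <- list(
  a = data.frame(characteristic = c("PH", "PH"),
                 fraction = c("TOTAL", "SUSPENDED"),
                 stringsAsFactors = FALSE),
  b = data.frame(characteristic = "PH",
                 fraction = "DISSOLVED",
                 stringsAsFactors = FALSE),
  c = data.frame(characteristic = c("PH", "NITROGEN"),
                 fraction = c("SUSPENDED", "TOTAL"),
                 stringsAsFactors = FALSE)
)

# Collapse each row to a single key so combinations can be compared as sets.
combo_key <- function(df) paste(df$characteristic, df$fraction, sep = "|")

# For each data set, keep the unique combinations absent from the reference table.
missing_by_dataset <- lapply(datasets, function(df) {
  setdiff(unique(combo_key(df)), combo_key(ref))
})

# Number of data sets in which each missing combination appears.
missing_counts <- table(unlist(missing_by_dataset))
print(missing_counts)
```

Because each data set contributes a combination at most once, the counts are data-set counts rather than result counts, which matches the "how many data sets" question but not the percentage-of-results question.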

I am looking through previous issues and documentation to see if I can find any discussion re: NA fraction for pH. I haven't found any yet, but I will link it here if I do!

hillarymarler avatar Jul 29 '24 20:07 hillarymarler

I'm not that familiar with different fractions of pH - but doing some googling - it seems plausible that there could be suspended pH measurements and including it in the TADA harmonization table makes sense. Does suspended mean that the sample is a mixture of liquid and solids (e.g., sediment)? The alternative would be pH from a filtered sample? It seems like the pH measures could be different depending on whether there are solids (e.g., certain rock types that may have lower/higher pH) in the sample vs if it's been filtered. Thinking like mine waste samples or things along those lines.

cefergus avatar Jul 29 '24 23:07 cefergus