aqp
aqp copied to clipboard
segment needs tests
This functions should have tests for expected output given:
- [x]
data.frame
input - [x]
SoilProfileCollection
input - [ ] missing or illogical horizon depths
- [x] same results as
slab()
OK, I took a look at number 3 in your list and found 3 more things that need addressing.
- [ ] cannot segment a single profile if the interval contains missing data (gaps)
- [ ] output ID order does not match input order or input profiles
- [ ] rebuild SoilProfileCollection using
depths<-
and JOIN methods -- notreplaceHorizons<-
Here is a reprex, which is also stored in misc/testing_suite
https://github.com/ncss-tech/aqp/blob/5ea7b827cbaf8794cceba28f9a8bb400c395158f/misc/test_suite/segment-reprex.R
Once the correct behavior is ironed out, they can easily become new tests.
library(aqp, warn.conflicts = FALSE)
#> This is aqp 1.27
library(testthat)
#> Warning: package 'testthat' was built under R version 4.0.3
data(jacobs2000)
# use glom to create some fake missing data (REMOVE 50-100cm zone)
input.spc <- glom(jacobs2000[4,], 50, 100,
truncate = TRUE, invert = TRUE)
##################################
### .:. # 5 cannot segment a single profile if the interval contains missing data (gaps)
##################################
# Error: the 50:100 zone is missing; and only one profile
(output.spc <- segment(input.spc, 50:100))
#> Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1
# Error: even with data above and below 50/150 -- single profile
(output.spc <- segment(input.spc, 25:150))
#> Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1
# Error: just 1 cm of missing data in single profile case breaks it
(output.spc <- segment(input.spc, 0:51))
#> Error in data.frame(..., check.names = FALSE): arguments imply differing number of rows: 0, 1
# Works
segment(input.spc, 0:50)
#> SoilProfileCollection with 1 profiles and 50 horizons
#> profile ID: id | horizon ID: hzID
#> Depth range: 50 - 50 cm
#>
#> ----- Horizons (6 / 50 rows | 10 / 19 columns) -----
#> id hzID top bottom name sand clay matrix_color time_saturated
#> 92-4 1 0 1 Ap 88 2 #83796FFF 0
#> 92-4 2 1 2 Ap 88 2 #83796FFF 0
#> 92-4 3 2 3 Ap 88 2 #83796FFF 0
#> 92-4 4 3 4 Ap 88 2 #83796FFF 0
#> 92-4 5 4 5 Ap 88 2 #83796FFF 0
#> 92-4 6 5 6 Ap 88 2 #83796FFF 0
#> matrix_color_munsell
#> 10YR 5/1
#> 10YR 5/1
#> 10YR 5/1
#> 10YR 5/1
#> 10YR 5/1
#> 10YR 5/1
#> [... more horizons ...]
#>
#> ----- Sites (1 / 1 rows | 1 / 1 columns) -----
#> id
#> 92-4
#>
#> Spatial Data:
#> [,1]
#> [1,] NA
#> [1] NA
# Add another profile
input.spc <- combine(input.spc, jacobs2000[7,])
# Works
output.spc <- segment(input.spc, 50:100)
##################################
### .:. #6 output ID order does not match input order or input profiles
##################################
# _NEEDS_ FIX: not only does it have same IDs... it appears to change the order of inputs
(all(profile_id(input.spc) == profile_id(output.spc))) # FALSE
#> [1] FALSE
##################################
### .:. #7 does not provide new IDs for re-sampled profiles
##################################
# Error: Cannot combine because the transformed SPC has the same ID
# (this is a feature not a bug, resampled horizons != original data)
(show.spc <- combine(input.spc, output.spc))
#> Error in pbindlist(objects): non-unique profile IDs detected
# Inspect the default segment ID output -- this is wrong
par(mfrow=c(1,2),mar=c(0,0,2,1))
plot(input.spc, max.depth=200, color="matrix_color", plot.depth.axis=FALSE)
plot(output.spc, max.depth=200, color="matrix_color", name=NA)
# # segment should assign new profile IDs automatically -- like slice, permute_profile, etc
# # the correct IDs are something like thisthis:
output.spc2 <- output.spc
fix.idx <- match(profile_id(output.spc), profile_id(input.spc))
profile_id(output.spc2) <- paste0(profile_id(input.spc)[fix.idx], "segmented")
par(mfrow=c(1,2),mar=c(0,0,2,1))
plot(input.spc, max.depth=200, color="matrix_color", plot.depth.axis=FALSE)
plot(output.spc2, max.depth=200, color="matrix_color", name=NA)
Note: that I am deliberately accounting for reversed order w/ match / fix.idx
- This is not a solution, but shows why ID, and SoilProfileCollection integrity is important.
- Why important? Numeric (site and horizon/slice) indices or positions are used to subset all the time
- if they are shifting around order in between calls to functions because of missing data that is not good
- to unexpected problems -- mostly outside the
SoilProfileCollection
and in the user's plots and code
- Why important? Numeric (site and horizon/slice) indices or positions are used to subset all the time
Suggestions:
Under normal circumstances, it is impossible to make an SPC like the segment output from the base data
- This is because of the forced ordering that
depths<-
does when starting with the horizon data
Temporary solutions: - rebuilding the SPC "fixes" the apparent issue in the algorithm - not an actual fix -- the re-ordering of profiles should be addressed in algorithm
output.spc3 <- rebuildSPC(output.spc)
#> Warning: Horizon top depths contain NA! Check depth logic with
#> aqp::checkHzDepthLogic()
#> Warning: Horizon bottom depths contain NA! Check depth logic with
#> aqp::checkHzDepthLogic()
#> using `hzID` as a unique horizon ID
par(mfrow=c(1,2),mar=c(0,0,2,1))
plot(input.spc, max.depth=200, color="matrix_color")
plot(output.spc3, max.depth=200, color="matrix_color", divide.hz=FALSE, name=NA)
Long term solution:
- In long term we should not calling rebuildSPC
inside a function unless there is some intractable error.
- we do provide one instance of this in pbindlist
merging non-conformal (different source/columns) SPCs
- `segment` should build a SPC using conventional means (`depths<-`)
- by building the SPC with depths, you avoid the problem by writing your algorithm on its own
- let the SoilProfileCollection object handle doing the proper ordering.
- This is why `replaceHorizons(object) <- h` is dangerous.
Opinion:
- I know that I advised of this replaceHorizons
issue some time ago. The old "replace" behavior of horizons()
should have never existed, at least not without the proper validity checks in place, which was not the case until very recently (early/mid summer 2020). I have second thoughts about exporting replaceHorizons
at all because of problems exactly like this.
I wanted to use segment for an example I was working on -- and I came up with this example which is a more concise demo of the phenomenon I showed above, and shows why rebuildSPC()
is currently "needed"
library(aqp, warn.conflicts=FALSE)
#> This is aqp 1.27
data(sp4)
depths(sp4) <- id ~ top + bottom
# does not work with unit-length segment; needs informative error
segment(sp4, 20)
#> Error in `$<-.data.frame`(`*tmp*`, "id", value = "-"): replacement has 1 row, data has 0
# this appears to work correctly, and rebuildSPC warns about the fact that NAPA and others have no data / NA depths
segment(sp4, 20:21)
#> Warning: Horizon top depths contain NA! Check depth logic with
#> aqp::checkHzDepthLogic()
#> Warning: Horizon bottom depths contain NA! Check depth logic with
#> aqp::checkHzDepthLogic()
#> SoilProfileCollection with 10 profiles and 10 horizons
#> profile ID: id | horizon ID: hzID
#> Depth range: 21 - 21 cm
#>
#> ----- Horizons (6 / 10 rows | 10 / 15 columns) -----
#> id hzID top bottom name K Mg Ca CEC_7 ex_Ca_to_Mg
#> colusa 1 20 21 Bt1 0.1 23.2 1.9 23.7 0.08
#> glenn 2 20 21 Bt 0.3 18.9 4.5 27.5 0.20
#> kings 3 20 21 Bt2 0.8 17.7 4.4 20.0 0.25
#> mariposa 4 20 21 Bt2 0.3 44.3 6.2 34.1 0.14
#> mendocino 5 20 21 Bt2 0.2 30.5 3.7 22.9 0.12
#> napa 8 NA NA <NA> NA NA NA NA NA
#> [... more horizons ...]
#>
#> ----- Sites (6 / 10 rows | 1 / 1 columns) -----
#> id
#> colusa
#> glenn
#> kings
#> mariposa
#> mendocino
#> napa
#> [... more sites ...]
#>
#> Spatial Data: [EMPTY]
par(mar=c(0,0,2,1))
plot(sp4, color="clay")
abline(h = 20, lty=2, lwd=2)
checkHzDepthLogic(sp4)
#> id valid depthLogic sameDepth missingDepth overlapOrGap
#> 1 colusa TRUE FALSE FALSE FALSE FALSE
#> 2 glenn TRUE FALSE FALSE FALSE FALSE
#> 3 kings TRUE FALSE FALSE FALSE FALSE
#> 4 mariposa TRUE FALSE FALSE FALSE FALSE
#> 5 mendocino TRUE FALSE FALSE FALSE FALSE
#> 6 napa TRUE FALSE FALSE FALSE FALSE
#> 7 san benito TRUE FALSE FALSE FALSE FALSE
#> 8 shasta TRUE FALSE FALSE FALSE FALSE
#> 9 shasta-trinity TRUE FALSE FALSE FALSE FALSE
#> 10 tehama TRUE FALSE FALSE FALSE FALSE
napa, san benito and tehama do not have data in 20 - 21cm segment
checkHzDepthLogic(segment(sp4, 20:21))
#> Warning: Horizon top depths contain NA! Check depth logic with
#> aqp::checkHzDepthLogic()
#> Warning: Horizon bottom depths contain NA! Check depth logic with
#> aqp::checkHzDepthLogic()
#> id valid depthLogic sameDepth missingDepth overlapOrGap
#> 1 colusa TRUE FALSE FALSE FALSE FALSE
#> 2 glenn TRUE FALSE FALSE FALSE FALSE
#> 3 kings TRUE FALSE FALSE FALSE FALSE
#> 4 mariposa TRUE FALSE FALSE FALSE FALSE
#> 5 mendocino TRUE FALSE FALSE FALSE FALSE
#> 6 napa FALSE FALSE FALSE TRUE FALSE
#> 7 san benito FALSE FALSE FALSE TRUE FALSE
#> 8 shasta TRUE FALSE FALSE FALSE FALSE
#> 9 shasta-trinity TRUE FALSE FALSE FALSE FALSE
#> 10 tehama FALSE FALSE FALSE TRUE FALSE
Updated example, was curious if the various horizon subsetting approaches give the same results. This is not an extensive test and includes none of the common "problem" soil profiles.
The issue name wasn't well specified--there are a lot of questions, updates, fixes, and unresolved issues such as "what should be returned when asking for a depth range without data?".
I'm not sure if glom(..., fill = TRUE)
is working as expected (dropped profiles?).
Anyway, I'm going to start a vignette that explores the various ways one might subset and aggregate soil profiles. Perhaps by outlining several use cases and examples, we can better help ourselves and others understand the various approaches.
library(aqp)
# example SPC from data.frame
data(sp4)
depths(sp4) <- id ~ top + bottom
hzdesgnname(sp4) <- 'name'
# test effect of potential sorting alpha vs. numeric
# all seem to work
# profile_id(sp4) <- as.character(1:length(sp4))
profile_id(sp4) <- sprintf("%0.2d", 1:length(sp4))
# profile_id(sp4) <- letters[1:length(sp4)]
testIt <- function(spc, top, bottom) {
# keep old ID
spc$.oldID <- profile_id(spc)
# various approaches
# dice() fills missing depth intervals with NA
# default when given a formula
.fm <- as.formula(sprintf("%s:%s ~ .", top, bottom - 1))
d <- dice(spc, fm = .fm)
# force filling missing depth intervals with NA
g <- glom(spc, z1 = top, z2 = bottom, fill = TRUE)
gt <- glom(spc, z1 = top, z2 = bottom, truncate = TRUE, fill = TRUE)
# single NA horizon, with NA depths
st <- hz_segment(spc, intervals = c(top, bottom))
s <- hz_segment(spc, intervals = c(top, bottom), trim = FALSE)
# normalize profile IDs
# so that all can be combined / viewed together in a single SPC
profile_id(d) <- sprintf("%s\nD", profile_id(d))
profile_id(g) <- sprintf("%s\nG", profile_id(g))
profile_id(gt) <- sprintf("%s\nGT", profile_id(gt))
profile_id(s) <- sprintf("%s\nS", profile_id(s))
profile_id(st) <- sprintf("%s\nST", profile_id(st))
profile_id(spc) <- sprintf("%s\n", profile_id(spc))
x <- combine(spc, d, g, gt, s, st)
par(mar = c(0, 0, 3, 0))
plotSPC(x, color = 'CEC_7', name.style = 'center-center', width = 0.4, id.style = 'top', col.palette = hcl.colors(25, palette = 'viridis'), depth.axis = list(line = -5))
segments(x0 = 0, x1 = length(x) + 1, y0 = c(top, bottom), y1 = c(top, bottom), lwd = 2, lty = 3, col = 'red')
invisible(x)
}
testIt(sp4, top = 0, bottom = 25)
# check for all-NA horizons, and equal number of sites / original ID
table(a$.oldID)
a <- testIt(sp4, top = 15, bottom = 35)
# check for all-NA horizons, and equal number of sites / original ID
table(a$.oldID)
a <- testIt(sp4, top = 20, bottom = 21)
# check for all-NA horizons, and equal number of sites / original ID
table(a$.oldID)
a <- testIt(sp4, top = 50, bottom = 60)
# check for all-NA horizons, and equal number of sites / original ID
table(a$.oldID)
I'm not sure if
glom(..., fill = TRUE)
is working as expected (dropped profiles?).
You are right, that is a bug in the case where no profiles have horizons in the specified interval, should be fixed in f25246d5 and added test in 3684b12
Anyway, I'm going to start a vignette that explores the various ways one might subset and aggregate soil profiles. Perhaps by outlining several use cases and examples, we can better help ourselves and others understand the various approaches.
Sounds good!
Cool thanks. Do we have a sample dataset with each of the typical hz logic errors? That might be really handy for tests.