bety

sites value constraints: needed cleanup and decisions

Status: Open • gsrohde opened this issue 9 years ago • 4 comments

city, state, country

  • [ ] Decide on constraints
    • issue #201 has information on removing duplicates and a source of standardized names
  • [ ] Add them

soil

  • [x] Decide if there is a reasonable value constraint
    • all lowercase
    • one of clay, sandy clay, sandy clay loam, sandy loam, loamy sand, sand, clay loam, loam, silty clay, silty clay loam, silt loam, silt.
  • [ ] Correct non-conforming values (see notes below)
  • [ ] Add it.
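A minimal Python sketch of the proposed soil check, which could be used to screen values in application code before the database constraint is added. SOIL_CLASSES and is_valid_soil are hypothetical names, and treating NULL as allowed is an assumption, since the checklist doesn't say whether the column may be empty.

```python
# The twelve soil texture names listed in this issue. The proposed
# constraint requires an exact, all-lowercase match.
SOIL_CLASSES = frozenset({
    "clay", "sandy clay", "sandy clay loam", "sandy loam", "loamy sand",
    "sand", "clay loam", "loam", "silty clay", "silty clay loam",
    "silt loam", "silt",
})


def is_valid_soil(value):
    """Return True if value would satisfy the proposed soil constraint.

    Assumption (not decided in the issue): NULL (None) is allowed.
    """
    if value is None:
        return True
    # Exact membership: "Silt Loam" fails because it is not lowercase.
    return value in SOIL_CLASSES


if __name__ == "__main__":
    for v in ["silt loam", "Silt Loam", "loamsandy", None]:
        print(repr(v), is_valid_soil(v))
```

Because the match is exact, a capitalized value such as "Silt Loam" is rejected; existing rows would need the cleanup described below before the constraint can be added.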

som

  • [x] Decide what a reasonable range restriction is.
    • 0-100 [represents percent organic matter]
  • [ ] Add constraint.
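A sketch of the som range check in Python. The function name is hypothetical, allowing NULL is an assumption, and the upper bound is a parameter so that a tighter limit (see the note on current values below) could be tried without changing the code.

```python
def is_valid_som(value, upper=100.0):
    """Check a soil organic matter percentage against the proposed range.

    `upper` defaults to 100, the bound decided in this issue.
    Assumption: NULL (None) is allowed.
    """
    if value is None:
        return True
    return 0.0 <= value <= upper
```

For example, is_valid_som(3.5) passes, while is_valid_som(101) and is_valid_som(-1) fail.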

greenhouse

  • [x] Decide if this can be null.
    • No. It should be true or false; the default is false (or zero).
  • [ ] Change existing NULLs.
  • [ ] Add NOT NULL constraint.

local_time

  • [ ] Decide how to change this column.
    • Defined as an offset from GMT in whole hours, integer range -12 to +12 (note that real UTC offsets run from -12 to +14).
    • Could be auto-populated; otherwise, allow NULL.
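A sketch of a validator for local_time as currently defined (integer hours, -12 to +12, or NULL). The function name is hypothetical; the range is taken from the checklist above.

```python
def is_valid_local_time(offset):
    """Validate local_time: an integer offset from GMT in hours,
    from -12 to +12 inclusive, or NULL (None)."""
    if offset is None:
        return True
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(offset, bool) or not isinstance(offset, int):
        return False
    return -12 <= offset <= 12
```

Under this definition a fractional offset such as 5.5 (India, UTC+5:30) is rejected, and so is +14 (Line Islands, Kiribati), even though both are real time zones.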

geometry

  • [x] Decide what, if any, constraints could be used
    • lat between -90 and 90
    • lon between -180 and 180
    • elevation between -100 and 10,000
  • [ ] Add them
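A Python sketch that reports which of the proposed geometry bounds a site violates. The function name is hypothetical; elevation units are assumed to be meters and a missing elevation is assumed to be allowed, since the issue states neither.

```python
def geometry_problems(lat, lon, elevation=None):
    """Return a list of violations of the proposed geometry bounds:
    lat in [-90, 90], lon in [-180, 180], elevation in [-100, 10000].

    Assumptions: elevation is in meters, and None means "not recorded".
    """
    problems = []
    if not -90 <= lat <= 90:
        problems.append(f"lat {lat} outside [-90, 90]")
    if not -180 <= lon <= 180:
        problems.append(f"lon {lon} outside [-180, 180]")
    if elevation is not None and not -100 <= elevation <= 10000:
        problems.append(f"elevation {elevation} outside [-100, 10000]")
    return problems
```

Returning a list rather than a single boolean makes it easier to report every problem with a site at once during cleanup.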

Details

city, state, country

See discussion at https://www.overleaf.com/2086241dwjyrd#/5297403/.

soil

Right now, there are 31 distinct descriptors, but many of these are the same if variations in capitalization and whitespace are ignored.
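One way to see how the descriptors collapse is to normalize capitalization and whitespace and group the raw values by the result. This Python sketch uses hypothetical function names and made-up sample values; the actual 31 descriptors are not listed in this issue.

```python
def normalize_soil(value):
    """Lowercase and collapse runs of whitespace (including leading and
    trailing whitespace), so descriptors that differ only in
    capitalization or spacing compare equal."""
    return " ".join(value.split()).lower()


def distinct_after_normalizing(values):
    """Map each normalized form to the set of raw values that produce it."""
    groups = {}
    for v in values:
        groups.setdefault(normalize_soil(v), set()).add(v)
    return groups


if __name__ == "__main__":
    # Hypothetical sample values, not taken from the database.
    raw = ["Sandy Loam", "sandy  loam", " sandy loam ", "SILT LOAM", "silt loam"]
    for norm, originals in distinct_after_normalizing(raw).items():
        print(norm, sorted(originals))
```

Here five raw strings reduce to two normalized values, "sandy loam" and "silt loam".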

If we restrict this column, values that don't fit the list could be moved into the soilnotes column.

UPDATE: There are 7 distinct non-empty values that aren't in the list. Here they are along with possible updated values:

"loamsandy" --> sandy loam
"peat" --> ???
"sandloamy" --> sandy loam
"Nodaway silt loam" --> silty clay loam???
"loamclay" --> clay loam
"loamsilt" --> silt loam
"Sandy clay orthic A, Pedo calanic B, RSA: Swartland form, Swartland series, USDA: ALFISOL RHODOXERAL" --> ???
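A Python sketch of the cleanup for the non-conforming values listed above. Only the mappings whose target seems clear are included; the entries marked "???" (and the uncertain "Nodaway silt loam" mapping) are deliberately left out and flagged for review, for example to be moved to soilnotes. SOIL_FIXES and propose_soil_update are hypothetical names, and the function assumes its input is one of the non-conforming values.

```python
# Replacements whose target is reasonably clear from the list above.
SOIL_FIXES = {
    "loamsandy": "sandy loam",
    "sandloamy": "sandy loam",
    "loamclay": "clay loam",
    "loamsilt": "silt loam",
}


def propose_soil_update(value):
    """Return (new_value, needs_review) for a non-conforming soil value.

    The value is normalized (lowercased, whitespace collapsed) before
    lookup. Values without a known fix are returned unchanged and
    flagged for manual review.
    """
    key = " ".join(value.split()).lower()
    if key in SOIL_FIXES:
        return SOIL_FIXES[key], False
    return value, True
```

For example, propose_soil_update("loamsandy") yields ("sandy loam", False), while "peat" comes back unchanged with needs_review set to True.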

som

Any percentage like this must be in [0, 100], but the largest current value is only 3.50, so a tighter upper bound might be reasonable.

greenhouse

There are 272 rows where this is NULL. If we make this column NOT NULL, we'd have to decide what value to assign to these rows.
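A sketch of a backfill step to run before adding the NOT NULL constraint. It treats rows as a list of dicts with a "greenhouse" key, a hypothetical in-memory stand-in for the sites table, and uses false as the default because the checklist names it; whether all 272 NULL rows should really get that value is still the open question here.

```python
def backfill_greenhouse(rows, default=False):
    """Replace NULL (None or missing) greenhouse values with `default`.

    `rows` is a hypothetical list of dicts standing in for sites rows.
    Returns the number of rows changed.
    """
    changed = 0
    for row in rows:
        if row.get("greenhouse") is None:
            row["greenhouse"] = default
            changed += 1
    return changed
```

Returning the count makes it easy to check that the number of rows changed matches the 272 NULLs found.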

local_time

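This is an integer, so it can't represent every time zone: zones with fractional-hour offsets, such as India (UTC+5:30) or Nepal (UTC+5:45), have no integer value. See discussion at https://www.overleaf.com/2086241dwjyrd#/5297403/.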
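A Python sketch showing which fixed UTC offsets fit an integer local_time column. It uses the standard library's datetime.timezone; the function name is hypothetical.

```python
from datetime import timedelta, timezone


def offset_as_integer_hours(tz):
    """Convert a fixed UTC offset to whole hours, or return None if the
    offset is not a whole number of hours (and so cannot be stored in an
    integer local_time column)."""
    offset = tz.utcoffset(None)
    # divmod of two timedeltas gives (int quotient, timedelta remainder).
    hours, remainder = divmod(offset, timedelta(hours=1))
    if remainder:
        return None
    return hours


if __name__ == "__main__":
    for h, m in [(-6, 0), (5, 30), (5, 45), (-3, -30)]:
        tz = timezone(timedelta(hours=h, minutes=m))
        print(tz, offset_as_integer_hours(tz))
```

UTC-06:00 converts to -6, but UTC+05:30, UTC+05:45, and UTC-03:30 (Newfoundland standard time) all return None.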

geometry

See discussion at https://www.overleaf.com/2086241dwjyrd#/5297403/. At the very least, it should be possible to restrict the altitude extracted from this column. It should also be possible to check this value for consistency against other columns such as local_time and country.
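As one example of a consistency check between geometry and local_time, here is a rough Python heuristic. It is an assumption, not a rule from this issue: a site's standard offset is usually within a few hours of lon / 15, the "nautical" time zone for that longitude. Political time zones can deviate a lot, so the function name and the 3-hour tolerance are hypothetical, and failures should be reviewed rather than rejected.

```python
def local_time_plausible(lon, local_time, tolerance_hours=3):
    """Rough consistency check between longitude and local_time.

    Compares local_time to round(lon / 15), the nautical time zone for
    that longitude. A NULL (None) local_time is not checked.
    """
    if local_time is None:
        return True
    nominal = round(lon / 15)
    return abs(local_time - nominal) <= tolerance_hours
```

A site at longitude -88.2 with local_time -6 passes, while the same site with local_time +6 fails; that is the kind of sign error this check would catch.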

gsrohde, Feb 16 '15 22:02