Investigation: String IDs to foreign keys
Feature description and context
We're getting into the heavy standardization work. Having consistent foreign keys references will go a long way to standardizing calls against various objects. But it's going to take some digging to figure out everything that needs to be fixed
Feature sign-off requirements
List of objects that need to be standardized so tech debt tickets can be made for it
just a quick summary of the main classes of problems that i know of which could use improvement here.
- straight missing foreign key references. for example
submitterIDhas the value of Userusernamebut no fk at all. so data is often out of sync or incorrect here (and this is a critical field for permissions/ownership etc!) - duplicate references. Relationship.java has both
.individualN(MarkedIndividual) and.markedIndividualNameN(a string that is remarkably actually the id not name!) properties. (and the name/id one has no foreign key constraint so also falls under (1) - but should probably go away so that is a moot point.) - duplicate "bidirectional" fields. Encounter.java has
occurrenceID(a field) but alsogetOccurrence()which shockingly needs a Shepherd passed, despite the fact that Occurrence.encounters works as expected. (Compare to the one-to-many MarkedIndividual/Encounter mapping which works correctly.) Currentlyenc.occurrenceIDis expected to always be updated (manually in code) when an encounter is assigned to a new occurrence. - probably should be a table with fk. A little trickier (given how we treat this data), but still probably worth considering from a data integrity point of view. For example: taxonomy, location. Right now these data act like they have "real" ids but are not even present in the db to have a fk to reference! We absolutely 100% have invalid values where these are referenced, i.e. everywhere. (Historically we attempted a proper Taxonomy class with a real id; but it basically fell apart and is half implemented, if that.)
- string values that prob should be enums if not fk to other tables. probably least critical / lowest priority, but would be nice; there are tons of places where a small "fixed" (spoiler: its not that fixed) lists of strings are considered valid values - but can end up getting set to invalid values. some examples: Annotation.viewpoint, Annotation.iaClass, (various).sex, various statuses (MediaAsset.detectionStatus), Encounter.state, etc etc etc.
there will have to be different fixes for the different kind of problems. one of the biggest is that db data will have to be repaired first before any schema changes are made. for example, we cannot simply throw a fk reference on enc.submitterID as it is guaranteed to have bad data in it.
likely fixes to some of these might require: (a) manual data fix, (b) updating code, (c) updating package.jdo to add constraints, (d) sql to populate null fields [that will prevent constraint from being added], (e) possibly manually adding constraint when code is deployed (as sometimes it seems like datanucleus auto-schema magic refuses to add new constraints?)
in other words, its a mess and we need to tread lightly. 🙂 definitely recommend addressing each single case we want to fix one at a time. and practice on local copies of various wildbook data (as surely different dbs will surprise us and act in new mysterious ways from others).
@naknomum and @holmbergius: I'll find some time for us to review this and create and prioritize some sub-tickets.