woeip
woeip copied to clipboard
The system needs a way to indicate which data has been reviewed and found to be clean.
Description
One of the long-term use cases for the database is to store data that's been collected according to WOEIP's specific data collection requirements. For example, they may conduct a formal, grant-funded study in which certain collection requirements must be met to ensure clean data.
Per the most recent data flow diagram, any uploaded data will be ~moved from a staging table into the permanent database~ flushed from the front-end Form/Object and saved to the database when a user confirms their session details. However, this data may not be "clean" in the sense that it doesn't conform to WOEIP collection requirements. Therefore, WOEIP admin users need a way to review and approve any uploaded data.
Proposed solution
This approach has been discussed with both developers and WOEIP, and was met with general consensus at the time (approx. Jan 2020?).
-
Project Collaborator
user uploads their data, confirms their details, and saves the session (covered in 132). - System flushes data from front-end Object/Form, moving data into permanent table and flagging data as
unvalidated
. - System simultaneously emails
Org Host
users that data is available for review (this would be a post-MVP component). -
Org Host
can then sign in, view unvalidated data, clean it, and approve it for unflagging (all post-MVP). - On the front-end, default map views would omit unvalidated data, with a toggle option to include it in displayed maps (also post-MVP).
Acceptance criteria
- [ ] Proposed solution has been reviewed by team and finalized.
- [ ] Errors/edge cases have been adequately addressed and documented.
- [ ] All open questions have been resolved.
Dependencies and assumptions
- This issue only covers process #2 in Proposed Solution above.
Related documentation
- Roles & Permissions Grid (9/6/19)
- Account Requirements doc (in progress)
System flushes staging table, moving data into permanent table
- It seems we're converging on the "staging table" existing as a Form or Object in the front end. This will be flushed automatically when the data is sent to the backend.
and flagging data as
unvalidated
- I think the flag should exist in the
Collection
model as a field labeledvalidated_at
. It will benull
for unvalidated data and contain adatetime
when it is validated.
I think the flag should exist in the
Collection
model as a field labeledvalidated_at
. It will be null for unvalidated data and contain a datetime when it is validated.
It's not the Collection
being validated, though. The validation is that the TimeGeo
and PollutantValue
pairs are accurate. So I'm not sure which table the field should live in. Maybe CollectionFile
? The end use case is when the user is viewing a map and needs to view "only validated data," "only unvalidated data," or "all data."
So my two questions are specifically:
- Which table should the validation field live in based on the details above? (I'm not convinced
Collection
is the right one but of course may be wrong about this.) - Will a null
validated_at
field cause issues when querying for either of the three states described in the use case above?
- My concern is that there could be hundreds or thousands of
PollutionValues
perCollection
- SQL Where IS (NOT) NULL should do the trick for all our needs.
-
An understandable concern. So if the Org Host user validates per
Collection
, then how does the system query, say, only validated data ranging across a series of Collections? In other words, the Collection concept is finite (a short span of data from one session), likely only to be viewed right after a Project Contributor uploads their session data. More often, users will be viewing a map that displays all data from all (validated) collections at once. My concern (which may be misplaced) is that the common use case will then require the system to first query Collections for validated records, then call the associated CollectionFile records, and then pull the TimeGeo records. I may be totally misunderstanding the system and sequence of operations, though. Let me know if it makes sense to table this until Tues for a live conversation. -
Word. Is that syntax (with the parentheses) how you indicate both NULL and NOT NULL records? #noob
- Yeah, that's basically how it works. Fortunately, SQL is designed to perform those operations quickly and in one JOIN command.
- The parentheses indicate two separate commands- one using
NULL
and another usingNOT NULL
. If you wanted all the values, you would omit the filter altogether. The back end chooses which filter to apply based on the desired operation and authorization level of the user.