woeip icon indicating copy to clipboard operation
woeip copied to clipboard

The system needs a way to indicate which data has been reviewed and found to be clean.

Open theecrit opened this issue 4 years ago • 5 comments

Description

One of the long-term use cases for the database is to store data that's been collected according to WOEIP's specific data collection requirements. For example, they may conduct a formal, grant-funded study in which certain collection requirements must be met to ensure clean data.

Per the most recent data flow diagram, any uploaded data will be ~moved from a staging table into the permanent database~ flushed from the front-end Form/Object and saved to the database when a user confirms their session details. However, this data may not be "clean" in the sense that it doesn't conform to WOEIP collection requirements. Therefore, WOEIP admin users need a way to review and approve any uploaded data.

Proposed solution

This approach has been discussed with both developers and WOEIP, and was met with general consensus at the time (approx. Jan 2020?).

  1. Project Collaborator user uploads their data, confirms their details, and saves the session (covered in 132).
  2. System flushes data from front-end Object/Form, moving data into permanent table and flagging data as unvalidated.
  3. System simultaneously emails Org Host users that data is available for review (this would be a post-MVP component).
  4. Org Host can then sign in, view unvalidated data, clean it, and approve it for unflagging (all post-MVP).
  5. On the front-end, default map views would omit unvalidated data, with a toggle option to include it in displayed maps (also post-MVP).

Acceptance criteria

  • [ ] Proposed solution has been reviewed by team and finalized.
  • [ ] Errors/edge cases have been adequately addressed and documented.
  • [ ] All open questions have been resolved.

Dependencies and assumptions

- This issue only covers process #2 in Proposed Solution above.

Related documentation

theecrit avatar Apr 30 '20 18:04 theecrit

System flushes staging table, moving data into permanent table

  • It seems we're converging on the "staging table" existing as a Form or Object in the front end. This will be flushed automatically when the data is sent to the backend.

and flagging data as unvalidated

  • I think the flag should exist in the Collection model as a field labeled validated_at. It will be null for unvalidated data and contain a datetime when it is validated.

TangoYankee avatar Apr 30 '20 21:04 TangoYankee

I think the flag should exist in the Collection model as a field labeled validated_at. It will be null for unvalidated data and contain a datetime when it is validated.

It's not the Collection being validated, though. The validation is that the TimeGeo and PollutantValue pairs are accurate. So I'm not sure which table the field should live in. Maybe CollectionFile? The end use case is when the user is viewing a map and needs to view "only validated data," "only unvalidated data," or "all data."

So my two questions are specifically:

  1. Which table should the validation field live in based on the details above? (I'm not convinced Collection is the right one but of course may be wrong about this.)
  2. Will a null validated_at field cause issues when querying for either of the three states described in the use case above?

theecrit avatar Apr 30 '20 22:04 theecrit

  1. My concern is that there could be hundreds or thousands of PollutionValues per Collection
  2. SQL Where IS (NOT) NULL should do the trick for all our needs.

TangoYankee avatar Apr 30 '20 22:04 TangoYankee

  1. An understandable concern. So if the Org Host user validates per Collection, then how does the system query, say, only validated data ranging across a series of Collections? In other words, the Collection concept is finite (a short span of data from one session), likely only to be viewed right after a Project Contributor uploads their session data. More often, users will be viewing a map that displays all data from all (validated) collections at once. My concern (which may be misplaced) is that the common use case will then require the system to first query Collections for validated records, then call the associated CollectionFile records, and then pull the TimeGeo records. I may be totally misunderstanding the system and sequence of operations, though. Let me know if it makes sense to table this until Tues for a live conversation.

  2. Word. Is that syntax (with the parentheses) how you indicate both NULL and NOT NULL records? #noob

theecrit avatar Apr 30 '20 23:04 theecrit

  1. Yeah, that's basically how it works. Fortunately, SQL is designed to perform those operations quickly and in one JOIN command.
  2. The parentheses indicate two separate commands- one using NULL and another using NOT NULL. If you wanted all the values, you would omit the filter altogether. The back end chooses which filter to apply based on the desired operation and authorization level of the user.

TangoYankee avatar May 01 '20 00:05 TangoYankee