aggregate icon indicating copy to clipboard operation
aggregate copied to clipboard

Users should be able to fix data corruption with Aggregate

Open ggalmazor opened this issue 7 years ago • 9 comments

Aggregate should have a new admin section with information about data corruption in the database and automatic fixes.

The starting point for the information and fixes would be the current guide for fixing data corruption.

Note: Continually deploying this feature at https://aggregatetest-184307.appspot.com

Corruption by dupes/missing rows in tables

(note: the compilation of cases in this table is a work in progress)

cause effect fix
missing _form_info_fileset UI refresh hell delete form
dupe _form_info_fileset anoying alerts delete the dupe
nulls in _form_info_fileset UI refresh hell fill nulls
missing _form_info_submission_association unkown delete form
dupe _form_info_submission_association anoying alerts delete the dupe
nulls in _form_info_submission_association unkown fill nulls
no _form_data_model for a given _form_info_submission_association unknown delete form

Corruption by wrong/incomplete data

cause effect fix
Wrong ordinal_number sequences in repeat tables Users can't list and export submissions See #134
null values at some boolean fields like _form_info_fileset.IS_DOWNLOAD_ALLOWED UI refresh hell Fix field

Corruption by orphaned rows in tables

There can be orphaned rows in many tables such as:

  • _form_data_model
  • _form_info_fileset
  • _form_info_manifest_bin
  • _form_info_manifest_blb
  • _form_info_manifest_ref
  • _form_info_submission_association
  • _form_info_xform_bin
  • _form_info_xform_blb
  • _form_info_xform_ref
  • And any table that holds submissions

Questions:

  • How are _form_info_manifest_blb and _form_info_manifest_ref tables related to _form_info_manifest_bin (same question with _form_info_xform_xyz)

ggalmazor avatar Nov 11 '17 09:11 ggalmazor

Replaces #79. @ggalmazor it may be worth taking a quick look at some of the warnings/concerns brought up on that issue discussion. They may not all be current so take them with a grain of salt!

lognaturel avatar Nov 13 '17 16:11 lognaturel

Ok, I'll take a look. (I've added a note in the issue description to warn readers that the table with the causes will change)

ggalmazor avatar Nov 13 '17 16:11 ggalmazor

I'm thinking about something like this:

(don't mind the "Forms without fileset grey header above the table")

In the screenshot:

  • The third form has no submission associations, therefore, the delete form button is available for that row.
  • The fourth form has 2 submission association rows, which is indicative of data corruption. You get a button that will show those rows in a popup and you will be able to choose which one you want to leave.

When a form must be deleted, the process would ensure that no orphan rows are left in other tables like _form_info_fileset and other like it.

I'm thinking to follow the same approach to detect and delete orphaned rows in fileset, submission association, submission, etc. tables as well.

ggalmazor avatar Nov 14 '17 20:11 ggalmazor

This is a really important problem to take on!

The general approach of checking _form_* for missing or duplicate rows is fantastic, but I'd be consistent there. For example, you are not checking for _form_info_manifest_bin and it's not clear why. Is it impossible?

I also like the approach of cleaning up orphaned rows, but to me, that's a nice to have. If it ends up being a big lift, I'd delay shipping it.

I don't know if users need to know exactly why a form is broken. They just need to know that it is broken and if they can push a button to fix it. And I'd frame it this way because it's our fault that their data is corrupt. We don't want them to do extra work.

So to that end, I'd probably only list the forms that have problems. And I'd have two headers: Form ID, Problem, and then a Fix button. When you tap the Fix button, it lists the things it'll try to do, and then you can confirm.

And for things that you can't fix (e.g., a missing repeat), you maybe list the submission that has the problem and we can decide what to do later.

Oh, and I wouldn't call it Data Corruption Admin because it definitely sounds like we are giving up. Maybe we put it under Site Admin under Database Repair

yanokwa avatar Nov 15 '17 03:11 yanokwa

Alright! Thanks for the feedback :)

Knowing that we're working in the right direction I can continue adding to this new section. I think I'll be deploying to AppEngine regularly to demo it. Also, DataStore lets us be dirtier with data and reproduce more corruption scenarios :)

I have some questions/comments:

  • I've added a pair of columns for _form_info_manifest_bin and _form_info_xform_bin which count the number of rows in those tables, but I don't really know yet what they do or how corruption there would be expressed. Could you help me with a form that writes on those or some way to reproduce corruption there?
  • I agree that, in the end, we're aiming for a "Fix this" button", but I'd like to leave a detailed table there for some time while users start using this tool for two reasons:
    • It gives us information about specific corruption situations while helping users without having them send us their databases. We can ask them to post a screenshot in the forum, for example. I'm assuming that we will get messages like "my Aggregate is still broken and the database repair won't say anything"
    • It lets me work with simpler fixes because each button will do just one fix. I haven't figured out yet how the queueing backend process works and working this way would let us deliver sooner.
  • In the particular case of missing rows in a repeat table, couldn't we just rewrite the sequence to fill the gaps?

ggalmazor avatar Nov 15 '17 07:11 ggalmazor

Quick update on today's work. I'm modeling the different ways we could gather data corruption sources, their explanation, and possible fixes.

data tool (just notice the different values in the filesets column and the last column)

A subdomain of a form (filesets, submission associations, etc.) can have different kinds of corruption. Sometimes there won't be a way to fix this situation and the form would have to be deleted. Each kind of corruption requires different fixes: removing a dupe row, filling some null field with a non-null value, etc.

On the next iteration, I want to focus on having a basic fixing tool for filesets. It will require manual user interaction at first. After that, we can add suggestions that would be applied automatically over time once we feel confident with them.

ggalmazor avatar Nov 16 '17 21:11 ggalmazor

Quick update on part of today's work. I've implemented the fixes for corruption issues on _form_info_fileset and a rudimentary UI.

snapshot1

In this case, there are nulls in required fields and some dupes. The Save button will only be enabled after the user marks enough rows for deletion until just one row would be left and that row must be valid (no nulls allowed).

snapshot2

After working out this, the user can hit Save and confirm that she wants to execute the fixes on the database.

snapshot3

ggalmazor avatar Nov 21 '17 19:11 ggalmazor

Today's report: extended the functionality for _form_info_submission_association. In this case, we've got 2 booleans (line in _form_info_fileset) but we also cover not having any rows in _form_data_model for a submission association row, which would be a sign of corruption.

The current implementation has received some love already and can be modified to match and handle new corruption causes... My instinct is that adding more cases doesn't help us triangulate further. Therefore, I'm going to focus next on solving corruption on submissions.

~Feedback on #134 and #141 would be much appreciated ;)~ << You already reviewed the issues, sorry!

ggalmazor avatar Nov 22 '17 20:11 ggalmazor

I copy here a comment from @lognaturel in #141:

In the docs we should specify that the latest duplicate is kept (though it generally won't matter because they'll have the same value).

We need to explain this to the user before applying a fix on this in the Database Repair tool.

ggalmazor avatar Nov 22 '17 21:11 ggalmazor