cht-core
cht-core copied to clipboard
Work out what we want to do with Enketo XML, and do that thing
When creating reports we store the user-entered data effectively twice:
- Once as the raw XML that Enketo returns to us, for use by subsequent Enketo invocations (i.e editing docs)
- Once as extract JSON in
doc.fields.{}
, for use by everything else (on-screen display, configurable logic like rules and schedules, impact and analytics via couch2pg, etc)
Originally we stored the XML in the doc.xml
property, as a string.
We later realised that this is inefficient / unhelpful, because CouchDB's view generation performance is a function of the size of a JSON document, and nothing refers to it except editing. So, we moved it into an attachment.
However, this has caused other performance problems:
- Replication is less efficient in general, because the algorithm requests attachments separately
- Replication is far less efficient relevant to CouchDB 2.0, because we can now batch document GETs, but not attachments GETs, so the disparity is far more obvious
- For a reason yet to be investigated, PouchDB view generation is far slower with attachments present (it should be at worst identical).
We need to work out what to do here.
Obvious options include:
- Prioritise replication speed and PouchDB view generation performance over CouchDB view generation performance and move the XML back into a string against the JSON
- Delete the XMl altogether, and re-generate the XML from the
doc.fields
object when we want to edit (hat-tip @dianabarsan for that brilliant suggestion) - Do any amount of PouchDB investigation and performance improvements to solve any or all of the problems listed above
Diana's suggestion sounds ideal, and IMO should be investigated first, but it can be up for discussion.
Context: https://docs.google.com/document/d/1kXjsF-orndz0-V5UFf_Esx5J-ogeQ6_tALCwE1K0tvA/edit#heading=h.29bkt8nfi7kw
Scheduling for 4.0.0 because that's the ideal time to do any data migrations that may be necessary for this.
@dianabarsan assures me that we can reverse engineer from JSON to XML so we should just do that.
Is it possible to split this issue into two?
- Convert JSON to XML and stop attaching the XML to new reports.
- Perform the migration to remove the attachment for all legacy reports?
If I understand the approach correctly, this would allow us to discuss scheduling the first issue in a 3.x timeframe and would be beneficial to new projects (and old).
Yes, I should have updated this issue so it now only covers step 1. We've decided not to run any major migrations in 4.0.0 so step 2 will have to wait until 5.0.0, and/or be provided as a custom script that each project can choose to run, or not.
So if we are just converting JSON to XML and not attaching the XML to new reports, is this regarded as a breaking change?
It depends on the implementation, because we may need to store more information to completely reproduce the xml, but in all likelihood it's not a breaking change.
This is ready for AT on 5549-no-report-xml-attachment
. The branch is built on top of the Enketo uplift branch.
There should be no difference for the user when filling or editing forms. You should be able to:
- Edit a report, generated previous to this change (and that has a content attachment), and have all form fields filled. The attachment should be removed when saving.
- Create a new report. The new report should not have an attachment.
- Edit a report, generated after this change, and have all fields filled.
There are a few caveats:
- If the form itself is changed, the new/edited fields won't get populated. This behavior existed before, but should be tested for consistency.
- https://github.com/medic/cht-core/issues/7594 - this behavior existed before
- https://github.com/medic/cht-core/issues/7605 - the additional context part Reports created as extra docs might not always be faithful to the model of the form itself.
Curious - after getting this fix via an upgrade to 4.0, would projects be able to remove attachments from the reports created via 3.x versions without breaking anything?
Mixed. Yes for the most part, with one exception.
Some changes have been required: db-doc fields will be saved in the main report fields. To maintain old report content display, these fields will become "hidden". Because of (1), it's not safe to default to using fields when editing a report that has a content attachment, so if the content attachment exists, it will be used to populate the form, but it will be deleted upon saving.
Quote from PR description: https://github.com/medic/cht-core/pull/7596
So, for reports that created other docs (extra docs), if the attachment was removed, the information about the "extra docs" is lost (the docs themselves exist in the database already). Editing those reports is actually already broken (one of the caveats).
Deleting attachments from existent reports and saving them, if partners decide to do this, should be one of the test scenarios!
- Edit a report, generated previous to this change (and that has a content attachment), and have all form fields filled. The attachment should be removed when saving.
Before :
After:
-
Create a new report. The new report should not have an attachment.
-
Edit a report, generated after this change, and have all fields filled.
Merged to master
.