Empirical-Core
script to deduplicate question uids
WHAT
Add a script that:
- identifies pairs of questions with (case-insensitive) identical uids in the LMS database
- identifies, for each pair, which question has fewer responses in the CMS
- creates duplicates of all responses for the identified questions in the CMS with a new uid
- creates duplicates of the questions themselves in the LMS, with the new uid we established in the CMS
- replaces the old question key with the new question key for any activities in the LMS
- archives the old questions
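The first two steps above could be sketched roughly as follows. This is a minimal illustration, not the script in this PR: `Question` is a hypothetical in-memory stand-in for an LMS row, `response_count` stands in for a count looked up in the CMS, and generating the new uid with `SecureRandom.uuid` is an assumption about the uid format.

```ruby
require 'securerandom'

# Hypothetical stand-in for a question row; the real script reads these from the LMS database.
Question = Struct.new(:uid, :response_count)

# Group questions whose uids match when case is ignored, keeping only groups with a collision.
def duplicate_uid_groups(questions)
  questions.group_by { |q| q.uid.downcase }.values.select { |group| group.size > 1 }
end

# For each colliding group, pick the question with the fewest responses and pair it with a fresh uid.
def replacement_plan(questions)
  duplicate_uid_groups(questions).map do |group|
    to_replace = group.min_by(&:response_count)
    { old_uid: to_replace.uid, new_uid: SecureRandom.uuid } # uid format is an assumption
  end
end
```

Choosing the question with fewer responses keeps the number of rows that have to be copied as small as possible.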
WHY
This issue was causing a confusing experience for curriculum team members, who were seeing responses for two different questions on each affected question in Grammar. It ultimately led to students getting stuck, because a curriculum team member who thought they were just cleaning up old data deleted optimal answers.
HOW
More or less followed the steps in the RFC, with a couple of deviations, because I hadn't realized just how many responses would be impacted if we updated both questions in each duplicated pair (almost 6 million). Instead, I opted to replace only the question in each pair that has fewer responses in the database, along with its responses. Some of these are still very large, but in total this script will create just over 1 million responses.

I also decided to create duplicate responses, rather than just update the existing ones, so that we can go back to using the old ones more easily if something goes wrong. That does mean we'll want to go back and delete the old responses at some point after this script has been run, because otherwise, for the half of the questions we aren't replacing, curriculum team members would still see irrelevant responses (though nothing bad would happen if they deleted them). Because we create duplicates instead of updating the old responses, we also don't need to put all the questions in alpha before we start this process, so we no longer run the risk of interrupting the student experience.
Most of my time working on this was spent trying to figure out how to make the response creation not take a million years. `insert_all` is very useful in this case!
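As a rough sketch of why bulk inserts help here: copying responses in batches means one `insert_all` call per batch instead of one `INSERT` per row. The helper name `duplicate_responses`, the `question_uid` column, and the batch size below are all hypothetical; `model` is assumed to be anything that responds to `insert_all(rows)`, such as an ActiveRecord model on Rails 6 or later.

```ruby
# Copy response attribute hashes onto a new question uid and write them in batches.
# `model` is assumed to respond to `insert_all(rows)` (e.g. an ActiveRecord model, Rails 6+).
# The `question_uid` key is a hypothetical column name used for illustration.
def duplicate_responses(model, responses, new_uid, batch_size: 1000)
  inserted = 0
  responses.each_slice(batch_size) do |batch|
    # Drop the primary key so the database assigns new ids, then point each row at the new uid.
    rows = batch.map { |attrs| attrs.reject { |key, _| key == :id }.merge(question_uid: new_uid) }
    model.insert_all(rows)
    inserted += rows.size
  end
  inserted
end
```

Note that `insert_all` skips model callbacks and validations, which is part of why it is fast, so it only suits rows that are already known to be valid, such as copies of existing records.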
Screenshots
(If applicable. Also, please censor any sensitive data)
Notion Card Links
https://www.notion.so/quill/Deduplicate-Question-UIDs-in-Grammar-aa23d0b06e15447e9c77abd6ede5f71e?pvs=4
What have you done to QA this feature?
Run the script on my local database
PR Checklist | Your Answer |
---|---|
Have you added and/or updated tests? | N/A |
Have you deployed to Staging? | NO - saving staging testing for after this has passed code review because it's annoying, though possible, to reverse |
Self-Review: Have you done an initial self-review of the code below on Github? | YES |