Nudge users towards removing unverifiable information
CH tries to incentivize the user to fix {{cn}} templates by adding a reference, but we could be doing a better job of explaining that it is OK to be bold and remove information that has remained unverified for a while.
The {{cn}} template has a date parameter that allows us to tell how long it's been there. One idea here would be to change CH's interface to display a notice when a template has been there for (say) over a year.
The natural way to surface this in the UI would be to introduce a warning icon much like the yellow one we use for lead section snippets:
I have no idea whether users even notice that icon though, so a first step would be to measure that if we want to go in this direction.
Through this feature, are we giving users permission to delete a snippet, or warning them that it will be deleted if not verified soon?
Well, it's not "permission", because we don't control who edits Wikipedia: anyone can go and remove the snippet there, regardless of Citation Hunt. But we can't warn users that the snippet "will be deleted" either, because we can't guarantee that: there are definitely snippets that exist for years without being deleted. See here and here for some more information about when/how it could get deleted by someone.
That said, if a snippet has not been verified in N months and the user has failed to do it now, we could remind them that yes, it is fine to delete the snippet instead of just skipping to the next one. That's the idea in this issue; we would want to do something like:
- Figure out what snippets we want to display a notice on. I think snippets older than 3 months is a good start.
- Figure out what that notice will look like in the UI. In my comment above, I noted that there is already a warning button in the UI sometimes, and we could maybe use a similar button.
- Write code to:
  - Identify the targeted snippets and tag them in our database.
  - Display the notice when serving one of those snippets.
- Get some statistics on whether it's all working. Are people noticing/clicking the notice? Are more snippets being removed now that we display the notice?
As a future improvement, we could even allow users to browse only through those old snippets, but that should likely be a separate feature.
Does that make sense? If you're interested in giving this a try, I can definitely go into more detail on any of that and answer questions!
Yes! I'm interested in giving this a try. I thought this over and have a few cents to add:
- In my opinion, another idea about users browsing through old snippets could probably involve sorting them by decreasing order of contentiousness (on, say, claims on living people). More controversial and unverified content can be purged this way. Hence, we can make the process easier and cleaner.
- What about unverified snippets that are reported as spam? We could maybe do something similar to 1.
Also, could you please explain to me the utility of doing this, as opposed to just letting old unverified snippets be (like they are right now)?
> In my opinion, another idea about users browsing through old snippets could probably involve sorting them by decreasing order of contentiousness (on, say, claims on living people). More controversial and unverified content can be purged this way. Hence, we can make the process easier and cleaner.
Yep, that's an interesting idea. When browsing snippets within a category, I think currently they are sorted by article title, but we can totally sort by snippet age, or (title, age), once we have enough data in our database to do that.
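To make the two orderings concrete, here is a minimal sketch. It assumes each snippet is a plain (title, tagged_date) pair, which is a hypothetical shape for illustration and not how Citation Hunt actually stores snippets. Undated snippets are pushed to the end of their group.

```python
import datetime

# Hypothetical representation: (article_title, date_the_cn_tag_was_added_or_None).
def sort_by_age(snippets):
    """Oldest tag first, ties broken by title; undated snippets go last."""
    return sorted(snippets, key=lambda s: (s[1] or datetime.date.max, s[0]))

def sort_by_title_then_age(snippets):
    """Alphabetical by title; within a title, oldest tag first."""
    return sorted(snippets, key=lambda s: (s[0], s[1] or datetime.date.max))

snippets = [
    ("Zebra", datetime.date(2012, 3, 1)),
    ("Apple", None),
    ("Apple", datetime.date(2018, 6, 1)),
    ("Mango", datetime.date(2012, 3, 1)),
]
```

Substituting datetime.date.max for a missing date is just one way to keep undated snippets out of the way; filtering them out beforehand would work too.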
By the way, related to your specific example, it is currently possible to select the category Living people in the UI to browse only snippets in there. Also, with the Customize feature, it is possible to browse snippets like "Living people from Colombia" or something. All of that could probably be easier to discover and have better UX, of course.
> What about unverified snippets that are reported as spam? We could maybe do something similar to 1.
Hm, what does "reported as spam" mean? Vandalism control is outside of CH's scope, and the community typically uses other tools, like Huggle, for that.
> Also, could you please explain to me the utility of doing this, as opposed to just letting old unverified snippets be (like they are right now)?
I think the assumption here is that relatively new editors (who make up a large fraction of CH users, especially during #1lib1ref) tend to be uncomfortable with removing content, even when totally entitled to do it.
Based on @Samwalton9 and @Sadads' experiences running editor events, it seems either they find a reference and make an edit, or they skip to the next snippet -- and we want to remind them that yes, it is OK to remove content too.
Of course, it would be wonderful if we could actually quantify the impact of making this change. For example, if we could detect that a snippet has been removed (there's already something close to that), we could see if it becomes more frequent after we start displaying our reminder.
Okay! Sounds great. Are there any more details I should know before we start implementing it? And can you please guide me about the implementation?
I think the minimum implementation could be something like:
- Change the snippet_parser to extract the date parameter in the Citation Needed template. Since each snippet has at least one {{ cn }}, this gives us the age of the snippet itself.
  Of course, we have different templates for different languages, and the date parameter will have a different name (or not be present at all) in some languages. For example, I think in Spanish, the corresponding template does not have a date parameter.
  So we probably want the name of the parameter to be configurable (in config.py), and to make sure we do something sensible throughout when there is no date to be extracted.
- Change the database schema (in chdb.py) to accommodate the snippet age. Change scripts/parse_live.py to write that age, which it will get from snippet_parser.
- Change handlers/database.py to return the snippet age when querying the database, and handlers/citationhunt.py to pass the age (or better yet, some boolean that means "show/hide the notice") to the UI. The age threshold (e.g., 3 months) should be configured in config.py.
- Change templates/index.html to actually implement the notice. I think you can mostly copy what we have for the lead section notice, which in turn is basically using this library to show the dialog.
Then we can think about the other ideas we discussed. I suggest maybe doing one commit per item in the above list (so it's easier to review), and feel free to ask questions or send almost-complete PRs -- it's usually easier to discuss code when there's actual code to look at.
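As a rough illustration of the "age to boolean" part of the list above, something like the following could work. This is only a sketch under assumptions: the names SNIPPET_NOTICE_MIN_AGE_DAYS, parse_template_date and should_show_notice are made up for this example and do not exist in Citation Hunt, and it assumes the English date format ("June 2015"). Other languages would need their own parsing, or no notice at all.

```python
import datetime

# Hypothetical setting; in practice this would be a per-language value in config.py.
SNIPPET_NOTICE_MIN_AGE_DAYS = 90  # roughly 3 months

def parse_template_date(raw):
    """Parse an English {{cn}} date value such as 'June 2015'.

    Returns a datetime.date for the first day of that month, or None
    when the value is missing or cannot be parsed.
    """
    if not raw:
        return None
    try:
        return datetime.datetime.strptime(raw.strip(), "%B %Y").date()
    except ValueError:
        return None

def should_show_notice(raw_date, today=None):
    """Return True when the tag is old enough to show the notice.

    Snippets without a usable date never get the notice, which is the
    'do something sensible' fallback for languages without a date parameter.
    """
    today = today or datetime.date.today()
    tagged = parse_template_date(raw_date)
    if tagged is None:
        return False
    return (today - tagged).days >= SNIPPET_NOTICE_MIN_AGE_DAYS
```

Passing only the resulting boolean to the template keeps the threshold logic on the server, so index.html just needs to show or hide the button.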
Hey! I couldn't quite understand the first part. I tried a lot, and from what I understood, tpl holds the template, but it is just denoted by {{cn}}, and I can't see the date attribute in it. If the template looks like the one at https://en.wikipedia.org/wiki/Template:Citation_needed then I probably need to extract the date using string operations. Please help me here.
Not every {{ cn }} on Wikipedia will necessarily have a date, but many definitely do, so I think we can worry about that later. For a random example, the [citation needed] tag at the end of https://en.wikipedia.org/wiki/Clare_College,_Cambridge#Clareification has a date, and you could use that page for testing. Remember you can use the scripts/parse_page.py script to run a single page through the snippet_parser, for quick testing.
We actually use a library called mwparserfromhell (https://github.com/earwig/mwparserfromhell) to make it easier to handle Wikicode, so I'd suggest that you familiarize yourself with it by reading the docs and trying out a couple of examples. Basically inside snippet_parser, specifically in this block of code https://github.com/eggpi/citationhunt/blob/master/snippet_parser/core.py#L167-L178, tpl should be a Template object, and you should be able to call its get method https://mwparserfromhell.readthedocs.io/en/latest/api/mwparserfromhell.nodes.html#mwparserfromhell.nodes.template.Template.get to get the date, where it's available.
Does that make sense?
I think I have this working: in my local dev environment, snippets that are older than 4 years on English Wikipedia now get a new blue notice button that displays the following message when clicked:
Next steps:
- Wait until #1Lib1Ref is over; I wouldn't want to push this now and risk breaking CH.
- Maybe add a bit of tracking to see if anyone is actually clicking those notice buttons.
- Deploy the backend changes (new DB schema, pull dates out of templates).
- Wait until the database has been rebuilt for all languages.
- Deploy the frontend changes.
This is now live for enwiki: https://citationhunt.toolforge.org/en?id=c0261a01.
Leaving this open until I figure out if/how to do this across all languages.