
Extra consistency checks on category codes and URLs

Open hellais opened this issue 7 years ago • 3 comments

We have run into the issue, when using the lists in OONI, that some country lists present the following problems:

  1. The same URL appears in different country-specific lists with different category codes

ex.

    id.csv:http://denypagetests.netsweeper.com,NEWS,News Media,2014-04-15,citizenlab,
    kw.csv:http://denypagetests.netsweeper.com,CTRL,Control content,2014-04-15,citizenlab,

  2. The same URL is present in both the global list and a country-specific list

ex.

    global.csv:http://www.crazyshit.com,PORN,Pornography,2014-04-15,citizenlab,Updated by OONI on 2017-02-14
    sg.csv:http://www.crazyshit.com,NEWS,News Media,2014-04-15,citizenlab,

We should add checks to the lint-lists.py script that verify that:

  1. Category codes for a given URL are consistent across lists
  2. A URL present in the global list is not also present in a country-specific list

On this second point I would like to hear from @sneft and others to know if this is reasonable or if it's maybe just an OONI-specific usage of the lists.
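
The two proposed checks could be sketched roughly as follows. This is only an illustration, not the actual lint-lists.py code: the function names `load_rows` and `check_lists` are hypothetical, and it assumes each row starts with the URL followed by the category code (as in the examples above), with an optional header row whose first field is `url`. The sample rows are the ones quoted in this issue.

```python
import csv
import io
from collections import defaultdict


def load_rows(text):
    """Yield (url, category_code) for each data row of one CSV list."""
    for row in csv.reader(io.StringIO(text)):
        # Skip blank lines and an optional header row (assumed to start with "url").
        if len(row) < 2 or row[0] == "url":
            continue
        yield row[0].strip(), row[1].strip()


def check_lists(lists, global_name="global.csv"):
    """Return (category_conflicts, global_duplicates) for a {filename: csv_text} mapping."""
    codes_by_url = defaultdict(dict)  # url -> {list_name: category_code}
    for name, text in lists.items():
        for url, code in load_rows(text):
            codes_by_url[url][name] = code

    conflicts = {}   # check 1: same URL, different category codes
    duplicates = {}  # check 2: URL in the global list and in a country list
    for url, codes in codes_by_url.items():
        if len(set(codes.values())) > 1:
            conflicts[url] = dict(codes)
        if global_name in codes and len(codes) > 1:
            duplicates[url] = sorted(n for n in codes if n != global_name)
    return conflicts, duplicates


# Sample rows taken from the examples in this issue.
lists = {
    "id.csv": "http://denypagetests.netsweeper.com,NEWS,News Media,2014-04-15,citizenlab,\n",
    "kw.csv": "http://denypagetests.netsweeper.com,CTRL,Control content,2014-04-15,citizenlab,\n",
    "global.csv": "http://www.crazyshit.com,PORN,Pornography,2014-04-15,citizenlab,Updated by OONI on 2017-02-14\n",
    "sg.csv": "http://www.crazyshit.com,NEWS,News Media,2014-04-15,citizenlab,\n",
}
conflicts, duplicates = check_lists(lists)
for url, codes in conflicts.items():
    print("category mismatch:", url, codes)
for url, names in duplicates.items():
    print("also in global list:", url, names)
```

Keeping the two results separate would let the global-duplicate check be reported as a warning rather than an error, should it turn out to be an OONI-specific requirement.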

hellais, Oct 11 '18 10:10

Check №2 is a bit tricky for cis.csv. Should cis.csv be treated in the same way as global.csv for the corresponding countries? What definition of CIS should it use? E.g. should it include Georgia?

darkk, Oct 11 '18 11:10

At OONI we don't actually use cis.csv at all, and that country list has not been updated in quite a long while. I would go as far as suggesting we remove it or move it to another directory.

hellais, Oct 11 '18 11:10

For point 1, I have no doubt there are a number of these inconsistencies. We tried to fix them as we encountered them but never made a systematic effort to clean them up.

For point 2, I agree that if a URL is present on the global list it should not be on a local list. Our old testing system flagged when you attempted to upload a local list with a URL duplicated in the global list. Our logic was that the global and local lists are meant to be run as a single unit, so we wanted to avoid duplication. I know we had some cases where this was inconvenient (e.g. wanting to test a very narrow sample in a bandwidth-limited place) but to my knowledge OONI is flexible enough to better accommodate custom lists for special circumstances.

(This requirement does add a small burden of labour on list compilers, as in my experience the average person compiling a local list will reasonably (and often appropriately) add certain URLs that are duplicated in the global list. Perhaps this is just a matter of good documentation and instructions to list compilers.)

sneft, Oct 11 '18 19:10