Add populations to Wikidata

Open hyperknot opened this issue 4 years ago • 28 comments

Is there anyone who has the right to edit Wikidata items for counties? Basically you need 50+ edits on Wikidata, at which point the account is "autoconfirmed".

Right now I have that level, but it's quite tedious to fix all populations alone and I'd be happy if someone could help me.

Some of the locations are less important and everyone can edit them, like these ones in Panama: https://www.wikidata.org/wiki/Q217138

Other ones are in the "top 3000" items and only people with confirmed accounts can edit them. But basically editing the less important features would allow someone to get to this autoconfirmed level.

So who would like to help by entering population information?

Need to add missing in:

  • [x] Slovenia
  • [x] Ireland
  • [x] Poland
  • [x] Lithuania
  • [ ] South Korea
  • [x] Panama
  • [ ] Sebastopol, Russia

hyperknot avatar Apr 13 '20 00:04 hyperknot

Populations on Wikipedia: https://en.wikipedia.org/wiki/Provinces_of_Panama#Provinces

Alternative source from @ciscorucinski which has it at county level: https://www.citypopulation.de/en/panama/admin/

shaperilio avatar Apr 13 '20 00:04 shaperilio

Yes, sometimes I'm using Wikipedia as a source, but it still needs to be added manually. Citypopulation.de doesn't allow downloading the map data. Also, we prefer official data backed by a matching government dataset.

hyperknot avatar Apr 13 '20 01:04 hyperknot

Here's population information down to corregimientos level (the granularity at which we get COVID data): https://github.com/EricLuceroGonzalez/Panama-Political-Division Population is presumably from the 2010 census, but it would have to be verified.

Which raises the question: how are we validating any of this?

shaperilio avatar Apr 13 '20 01:04 shaperilio

Wikidata's interface for adding information seems intimidating

ciscorucinski avatar Apr 13 '20 01:04 ciscorucinski

@ciscorucinski you mean how to add population data?

Basically:

  1. click "add statement" at the bottom
  2. select "population"
  3. enter the value
  4. add qualifier
  5. select point in time
  6. enter year
  7. add reference
  8. select "reference URL", or type P4656 ("Wikimedia import URL") if the source is Wikipedia
  9. paste URL
  10. save

If you do this in multiple steps, it's quite easy to get over the 50 required edits to get your account autoconfirmed. For example add - publish - add qualifier - publish - add reference - publish can get you 4 edits. So with 13 regions you are over 50 edits :-)

hyperknot avatar Apr 13 '20 01:04 hyperknot

Done for Panama's provinces.

shaperilio avatar Apr 13 '20 02:04 shaperilio

There are ways of doing this via Google Sheets and a tool called QuickStatements. Since we are only concerned with one type of data import process, we should be able to create a fairly standardized process within a spreadsheet.

Google Sheets + QuickStatements: https://www.youtube.com/watch?v=bUpJN4IklJ8 OpenRefine: https://www.youtube.com/watch?v=wfS1qTKFQoI
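As a rough sketch of what one row of such a batch could look like: the manual steps (population, point in time, reference URL) seem to map onto a single tab-separated QuickStatements line. This assumes the V1 command syntax as I understand it (`P1082` population, `P585` point in time as a qualifier, `S854` reference URL as a source). The helper name `population_command` and the sample values are hypothetical, so try it on one or two items before running a batch.

```python
def population_command(qid: str, population: int, year: int, source_url: str) -> str:
    """Build one tab-separated QuickStatements line (V1 syntax assumed)."""
    fields = [
        qid,                               # item to edit, e.g. "Q217138"
        "P1082",                           # property: population
        str(population),                   # quantity value
        "P585",                            # qualifier: point in time
        f"+{year:04d}-01-01T00:00:00Z/9",  # /9 marks year precision
        "S854",                            # source: reference URL
        f'"{source_url}"',                 # string values are double-quoted
    ]
    return "\t".join(fields)


# Placeholder population and URL, NOT real census data.
print(population_command("Q217138", 12345, 2010, "https://example.org/census"))
```

A spreadsheet formula that concatenates the same seven columns with tabs would produce the same rows.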

ciscorucinski avatar Apr 13 '20 06:04 ciscorucinski

@ciscorucinski if you can mass import using this tool it'd be great! So far I've done all my edits by hand.

hyperknot avatar Apr 13 '20 12:04 hyperknot

@hyperknot you can! But I am uncertain how to go about doing it for this data right now

ciscorucinski avatar Apr 13 '20 12:04 ciscorucinski

Luckily we don't have that many missing populations. If we encounter another country with a lot, I'll comment here.

hyperknot avatar Apr 13 '20 12:04 hyperknot

is there an easy way to find what is missing?

ciscorucinski avatar Apr 13 '20 12:04 ciscorucinski

Ones without population in this JSON: https://raw.githubusercontent.com/hyperknot/country-levels-export/master/iso2.json
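A minimal sketch of that check, assuming the file is one JSON object keyed by code, where each entry is an object with an optional `population` field. That layout is an assumption, so adjust the field name to whatever the real file uses. The sample data is a made-up stand-in, not taken from the export.

```python
import json


def missing_population(levels: dict, field: str = "population") -> list[str]:
    """Return the codes of entries whose population is absent or null."""
    return sorted(code for code, entry in levels.items() if entry.get(field) is None)


# Tiny inline stand-in for the real file; codes and values are placeholders.
sample = json.loads(
    '{"XX-01": {"name": "A", "population": 1000},'
    ' "XX-02": {"name": "B"},'
    ' "XX-03": {"name": "C", "population": null}}'
)
print(missing_population(sample))
```

For the real file, load it with `json.load` and pass the resulting dict in the same way.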

hyperknot avatar Apr 13 '20 12:04 hyperknot

Portugal seems like a good candidate: https://github.com/hyperknot/country-levels-export/blob/master/docs/iso2_list/PT.md

hyperknot avatar Apr 13 '20 12:04 hyperknot

We need to add: Slovenia, Ireland, Poland, and Lithuania.

hyperknot avatar Apr 14 '20 09:04 hyperknot

I fixed Ireland and Poland. What is missing in Lithuania?

For Slovenia, it really needs that batch updating effort! @ciscorucinski can you help with that?

hyperknot avatar Apr 14 '20 09:04 hyperknot

Let's create a Google Sheet, and try out a few records before mass editing. I have never edited a wikidata entry, so consider me a noob here 😅

What info is needed to identify a population point in terms of wikidata? We need Q IDs for a few datapoints, but these can be retrieved through a wikidata Chrome extension in Google Sheets.

Just the names at the country, state, and county level should be good enough, I guess? Along with the population data and a URL reference.

ciscorucinski avatar Apr 14 '20 10:04 ciscorucinski

All the Q-s we need are here: https://github.com/hyperknot/country-levels-export/blob/master/docs/iso2_list/SI.md

Machine readable format is this: https://raw.githubusercontent.com/hyperknot/country-levels-export/master/iso2.json

The other side of the equation should be some government census CSV listing those populations.
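Joining the two sides could look roughly like this. It is only a sketch: the CSV column names `name` and `population`, the helper names, and the placeholder Q IDs are all assumptions, and matching on accent-stripped names can collide or miss, so the unmatched list should be reviewed by hand.

```python
import csv
import io
import unicodedata


def normalize(name: str) -> str:
    """Case-fold and strip accents so 'Šentjur' and 'SENTJUR' compare equal."""
    decomposed = unicodedata.normalize("NFKD", name.casefold().strip())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def match_populations(qids_by_name: dict[str, str], census_csv: str):
    """Pair each census row with a Q ID; return (matches, unmatched names)."""
    lookup = {normalize(name): qid for name, qid in qids_by_name.items()}
    matches, unmatched = {}, []
    for row in csv.DictReader(io.StringIO(census_csv)):
        qid = lookup.get(normalize(row["name"]))
        if qid is None:
            unmatched.append(row["name"])  # needs a manual look
        else:
            matches[qid] = int(row["population"])
    return matches, unmatched


# Placeholder Q IDs and populations, NOT real data.
qids = {"Šentjur": "Q_PLACEHOLDER_1", "Ajdovščina": "Q_PLACEHOLDER_2"}
census = "name,population\nSENTJUR,100\nNowhere,5\n"
print(match_populations(qids, census))
```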

hyperknot avatar Apr 14 '20 10:04 hyperknot

Really not ideal (has some weird character errors) but here is a CSV from the Slovenian Statistical Bureau. Data is from 2019. https://gist.github.com/qgolsteyn/145d82f984d65c34e778371a69cf5433

qgolsteyn avatar Apr 14 '20 19:04 qgolsteyn

@qgolsteyn thanks! Do you have the source for this file? Maybe chardetect would tell us what encoding it's in.
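If chardetect isn't at hand, a stdlib-only approximation is to try a few likely encodings and keep the one that yields the most Slovenian letters. This is a heuristic sketch, not a real detector: the candidate list (`utf-8`, `cp1250`, `iso8859_2`) and the expected letters are assumptions about what a Slovenian export is likely to use.

```python
def guess_encoding(raw: bytes,
                   candidates=("utf-8", "cp1250", "iso8859_2"),
                   expected="čšžČŠŽ"):
    """Return (encoding, text) for the decoding with the most expected letters."""
    best = None
    for encoding in candidates:
        try:
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue  # these bytes are invalid in this encoding
        score = sum(text.count(ch) for ch in expected)
        # Strictly greater, so earlier candidates win ties (e.g. pure ASCII).
        if best is None or score > best[0]:
            best = (score, encoding, text)
    if best is None:
        raise ValueError("none of the candidate encodings could decode the data")
    return best[1], best[2]


# Round-trip a sample name through a legacy encoding to exercise the guess.
encoding, text = guess_encoding("Šmarje pri Jelšah".encode("cp1250"))
print(encoding, text)
```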

hyperknot avatar Apr 14 '20 20:04 hyperknot

I don't have it immediately, but will get the source to you by this evening. I also updated the list with additional countries that need population info

qgolsteyn avatar Apr 15 '20 00:04 qgolsteyn

Thanks!

hyperknot avatar Apr 15 '20 22:04 hyperknot

My apologies, here is Slovenia's data: https://pxweb.stat.si/SiStatDb/pxweb/en/10_Dem_soc/10_Dem_soc__05_prebivalstvo__10_stevilo_preb__20_05C40_prebivalstvo_obcine/05C4002S.px/table/tableViewLayout2/

qgolsteyn avatar Apr 16 '20 07:04 qgolsteyn

Portugal is done, as is Colombia. Working on Slovenia next.

shaperilio avatar Apr 18 '20 21:04 shaperilio

I think Slovenia is done, but I got "errors" on their tool, despite there being hundreds of successful edits....

EDIT

Because I tried to add the atomic number of a municipality among other atrocities 😆 Anyway, it's processing now, should be done soon.

shaperilio avatar Apr 18 '20 22:04 shaperilio

Lithuania should be done...after much struggle. I'm off for the rest of the night.

shaperilio avatar Apr 19 '20 00:04 shaperilio

@shaperilio thanks so much, I've updated the file already but I'll run a new processing pass for Lithuania as well.

hyperknot avatar Apr 20 '20 13:04 hyperknot

Korea should be up to date now

ciscorucinski avatar Apr 24 '20 12:04 ciscorucinski

@hyperknot , is this issue still open? Wondering what the current status is. Cheers, z

jzohrab avatar Aug 09 '20 14:08 jzohrab