country-geotime dataset, for geospatial and time information related to country

Open ppKrauss opened this issue 8 years ago • 20 comments

Proposal: a datasets/country-geotime package to complement datasets/country-codes, with geospatial country information and time references. As the "join" key with country-codes, the most popular country IDs (ISO3166-1-Alpha-2 and ISO3166-1-numeric) can be adopted. The other relevant columns already have a rationale and specifications,


PS: territory_language and other standards are using ISO3166-1-Alpha-2 in "list of countries" attributes, see tr35/Supplemental_Language_Data.

ppKrauss avatar Sep 22 '15 16:09 ppKrauss

@ppKrauss seems very sensible. Do you want to do a first pass on a data package for this?

rufuspollock avatar Sep 23 '15 14:09 rufuspollock

@rgrp Thanks, and OK (!). I am preparing it and may send it this weekend.

ppKrauss avatar Sep 24 '15 12:09 ppKrauss

Hello, the data package is prepared; it is a first draft: https://github.com/ppKrauss/country-geotime

PS: it still needs utmzones_list and a review of neighbor_list... There are some suggestions (e.g. the official language list) from http://cldr.unicode.org/, and I think some Wikidata standard values (such as country area and population) can be used.

ppKrauss avatar Sep 28 '15 15:09 ppKrauss

@Yannael can you review?

rufuspollock avatar Oct 05 '15 08:10 rufuspollock

Sure. I am quite busy until Wednesday evening, however. I briefly checked: it validates and displays properly. A short note on the README: following the guidelines at http://data.okfn.org/doc/publish-faq#readme, could you remove the 'Introduction' heading and add the license one? I'll have a closer look at the data content on Thursday, and get back to you!

Yannael avatar Oct 05 '15 17:10 Yannael

Hi @ppKrauss

I had a look at the data content, and have these two questions:

  • France has no neighbouring countries, why is that?
  • How does the 'type' field for territoryContainment relate to the first table (if it does) ?

Otherwise, in terms of presentation of the dataset, I would make it self contained, and not refer to Github discussions. I would suggest to update with a short paragraph presenting what the content is about (Countries, neighboring countries, UTC).

Also, a useful section to clarify would be the 'preparation' section : ideally the sequence of commands to run in order to reproduce these two datasets.

Cheers

Yannael avatar Oct 07 '15 18:10 Yannael

Thanks @Yannael. NOTE: it is a "first draft" and, as I said before, I do not have PostGIS here yet to test the geo-scripts (do you have it? Does OKFN offer some infrastructure that we can use?)... But the project aims can be discussed with this draft... I can create another CSV with only "homologated data", adding only the "good columns" after each discussion and finalization.

Answering,

France has no neighbouring countries, why is that?

Yes, that is the point where we stopped, because I only sketched the procedures for a friend... I still need to see the data, see the maps, run tests, do experiments... (I need a machine with PostGIS). There was try1 and try2; try3 is needed... The source world map is not good (no topology, only workarounds); perhaps we need a better source, with reliable topology.

Another suggestion is to use Wikidata queries, even to confirm some samples... Wikidata is a good "second source".

How does the 'type' field for territoryContainment relate to the first table (if it does) ?

It is part of TR35, the macroregion definition: "the standard codes that are macroregions (...) some two-letter region codes are macroregions, and (in the future) some three-digit codes may be regular codes". This CSV details which regions are contained within which macroregions; see the <territoryContainment> element in supplementalData.xml.
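As a rough illustration of how the 'type' field works, here is a minimal sketch that flattens `<territoryContainment>` groups into (container, member) rows. The XML fragment is a trimmed, illustrative sample and not the full CLDR data, and the function name `containment_rows` is hypothetical; it is not the script used for this package.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: the real supplementalData.xml has many more
# groups, and each group lists more members than shown here.
SAMPLE_XML = """
<supplementalData>
  <territoryContainment>
    <group type="001" contains="019 002 150 142 009"/>
    <group type="150" contains="154 155 151 039"/>
    <group type="155" contains="FR DE"/>
  </territoryContainment>
</supplementalData>
"""


def containment_rows(xml_text):
    """Return (container_code, member_code) pairs, one per contained region."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for group in root.iter("group"):
        container = group.get("type")  # the macroregion code
        for member in group.get("contains", "").split():
            rows.append((container, member))
    return rows


rows = containment_rows(SAMPLE_XML)
for container, member in rows:
    print(container, member)
```

In this reading, 'type' is the code of the containing macroregion, and only the two-letter members (such as FR) can be matched against the country table.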

... in terms of presentation of the dataset, I would make it self contained, and not refer to Github discussions. I would suggest to update with a short paragraph presenting what the content is about (Countries, neighboring countries, UTC).

Yes, ok, I can do (review text, documentation, etc. and simplify)

Also, a useful section to clarify would be the 'preparation' section : ideally the sequence of commands to run in order to reproduce these two datasets.

OK, good idea to simplify, I can do that.


Another question: do you agree to add columns like population, langs_official, etc., which I suggested with the draft? They are also available in Wikidata, so they can be "confirmed by sampling". The official reference (as a reliable source) for this "extra data" remains TR35.

ppKrauss avatar Oct 07 '15 22:10 ppKrauss

Sorry, I thought that was an 'advanced draft', where the idea was to include it in the official data packages. My remarks are then not so relevant. I will try to have a deeper look this weekend at how everything connects there. Cheers!

Yannael avatar Oct 08 '15 18:10 Yannael

@Yannael, sorry for my English, I am using an automatic translator for some fragments ;-) Your remarks are very relevant. This week (or later) we could just show and discuss the directions... I need to check simple things like "more columns? are these columns good?". After try3 is completed, your checking (homologation) will also be very important.

ppKrauss avatar Oct 08 '15 18:10 ppKrauss

Hi @Yannael, I now have better map data; see contry-neighbors.csv, you can check whether it is OK (!)... I checked some samples against Wikipedia and they were OK.

The draft of preparation-text is here... Perhaps next weekend I can finish texts and unify data.

ppKrauss avatar Oct 19 '15 03:10 ppKrauss

Hi @ppKrauss, that looks nice. I will give you more detailed feedback on Saturday. If you have an update before then, let me know. Cheers

Yannael avatar Oct 22 '15 17:10 Yannael

Hi @ppKrauss ,

Taking a bit of perspective with the datasets you are creating, I see complementary, but also somewhat redundant, info with https://github.com/datasets/country-codes and https://github.com/datasets/language-codes/ (for example country name, or official language), and that it would be best not to duplicate any of the information already available in these two data packages.

It is really nice I think to have complementary infos about countries, such as neighbouring countries, timezone, or population, but I am a bit worried about the ‘coherence’ of such data, why put them together. Neighbouring countries is about geographic info, timezones about, well, timezone info, and population about demographic infos.

A suggestion could be to have a ‘country-metadata’ repository, where there would be one CSV for each category of info.

Or maybe best, following the ‘country-*’ naming, have different data packages, ‘country-timezones’, ‘country-neighbours’, ‘country-demographic’, with the ISO_ALPHA2 code as common key.
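To make the "common key" idea concrete, here is a minimal sketch of joining two such packages on the alpha-2 code. The column names (`iso_alpha2`, `timezone`, `neighbours`) and the inline rows are assumptions for illustration only, not the actual package contents.

```python
import csv
import io

# Hypothetical extracts from two separate 'country-*' packages.
TIMEZONES_CSV = "iso_alpha2,timezone\nFR,Europe/Paris\nBR,America/Sao_Paulo\n"
NEIGHBOURS_CSV = "iso_alpha2,neighbours\nFR,BE DE ES IT\nPT,ES\n"


def load_by_key(text, key="iso_alpha2"):
    """Index the rows of a CSV text by the shared country key."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}


def join_on_key(*tables):
    """Outer-join tables indexed by the same key; absent columns are omitted."""
    keys = set().union(*(table.keys() for table in tables))
    merged = {}
    for key in sorted(keys):
        row = {}
        for table in tables:
            row.update(table.get(key, {}))
        merged[key] = row
    return merged


merged = join_on_key(load_by_key(TIMEZONES_CSV), load_by_key(NEIGHBOURS_CSV))
for code, row in merged.items():
    print(code, row)
```

With separate packages, a country missing from one of them (PT in the timezone sample) simply lacks that column after the join, so each package can be curated independently.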

What do you think?

@rgrp

Yannael avatar Oct 23 '15 16:10 Yannael

Hi @Yannael, thanks for showing the direction!

I am reorganizing, following your suggestion to split into "country-*"... The first one is at https://github.com/okfn-brasil/country-geoinfo (check the data folder). If this first one is OK, my next step will be to add datapackage.json to it and start country-timezones... Then later we can discuss country-langs and country-demographic.

ppKrauss avatar Oct 26 '15 04:10 ppKrauss

Hi @ppKrauss, I am joining as the new managing curator for datasets. Looks like this baby was about to be delivered. Any update?

pdehaye avatar Dec 01 '15 15:12 pdehaye

Hello @pdehaye , welcome!

Hmm, I need one weekend to review this whole project; I think we can settle some pending decisions meanwhile... Can you designate a collaborator to check the country-geoinfo.csv file?

ppKrauss avatar Dec 01 '15 16:12 ppKrauss

I just looked at country-geoinfo.csv. Its general shape looks good, but I immediately have questions/comments:

  • change the country header to include some hint as to the format used. This information will be in the datapackage.json, but it is still useful to have
  • change UTMgrid_cells to UTM_grid_cells
  • make sure to specify somewhere how the neighbours were computed, and how the grid cells were computed. Indeed, it looks like the neighbours of France don't include Brazil, for instance, so some choice had to be made in excluding French Guiana. And I am unsure how the cells were computed either. I see this has been part of the discussion before in this thread, and is documented a bit in your GitHub repo. Your datapackage should be stand-alone.
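One cheap check that exposes such choices is to verify that the neighbour relation is symmetric. The sketch below assumes the neighbour lists have already been loaded into a dict of sets; the sample values are illustrative and are not taken from country-geoinfo.csv.

```python
def asymmetric_pairs(neighbours):
    """Return (a, b) pairs where a lists b as a neighbour but b does not list a."""
    missing = []
    for country, listed in sorted(neighbours.items()):
        for other in sorted(listed):
            if country not in neighbours.get(other, set()):
                missing.append((country, other))
    return missing


# Illustrative sample: FR lists BR (via French Guiana) but BR does not list FR.
sample = {
    "FR": {"BE", "BR"},
    "BE": {"FR"},
    "BR": set(),
}

problems = asymmetric_pairs(sample)
print(problems)
```

Any pair reported this way points at a rule (for example, whether overseas territories count) that the README should state explicitly.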

pdehaye avatar Dec 01 '15 21:12 pdehaye

Thanks (!), let's see:

  • "change the country header (...)": sorry, I do not understand what needs to change; only that I need to write datapackage.json... OK, now we have a preliminary one.
  • "change UTMgrid_cells to UTM_grid_cells": OK, changed.
  • Hmm... I checked and now it is OK, no BR in the FR neighbours, etc. But more checking is needed for homologation. Wikipedia and Wikidata offer some reference data.

So, okfn-brasil/country-geoinfo is updated.

ppKrauss avatar Dec 02 '15 01:12 ppKrauss

Yeah, forget about the first point. Ping me here when ready for review!

pdehaye avatar Dec 04 '15 23:12 pdehaye

Hello @pdehaye , you can review country-geoinfo.

ppKrauss avatar Dec 05 '15 22:12 ppKrauss

I just have, and submitted a pull request.

pdehaye avatar Dec 10 '15 23:12 pdehaye