country-geotime dataset, for geospatial and time information related to countries
A datasets/country-geotime package to complement datasets/country-codes, with geospatial country information and time references. As the "join" key with country-codes, the most popular "country IDs" (ISO3166-1-Alpha-2 and ISO3166-1-numeric) can be adopted. The other relevant columns already have rationale and specifications:
- country-codes/issues/22: spec for the columns utmzones_list and neighbor_list.
- country-codes/issues/5: spec for the utc_list column.
PS: territory_language and other standards are using ISO3166-1-Alpha-2 in "list of countries" attributes, see tr35/Supplemental_Language_Data.
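To illustrate the "join" idea, here is a minimal Python sketch, assuming both packages expose a column holding the ISO3166-1-Alpha-2 code. The column name alpha2, the helper join_on_key, and the sample rows are hypothetical, chosen only for illustration; the utc_list values are placeholders, not data from either package.

```python
import csv
import io

# Hypothetical miniature extracts; the real files and headers may differ.
COUNTRY_CODES_CSV = """alpha2,name
BR,Brazil
FR,France
"""

# utc_list values here are placeholders, not data from the package.
COUNTRY_GEOTIME_CSV = """alpha2,utc_list
BR,placeholder-utc-a placeholder-utc-b
FR,placeholder-utc-c
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per data row."""
    return list(csv.DictReader(io.StringIO(text)))


def join_on_key(left_rows, right_rows, key):
    """Inner join of two row lists on a shared key column."""
    right_by_key = {row[key]: row for row in right_rows}
    joined = []
    for row in left_rows:
        match = right_by_key.get(row[key])
        if match is not None:
            # Merge the two rows; only the key column is shared here.
            joined.append({**row, **match})
    return joined


joined = join_on_key(
    read_rows(COUNTRY_CODES_CSV), read_rows(COUNTRY_GEOTIME_CSV), "alpha2"
)
for row in joined:
    print(row["alpha2"], row["name"], row["utc_list"])
```

An inner join is used so that rows missing from either side are dropped instead of being filled with empty values; a left join might be preferable if country-codes is treated as the master list.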
@ppKrauss seems very sensible. Do you want to do a first pass on a data package for this?
@rgrp Thanks, and OK (!). I am preparing it and may send it this weekend.
Hello, the data package is prepared; it is a first draft: https://github.com/ppKrauss/country-geotime
PS: it still needs utmzones_list and a review of neighbor_list... There are some suggestions (e.g. an official language list) from http://cldr.unicode.org/ and I think some Wikidata standard values (such as country area and population) can be used.
@Yannael can you review?
Sure. I am however quite busy until Wednesday evening. I briefly checked: it validates and displays properly. A short note on the README: following the guidelines at http://data.okfn.org/doc/publish-faq#readme, could you remove the 'Introduction' heading and add the license one? I'll have a closer look at the data content on Thursday, and get back to you!
Hi @ppKrauss
I had a look at the data content, and have these two questions:
- France has no neighbouring countries, why is that?
- How does the 'type' field for territoryContainment relate to the first table (if it does) ?
Otherwise, in terms of presentation of the dataset, I would make it self contained, and not refer to Github discussions. I would suggest to update with a short paragraph presenting what the content is about (Countries, neighboring countries, UTC).
Also, a useful section to clarify would be the 'preparation' section : ideally the sequence of commands to run in order to reproduce these two datasets.
Cheers
Thanks @Yannael. NOTE: it is a "first draft" and, as I said before, I do not have PostGIS here yet to test the geo-scripts (do you have it? Does OKFN offer some infrastructure that we can use?)... But the project aims can be discussed with this draft... I can create another CSV with only "homologated data", adding only the "good columns" after each discussion and finalization.
Answering,
France has no neighbouring countries, why is that?
Yes, that is the point where we stopped, because I only sketched the procedures for a friend... I still need to see the data, see the maps, run tests, do experiments... (I need a machine with PostGIS). There were try1 and try2; a try3 is needed... The source world map is not good (no topology, only workarounds); perhaps we need another, better source with reliable topology.
Another suggestion is to use Wikidata queries, even to confirm some samples... Wikidata is a good "second source".
How does the 'type' field for territoryContainment relate to the first table (if it does) ?
It is part of TR35, the macroregion: "the standard codes that are macroregions (...) some two-letter region codes are macroregions, and (in the future) some three-digit codes may be regular codes". This CSV details which regions are contained within which macroregions; see the <territoryContainment> element in supplementalData.xml.
... in terms of presentation of the dataset, I would make it self contained, and not refer to Github discussions. I would suggest to update with a short paragraph presenting what the content is about (Countries, neighboring countries, UTC).
Yes, OK, I can do that (review the text, documentation, etc., and simplify).
Also, a useful section to clarify would be the 'preparation' section : ideally the sequence of commands to run in order to reproduce these two datasets.
OK, good idea to simplify; I can do that.
Another question: do you agree to adding columns like population, langs_official, etc., that I suggested with the draft? They are also Wikidata information that can be "confirmed by sampling". The official reference (as the reliable source) for this "extra data" remains TR35.
Sorry, I thought that was an 'advanced draft', where the idea was to include it in the official data packages. My remarks are then not so relevant. I will try to have a deeper look this weekend at how everything connects there. Cheers!
@Yannael, sorry for my English, I am using an automatic translator for some fragments ;-) Your remarks are very relevant. This week (or later) could be just for showing and discussing the directions... I need to check simple things like "more columns? are those columns good?". After try3 is completed, your checking (homologation) will also be very important.
Hi @Yannael, now I have better map data, see contry-neighbors.csv; you can check whether it is OK (!)... I checked some samples against Wikipedia and they were OK.
The draft of the preparation text is here... Perhaps next weekend I can finish the texts and unify the data.
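Besides sampling against Wikipedia, the neighbour data can be sanity-checked mechanically. The following is only a sketch, assuming the neighbour lists have already been parsed into a dict keyed by country code; the function names and the toy input are hypothetical and not taken from the CSV. It flags pairs where A lists B but B does not list A, and countries with an empty list (which may be genuine islands or a gap like the earlier France case).

```python
def find_asymmetric_pairs(neighbors):
    """Return (a, b) pairs where a lists b but b does not list a."""
    pairs = []
    for country, listed in neighbors.items():
        for other in listed:
            if country not in neighbors.get(other, []):
                pairs.append((country, other))
    return sorted(pairs)


def countries_without_neighbors(neighbors):
    """Return codes whose list is empty (islands, or a gap in the data)."""
    return sorted(code for code, listed in neighbors.items() if not listed)


# Toy input, deliberately inconsistent: DE omits FR, JP is an island.
toy = {
    "FR": ["BE", "DE"],
    "BE": ["FR"],
    "DE": [],
    "JP": [],
}

print(find_asymmetric_pairs(toy))
print(countries_without_neighbors(toy))
```

An empty list is only a prompt for manual review, since island countries legitimately have no land neighbours.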
Hi @ppKrauss. That looks nice, and I will give you more detailed feedback on Saturday. If you have an update before then, let me know. Cheers
Hi @ppKrauss ,
Taking a bit of perspective with the datasets you are creating, I see complementary, but also somewhat redundant, info with https://github.com/datasets/country-codes and https://github.com/datasets/language-codes/ (for example country name, or official language), and that it would be best not to duplicate any of the information already available in these two data packages.
It is really nice I think to have complementary infos about countries, such as neighbouring countries, timezone, or population, but I am a bit worried about the ‘coherence’ of such data, why put them together. Neighbouring countries is about geographic info, timezones about, well, timezone info, and population about demographic infos.
A suggestion could be to have a ‘country-metadata’ repository, where there would be one CSV for each category of info.
Or maybe best, following the ‘country-*’ naming, have different data packages, ‘country-timezones’, ‘country-neighbours’, ‘country-demographic’, with the ISO_ALPHA2 code as common key.
What do you think?
@rgrp
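As a rough illustration of the split proposed above, here is a small Python sketch that turns one combined table into one table per package, each keeping the ISO alpha-2 code as the common key. Everything in it is an assumption for illustration: the column names, the alpha2 key name, the placeholder codes AA and BB, and the helper split_by_package.

```python
# Hypothetical combined rows; column names and values are illustrative only.
combined = [
    {"alpha2": "AA", "neighbours": "BB", "timezones": "tz-1", "population": "100"},
    {"alpha2": "BB", "neighbours": "AA", "timezones": "tz-2", "population": "200"},
]

# Proposed package name -> columns it would own (besides the key).
PACKAGES = {
    "country-neighbours": ["neighbours"],
    "country-timezones": ["timezones"],
    "country-demographic": ["population"],
}


def split_by_package(rows, packages, key="alpha2"):
    """Build one table per package, each keeping the shared key column."""
    return {
        name: [{key: row[key], **{c: row[c] for c in columns}} for row in rows]
        for name, columns in packages.items()
    }


tables = split_by_package(combined, PACKAGES)
for name, table in tables.items():
    print(name, table)
```

Because every table carries the same key, any two packages can be recombined later with an ordinary join on that column.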
Hi @Yannael, thanks for showing directions!
I am reorganizing, following your suggestion to split into "country-*"... The first one is at https://github.com/okfn-brasil/country-geoinfo (check the data folder). If this first one is OK, my next step will be to add datapackage.json to it and start country-timezones... Then later we can discuss country-langs and country-demographic.
Hi @ppKrauss, I am joining as the new datasets managing curator. Looks like this baby was about to be delivered. Any update?
Hello @pdehaye, welcome!
Hmm, I need one weekend to review this whole project; I think we can settle some pending decisions meanwhile... Can you designate a collaborator to check the country-geoinfo.csv file?
I just looked at country-geoinfo.csv. Its general shape looks good, but I immediately have questions/comments:
- change the country header to include some hint as to the format used. This information will be in the datapackage.json, but it is still useful to have
- change UTMgrid_cells to UTM_grid_cells
- make sure to specify somewhere how the neighbours were computed, and how the grid cells were computed. Indeed, it looks like the neighbours of France don't include Brazil, for instance, so some choice had to be made in excluding French Guiana. And I am unsure how the cells were computed either. I see this has been part of the discussion before in this thread, and is documented a bit in your github repo. Your datapackage should be stand-alone.
Thanks (!), let's see:
- "change the country header (...)": I do not understand what needs to change, sorry; only that I need to do datapackage.json... OK, now we have a preliminary one.
- "change UTMgrid_cells to UTM_grid_cells": OK, changed.
- Hmm... I checked and now it is OK, no BR at FR, etc. But it needs more checking for homologation. Wikipedia and Wikidata offer some reference data.
So, okfn-brasil/country-geoinfo is updated.
Yeah, forget about the first point. Ping me here when ready for review!
Hello @pdehaye , you can review country-geoinfo.
I just have, and submitted a pull request.