whosonfirst-sources
Indicate original source of data (and via what aggregator)
Right now we have data from Quattroshapes, which actually originates from multiple different sources. Each source needs to be credited, so we need a consistent WOF property to deal with this.
I propose a new property like src:via (was src_via originally) where the src should state the original source, and then we should credit the data aggregator in src:via as well.
Examples:
- Quattroshapes:
  - The city of San Francisco has a "qs:source" value of "AUS Census" (should just be US Census, oops) and "src:geom" of quattroshapes.
  - Propose that the "src:geom" should be uscensus instead, with "src_via" set to quattroshapes
- Mesoshapes:
  - The county feature of Samba has a "meso:source" value of "EDP", though no EDP.json file is currently in the sources folder.
  - Propose that the "src:geom" should be eep instead, with "src_via" set to meso
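A minimal sketch of the Quattroshapes proposal above, assuming the via property ends up being named "src:via". The function name, the `VIA_KEY` constant, and both lookup tables are hypothetical and exist only for illustration; they are not part of any WOF tooling.

```python
# Assumed final name of the proposed property.
VIA_KEY = "src:via"

# Hypothetical: property prefix each aggregator uses for its "<prefix>:source" value.
AGGREGATOR_PREFIXES = {"quattroshapes": "qs", "meso": "meso"}

# Hypothetical: maps (aggregator, upstream label) to a WOF source name.
ORIGINAL_SOURCES = {("quattroshapes", "AUS Census"): "uscensus"}


def credit_original_source(props):
    """Return a copy of props where src:geom names the original source
    and the aggregator is recorded under VIA_KEY."""
    updated = dict(props)
    aggregator = props.get("src:geom")
    prefix = AGGREGATOR_PREFIXES.get(aggregator)
    if prefix is None:
        return updated  # src:geom is not a known aggregator; leave as-is
    upstream = props.get(prefix + ":source")
    original = ORIGINAL_SOURCES.get((aggregator, upstream))
    if original is None:
        return updated  # upstream label is not mapped; leave as-is
    updated["src:geom"] = original
    updated[VIA_KEY] = aggregator
    return updated


san_francisco = {"qs:source": "AUS Census", "src:geom": "quattroshapes"}
print(credit_original_source(san_francisco))
```

Records whose upstream label has no mapping are returned unchanged, so the aggregator stays credited in "src:geom" until the original source is known.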
Related: https://github.com/whosonfirst/whosonfirst-sources/issues/39.
I would only change this to be src:via or an equivalent prefix + ":" + key pair, to be consistent with everything else.
Works for me :)
Seems like most of the above applies to the whosonfirst-data repo.
To give credit to our src:via sources, we'll also need to elevate some of the buried remarks (like those for Quattroshapes) so they are listed directly in the big sources README. That gives consumers of Who's On First data a single page listing all the sources, which they can link to in their apps to give credit where credit is due.
All need to print out in a section under https://github.com/whosonfirst/whosonfirst-sources/blob/master/sources/README.md#quattroshapes
After the license bullet point, a new paragraph with:
This source includes data from the following organizations:
With bullet points listed below, alphabetically eg:
- Europe-wide: European Environment Agency (EEA) urban morphological zones 2006
- France: Institut Géographique National
- Netherlands: Kadaster
- Spain: Instituto Geográfico Nacional
- Switzerland: swisstopo
- United Kingdom: Contains Ordnance Survey data © Crown copyright and database right [2012]
And that list needs to be from a new JSON list in the quattroshapes.json source.
Ideally it could contain HTML text with hyperlinks (?) since I think we had problems with Markdown before.
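As a rough sketch of how the README paragraph could be generated from the source JSON, assuming the new list lives under a "src:via" key with "context", "source_name" and "source_link" fields. Those key names, the `render_credits` function, and the placeholder link are assumptions for illustration, not the actual contents of quattroshapes.json.

```python
import html
import json

# Hypothetical excerpt; the link is a placeholder, not a real source URL.
SOURCE_JSON = """
{
  "src:via": [
    {"context": "Switzerland", "source_name": "swisstopo", "source_link": ""},
    {"context": "Netherlands", "source_name": "Kadaster",
     "source_link": "https://example.org/placeholder"},
    {"context": "Europe-wide",
     "source_name": "European Environment Agency (EEA) urban morphological zones 2006",
     "source_link": ""}
  ]
}
"""


def render_credits(source):
    """Build the credits paragraph, with bullets sorted alphabetically by context."""
    entries = sorted(source.get("src:via", []), key=lambda e: e["context"].lower())
    lines = ["This source includes data from the following organizations:", ""]
    for entry in entries:
        name = html.escape(entry["source_name"])
        link = entry.get("source_link")
        if link:
            # Emit an HTML anchor rather than Markdown link syntax.
            name = '<a href="{}">{}</a>'.format(html.escape(link, quote=True), name)
        lines.append("- {}: {}".format(entry["context"], name))
    return "\n".join(lines)


print(render_credits(json.loads(SOURCE_JSON)))
```

Sorting at render time means the JSON list can be kept in any order without affecting the README.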
The textual description part of this here in the sources repo is done.
Leaving this issue open as there is related work to follow up on.
For this county in Tanzania:
- https://spelunker.whosonfirst.org/id/1108692933/
Let's pretend it has the following properties:
- "src:geom" = "meso"
- "src:geom_alt" = ["naturalearth","quattroshapes"]
- "meso:source" = "TNBS"
- "qs:source" = "statscan"
We want to track the sources of sources generically, in a predictable, machine-readable way that doesn't need constant shuffling as default and alt geoms change, without adding more sources JSONs, and making use of the existing "src:via" properties in the sources JSON we added recently. In this case Mesoshapes includes data from "TNBS", and let's pretend that Quattroshapes includes data from "statscan".
NOTE: This new property would only be added to WOF records where a source is itself composed of multiple sources (eg Mesoshapes, Quattroshapes, and other *shapes sources); in that case all sources would be listed out in the extended format. Otherwise, no change.
We propose to add a new "src_via" prefix that accepts the same property names as src, but stores its value as a list of lists (versus a string for geom and a list for geom_alt) because any one source can actually be composed of multiple sources:
- "src_via:geom" = [["meso:tza_tnbs"],["naturalearth"],["quattroshapes:statscan"]]
  - which links to a new source_code entry in the meso source and tracks both default geoms and alt geoms in one big list.
Another example:
- "src_via:population" = [["statoids:othercensus"],["uscensus"]]
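To make the value format concrete, here is a small sketch that splits each entry of a "src_via:*" list of lists into a source name and an optional code. The `parse_via` helper is hypothetical; it assumes the part before the first ":" is the source and the remainder is the code.

```python
def parse_via(groups):
    """Turn [["meso:tza_tnbs"], ["naturalearth"]] into (source, code) pairs.

    code is None when an entry names a source without a sub-source.
    """
    parsed = []
    for group in groups:
        for item in group:
            source, _, code = item.partition(":")
            parsed.append((source, code or None))
    return parsed


src_via_geom = [["meso:tza_tnbs"], ["naturalearth"], ["quattroshapes:statscan"]]
print(parse_via(src_via_geom))
# [('meso', 'tza_tnbs'), ('naturalearth', None), ('quattroshapes', 'statscan')]
```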
Then in the sources repo (this repo), modify the meso.json:
- https://github.com/whosonfirst/whosonfirst-sources/blob/master/sources/meso.json source_code (versus source_name)
From:
"src:via" : {
    "context": "Tanzania",
    "source_link": "",
    "source_name": "Tanzania National Bureau of Statistics (TNBS)",
    "source_note": ""
},
Add: "source_code": "tza_tnbs"
"src:via" : {
    "context": "Tanzania",
    "source_link": "",
    "source_name": "Tanzania National Bureau of Statistics (TNBS)",
    "source_code": "tza_tnbs",
    "source_note": ""
},
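With a source_code in place, a value such as "meso:tza_tnbs" could be resolved back to its credit entry roughly as follows. This is only a sketch: the `resolve_via` helper is hypothetical, and holding the entries in a list under "src:via" is an assumption about how meso.json is laid out.

```python
# Hypothetical in-memory copy of meso.json; only the entry fields come from
# the example above, the surrounding list structure is assumed.
MESO = {
    "src:via": [
        {
            "context": "Tanzania",
            "source_link": "",
            "source_name": "Tanzania National Bureau of Statistics (TNBS)",
            "source_code": "tza_tnbs",
            "source_note": "",
        },
    ],
}


def resolve_via(value, sources):
    """Look up the src:via entry for a "source:code" value, or None if absent."""
    source, _, code = value.partition(":")
    doc = sources.get(source)
    if doc is None or not code:
        return None
    for entry in doc.get("src:via", []):
        if entry.get("source_code") == code:
            return entry
    return None


entry = resolve_via("meso:tza_tnbs", {"meso": MESO})
print(entry["source_name"])
```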
Does this need to be a different structure?
"src_via:geom"={
    "meso":[
        "tza_tnbs"
    ],
    "naturalearth":[
        "naturalearth"
    ],
    "quattroshapes":[
        "statscan"
    ]
}
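The two shapes carry the same information, which a short conversion sketch can show. The `via_to_mapping` helper is hypothetical; it assumes an entry without a ":" code is listed under its own name, as "naturalearth" is in the structure above.

```python
def via_to_mapping(groups):
    """Convert the list-of-lists form into the keyed form shown above."""
    mapping = {}
    for group in groups:
        for item in group:
            source, _, code = item.partition(":")
            # Sources with no sub-source are listed under their own name.
            mapping.setdefault(source, []).append(code or source)
    return mapping


nested = [["meso:tza_tnbs"], ["naturalearth"], ["quattroshapes:statscan"]]
print(via_to_mapping(nested))
# {'meso': ['tza_tnbs'], 'naturalearth': ['naturalearth'], 'quattroshapes': ['statscan']}
```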
And should we riff on "src:via" ala "src_via" instead of "src_src"? (updated to src_via).
@nvkelso - the example in https://github.com/whosonfirst/whosonfirst-sources/issues/40#issuecomment-399602996 makes more sense.
Flagging @thisisaaronland for comments. We'd like to make this change next week.
With regards to the source_code key I would change it to source_prefix since that's what it is.
Likewise I would consider changing all the source_* keys to be src:* since the src prefix has historically been used as a pointer to "whosonfirst-sources".
src_via seems fine but I am not sure I understand why some of the examples have lists of lists, like this:
"src_via:geom" = [["meso:tza_tnbs"],["naturalearth"],["quattroshapes:statscan"]]
Like why wouldn't it just be:
"src_via:geom" = ["meso:tza_tnbs","naturalearth","quattroshapes:statscan"]
With regards to the source_code key I would change it to source_prefix since that's what it is.
👍
src_via seems fine but I am not sure I understand why some of the examples have lists of lists, like this:
That's because some sources include multiple sources so they need to be lists of lists.
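One possible reading of this, sketched below, is that each inner list groups the upstream sources of a single source, and flattening discards that grouping. The second meso code ("meso:other_code") is made up purely to show a group with more than one entry.

```python
from itertools import chain

# "meso:other_code" is a hypothetical second entry, not a real source code.
nested = [["meso:tza_tnbs", "meso:other_code"], ["naturalearth"], ["quattroshapes:statscan"]]


def flatten(groups):
    """Collapse the list of lists into a single flat list of entries."""
    return list(chain.from_iterable(groups))


flat = flatten(nested)
print(flat)
print(len(nested), "groups,", len(flat), "entries")
```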
Okay.
Likewise I would consider changing all the source_* keys to be src:* since the src prefix has historically been used as a pointer to "whosonfirst-sources".
@stepps00 ⏫