whosonfirst-sources

Indicate original source of data (and via what aggregator)

Open nvkelso opened this issue 8 years ago • 13 comments

Right now we have data from Quattroshapes which actually originates from multiple different sources. Each source needs to be credited, so we need a consistent WOF property to deal with this.

I propose a new property like src:via (was src_via originally) where the src should state the original source, and then we should credit the data aggregator in src:via as well.

Examples:

  • Quattroshapes:
    • The city of San Francisco has a "qs:source" value of "AUS Census" (should just be US Census, oops) and "src:geom" of quattroshapes.
    • Propose that the "src:geom" should be uscensus instead, with "src_via" set to quattroshapes
  • Mesoshapes:
    • The county feature of Samba has a "meso:source" value of "EDP", though no EDP.json file is currently in the sources folder.
    • Propose that the "src:geom" should be eep instead, with "src_via" set to meso

nvkelso, Jan 07 '17 00:01

Related: https://github.com/whosonfirst/whosonfirst-sources/issues/39.

nvkelso, Jan 07 '17 00:01

I would only change this to be src:via or an equivalent prefix + ":" + key pair, to be consistent with everything else.

thisisaaronland, Jan 07 '17 13:01

Works for me :)

nvkelso, Jan 07 '17 18:01

Seems like most of the above applies to the whosonfirst-data repo.

To give credit to our src:via sources, we'll also need to elevate some of the buried remarks (like those for Quattroshapes) so they are listed directly in the big sources README. That way there is one page listing all the sources, which consumers of Who's On First data can link to in their apps to give proper credit where credit is due.

All need to print out in a section under https://github.com/whosonfirst/whosonfirst-sources/blob/master/sources/README.md#quattroshapes

After license bullet point, a new paragraph with:

This source includes data from the following organizations:

With bullet points listed below in alphabetical order, e.g.:

  • Europe-wide: European Environment Agency (EEA) urban morphological zones 2006
  • France: Institut Géographique National
  • Netherlands: Kadaster
  • Spain: Instituto Geográfico Nacional
  • Switzerland: swisstopo
  • United Kingdom: Contains Ordnance Survey data © Crown copyright and database right [2012]

And that list needs to be from a new JSON list in the quattroshapes.json source.

Ideally it could contain HTML text with hyperlinks (?) since I think we had problems with Markdown before.
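
A rough sketch of how that README paragraph could be generated from the JSON list. This assumes the list lives under a "src:via" key and that each entry has "context", "source_name" and "source_link" keys (the shape used for meso.json elsewhere in this thread); the function name and the empty links are placeholders, not anything that exists in the repo today.

```python
import html

# Hypothetical excerpt of a "src:via" list as it might appear in quattroshapes.json.
# Names come from the bullet list above; the empty links are placeholders.
SRC_VIA = [
    {"context": "Switzerland", "source_name": "swisstopo", "source_link": ""},
    {"context": "France", "source_name": "Institut Géographique National", "source_link": ""},
    {"context": "Netherlands", "source_name": "Kadaster", "source_link": ""},
]


def render_credits(entries):
    """Return the README paragraph plus one bullet per entry, sorted by context."""
    lines = ["This source includes data from the following organizations:", ""]
    for entry in sorted(entries, key=lambda e: e["context"].casefold()):
        name = html.escape(entry["source_name"])
        link = entry.get("source_link")
        if link:
            # Emit an HTML hyperlink only when a link is actually recorded.
            name = '<a href="{}">{}</a>'.format(html.escape(link, quote=True), name)
        lines.append("* {}: {}".format(html.escape(entry["context"]), name))
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_credits(SRC_VIA))
```

Emitting plain HTML anchors sidesteps the Markdown problems mentioned above, at the cost of a slightly noisier README source.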

nvkelso, May 18 '18 01:05

The textual description part of this here in the sources repo is done.

Leaving this issue open as there is related work to follow up on.

nvkelso, May 23 '18 00:05

For this county in Tanzania:

  • https://spelunker.whosonfirst.org/id/1108692933/

Let's pretend it has the following properties:

  • "src:geom" = "meso"
  • "src:geom_alt" = ["naturalearth","quattroshapes"]
  • "meso:source" = "TNBS"
  • "qs:source" = "statscan"

We want to track the sources of sources generically, in a predictable, machine-readable way that doesn't need constant shuffling as default and alt geoms are swapped around, without adding more sources JSONs, and making use of the existing "src:via" properties we recently added to the sources JSON. In this case Mesoshapes includes data from "TNBS", and let's pretend that Quattroshapes includes data from "statscan".

NOTE: This new property would only be added to WOF records whose source is itself compiled from multiple sources (e.g. Mesoshapes, Quattroshapes, and other *shapes sources); in those cases all sources would be listed in the extended format. Records without such multi-source sources would not change.

We propose to add a new "src_via" prefix that accepts the same property names as src, but stores a list of lists (versus a string for geom and a list for geom_alt), because any one source can actually be composed of multiple sources:

  • "src_via:geom" = [["meso:tza_tnbs"],["naturalearth"],["quattroshapes:statscan"]]
    • which links to a new source_code entry in the meso source JSON and tracks both default geoms and alt geoms in one big list.

Another example:

  • "src_via:population" = [["statoids:othercensus"],["uscensus"]]
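
For consumers, reading these values could look roughly like the sketch below. It assumes the part before the first ":" is the source prefix and the rest is the source_code, and that a bare value such as "naturalearth" has no code; parse_src_via is a made-up helper name.

```python
def parse_src_via(value):
    """Turn each "prefix:code" string into a (prefix, code) tuple, keeping the nesting.

    A bare value with no ":" (e.g. "naturalearth") yields a code of None.
    """
    parsed = []
    for group in value:
        parsed_group = []
        for item in group:
            prefix, sep, code = item.partition(":")
            parsed_group.append((prefix, code if sep else None))
        parsed.append(parsed_group)
    return parsed


src_via_geom = [["meso:tza_tnbs"], ["naturalearth"], ["quattroshapes:statscan"]]

if __name__ == "__main__":
    for group in parse_src_via(src_via_geom):
        print(group)
```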

Then in the sources repo (this repo), modify the meso.json:

  • https://github.com/whosonfirst/whosonfirst-sources/blob/master/sources/meso.json source_code (versus source_name)

From:

"src:via": {
    "context": "Tanzania",
    "source_link": "",
    "source_name": "Tanzania National Bureau of Statistics (TNBS)",
    "source_note": ""
},

Add "source_code": "tza_tnbs", giving:

"src:via": {
    "context": "Tanzania",
    "source_link": "",
    "source_name": "Tanzania National Bureau of Statistics (TNBS)",
    "source_code": "tza_tnbs",
    "source_note": ""
},
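
With a source_code in place, resolving a value like "meso:tza_tnbs" back to a credit line could be sketched as follows. This assumes "src:via" in meso.json is a list of such objects (the fragment above shows only one of them); resolve_via is a hypothetical helper, not an existing tool.

```python
# Assumed in-memory form of meso.json: "src:via" holds a list of entries.
MESO_SOURCE = {
    "src:via": [
        {
            "context": "Tanzania",
            "source_link": "",
            "source_name": "Tanzania National Bureau of Statistics (TNBS)",
            "source_code": "tza_tnbs",
            "source_note": "",
        },
    ],
}


def resolve_via(source, code):
    """Return the "src:via" entry whose source_code matches, or None if absent."""
    for entry in source.get("src:via", []):
        if entry.get("source_code") == code:
            return entry
    return None


if __name__ == "__main__":
    entry = resolve_via(MESO_SOURCE, "tza_tnbs")
    print(entry["source_name"] if entry else "unknown source_code")
```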

nvkelso, Jun 22 '18 22:06

Does this need to be a different structure?

"src_via:geom"={  
   "meso":[  
      "tza_tnbs"
   ],
   "naturalearth":[  
      "naturalearth"
   ],
   "quattroshapes":[  
      "statscan"
   ]
}
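
For comparison, the list-of-lists form can be converted into this keyed form mechanically. A minimal sketch, assuming "prefix:code" strings and that a bare source maps to itself as in the "naturalearth" entry above; to_mapping is a made-up name.

```python
def to_mapping(list_of_lists):
    """Group "prefix:code" strings by prefix; a bare prefix is stored as its own code."""
    mapping = {}
    for group in list_of_lists:
        for item in group:
            prefix, sep, code = item.partition(":")
            mapping.setdefault(prefix, []).append(code if sep else prefix)
    return mapping


if __name__ == "__main__":
    print(to_mapping([["meso:tza_tnbs"], ["naturalearth"], ["quattroshapes:statscan"]]))
```

The keyed form makes lookups by aggregator direct, while the list form is closer to the existing "src:geom_alt" list style.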

And should we riff on "src:via" ala "src_via" instead of "src_src"? (updated to src_via).

nvkelso, Jun 22 '18 22:06

@nvkelso - the example in https://github.com/whosonfirst/whosonfirst-sources/issues/40#issuecomment-399602996 makes more sense.

stepps00, Jun 22 '18 22:06

Flagging @thisisaaronland for comments. We'd like to make this change next week.

nvkelso, Jun 22 '18 22:06

With regards to the source_code key I would change it to source_prefix since that's what it is.

Likewise I would consider changing all the source_* keys to be src:* since the src prefix has historically been used as a pointer to "whosonfirst-sources".

src_via seems fine but I am not sure I understand why some of the examples have lists of lists, like this:

"src_via:geom" = [["meso:tza_tnbs"],["naturalearth"],["quattroshapes:statscan"]]

Like why wouldn't it just be:

"src_via:geom" = ["meso:tza_tnbs","naturalearth","quattroshapes:statscan"]

thisisaaronland, Jun 26 '18 18:06

> With regards to the source_code key I would change it to source_prefix since that's what it is.

👍

> src_via seems fine but I am not sure I understand why some of the examples have lists of lists, like this:

That's because some sources include multiple sources so they need to be lists of lists.
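
A small illustration of what the nesting preserves, under one plausible reading in which each inner list holds the upstream sources for a single geometry source. "meso:example_other" is a made-up code used only to give one group two entries.

```python
from itertools import chain

# Nested form: two groups, the first with two upstream sources (second code is invented).
nested = [["meso:tza_tnbs", "meso:example_other"], ["naturalearth"]]

# Flat form: the same strings, but the group boundaries are gone.
flat = list(chain.from_iterable(nested))

if __name__ == "__main__":
    print(len(nested), "groups ->", len(flat), "flat entries")
```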

nvkelso, Jun 26 '18 19:06

Okay.

thisisaaronland, Jun 26 '18 19:06

> Likewise I would consider changing all the source_* keys to be src:* since the src prefix has historically been used as a pointer to "whosonfirst-sources".

@stepps00 ⏫

nvkelso, Jun 27 '18 23:06