
Human data matcher (JS app to support manual data matching)

Open • daguar opened this issue 11 years ago • 0 comments

A single-page JS app to make manual data matching as fast and easy as possible for the human matchers.

Front end:

  • UI shows a single entity on the left (the one to be matched TO)
  • UI shows N entities on the right (eg, fuzzy matches found by a computer that need to be resolved down to the ONE correct match)
  • The UI is built to maximize the speed and comfort with which a person can select (by click or keyboard) the correct entity on the right
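To make the keyboard part concrete, here is a minimal sketch of how key presses could drive the selection. The key bindings (digits for a direct pick, arrows or j/k to move, Enter to confirm) and the `handleKey` / state shape are assumptions for illustration, not part of the proposal.

```javascript
// Hypothetical keyboard handler for the matching screen.
// Assumed bindings (not specified in the issue):
//   "1".."9"      pick the candidate at that position directly
//   ArrowDown/j   move the highlight down
//   ArrowUp/k     move the highlight up
//   Enter         confirm the highlighted candidate
// Returns the next state (the same object when the key is ignored);
// `chosen` holds the confirmed candidate index once a match is made.
function handleKey(state, key) {
  const count = state.candidates.length;
  if (count === 0) return state;

  // Direct pick: digit keys map to 1-based candidate positions.
  if (/^[1-9]$/.test(key)) {
    const index = Number(key) - 1;
    return index < count ? { ...state, highlighted: index, chosen: index } : state;
  }

  switch (key) {
    case "ArrowDown":
    case "j":
      // Clamp so the highlight never runs past the last candidate.
      return { ...state, highlighted: Math.min(state.highlighted + 1, count - 1) };
    case "ArrowUp":
    case "k":
      return { ...state, highlighted: Math.max(state.highlighted - 1, 0) };
    case "Enter":
      return { ...state, chosen: state.highlighted };
    default:
      return state;
  }
}

// In the browser this might be wired up roughly like:
// document.addEventListener("keydown", (e) => { state = handleKey(state, e.key); render(state); });
```

Keeping the handler a pure function of (state, key) makes it easy to test without a DOM and to swap bindings later.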

Back end:

  • An instance of the JS app is configured with 2 endpoint URLs:
    • A URL to request data to be matched (eg, GET /unmatched), serving data in the form:
{ to_be_matched: ...,
  possible_matches: [..., ..., ...] }
  • A URL to send back match results to (eg, POST /match)
    • The JS app sends back a JSON payload with a structure like:
{ to_be_matched: ...,
  selected: ...,
  not_selected: [..., ...] }
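A minimal client-side sketch of the two calls, assuming a global `fetch` (browsers, Node 18+) and the example paths `/unmatched` and `/match` from above. The names `config`, `buildMatchResult`, `fetchNextTask`, and `submitMatch` are hypothetical, and the relative URLs assume the app is served from the same origin as the backing service.

```javascript
// Hypothetical endpoint configuration; a real deployment would supply its own URLs.
const config = {
  unmatchedUrl: "/unmatched",
  matchUrl: "/match",
};

// Build the POST body from a task and the index the human picked:
// the chosen candidate goes in `selected`, every other one in `not_selected`.
function buildMatchResult(task, selectedIndex) {
  const { to_be_matched, possible_matches } = task;
  if (selectedIndex < 0 || selectedIndex >= possible_matches.length) {
    throw new RangeError(`selectedIndex ${selectedIndex} is out of range`);
  }
  return {
    to_be_matched,
    selected: possible_matches[selectedIndex],
    not_selected: possible_matches.filter((_, i) => i !== selectedIndex),
  };
}

// GET the next item to match: { to_be_matched, possible_matches: [...] }.
async function fetchNextTask(cfg = config) {
  const res = await fetch(cfg.unmatchedUrl);
  if (!res.ok) throw new Error(`GET ${cfg.unmatchedUrl} failed: ${res.status}`);
  return res.json();
}

// POST the human's decision back to the deployer's service.
async function submitMatch(task, selectedIndex, cfg = config) {
  const res = await fetch(cfg.matchUrl, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(buildMatchResult(task, selectedIndex)),
  });
  if (!res.ok) throw new Error(`POST ${cfg.matchUrl} failed: ${res.status}`);
}
```

Sending `not_selected` explicitly (rather than only the winner) lets the backing service record negative matches too, which the structure above already implies.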

The deployer is responsible for building this backing service. That is deliberate: (a) building it doesn't take much effort, and (b) IMHO, where you get the data and what you do with it are outside the scope of this abstraction/tool, which keeps it reusable for solving JUST the bulk-human-matching problem.

Example use case:

(Taken from this past year @codeforamerica):

  • You have a list of restaurants from a city's health inspections database ("city data")
  • You want to link the restaurants in this database to Yelp ratings, which means matching to Yelp restaurants ("Yelp data")
  • You could use this tool by building a micro-service that:
    • Serves /unmatched data by returning a single "city data" object (info on one restaurant) together with all the results obtained from querying the Yelp search API with that restaurant's name and address
    • Takes the match data posted to /match and saves both the unique ID for the "city data" restaurant and the unique ID of the user-selected "Yelp data" entity
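A rough sketch of such a micro-service using Node's built-in `http` module. Everything here is hypothetical: the sample restaurants are made up, `searchYelp` is a stub standing in for the real Yelp search API call, and matches are kept in an in-memory `Map` instead of a database.

```javascript
const http = require("http");

// Hypothetical sample "city data"; a real service would read the health-inspection database.
const cityRestaurants = [
  { id: "city-1", name: "Taqueria Uno", address: "1 Main St" },
  { id: "city-2", name: "Pho Place", address: "2 Oak Ave" },
];

// Stub standing in for a Yelp search by name and address (returns fake candidates).
function searchYelp(restaurant) {
  return [
    { id: `yelp-${restaurant.id}-a`, name: restaurant.name },
    { id: `yelp-${restaurant.id}-b`, name: `${restaurant.name} (2nd location)` },
  ];
}

// city ID -> Yelp ID, filled in as humans resolve matches.
const links = new Map();

// Next unmatched city record plus its Yelp candidates, or null when none are left.
function nextUnmatched() {
  const restaurant = cityRestaurants.find((r) => !links.has(r.id));
  return restaurant
    ? { to_be_matched: restaurant, possible_matches: searchYelp(restaurant) }
    : null;
}

// Keep only the two unique IDs from a POSTed match result.
function recordMatch(body) {
  links.set(body.to_be_matched.id, body.selected.id);
}

function createMatchService() {
  return http.createServer((req, res) => {
    if (req.method === "GET" && req.url === "/unmatched") {
      const task = nextUnmatched();
      if (!task) {
        res.writeHead(204); // nothing left to match
        return res.end();
      }
      res.writeHead(200, { "Content-Type": "application/json" });
      return res.end(JSON.stringify(task));
    }
    if (req.method === "POST" && req.url === "/match") {
      let raw = "";
      req.setEncoding("utf8");
      req.on("data", (chunk) => { raw += chunk; });
      req.on("end", () => {
        try {
          recordMatch(JSON.parse(raw));
          res.writeHead(204);
        } catch {
          res.writeHead(400); // malformed JSON or missing fields
        }
        res.end();
      });
      return;
    }
    res.writeHead(404);
    res.end();
  });
}
```

Starting it would look like `createMatchService().listen(3000)`. If the JS app is served from a different origin, the service would also need CORS headers, which this sketch omits.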

daguar, Jan 11 '14 21:01