
Process for maintaining Taxonomies

Open gregboyer opened this issue 6 years ago • 6 comments

What process might work for maintaining taxonomies?

gregboyer • Aug 06 '19 07:08

Notes/thoughts/questions from meeting with Ovio and DemocracyLab:

  • What if someone submits a project that does not fit guidelines? E.g. inappropriate topic, for-profit, etc.
  • Does there need to be a project entry approval process? DemocracyLab approves all projects.
  • Should we have an “other” entry that people can submit as a means of gathering potential additions to the taxonomies?
  • Ensure that taxonomies are backward compatible
  • Can projects be generated using NLP and current GitHub repos?
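One way to make the backward-compatibility point checkable is sketched below. It assumes "backward compatible" means that every identifier in the old taxonomy still exists in the new one, either unchanged or through an explicit alias. The function name, the `aliases` mapping, and the example identifiers are all hypothetical; nothing like this exists in the repo yet.

```python
def missing_identifiers(old_ids, new_ids, aliases=None):
    """Return old identifiers that the new taxonomy no longer resolves.

    aliases is a hypothetical mapping of retired identifier -> replacement.
    An empty result means the new version is backward compatible under
    the assumption stated above.
    """
    aliases = aliases or {}
    new_ids = set(new_ids)
    missing = []
    for identifier in sorted(set(old_ids)):
        if identifier in new_ids:
            continue  # still present, nothing to do
        if aliases.get(identifier) in new_ids:
            continue  # renamed, but the old name still resolves
        missing.append(identifier)
    return missing


if __name__ == "__main__":
    old = ["topics/housing", "topics/transit"]
    new = ["topics/housing", "topics/transportation"]
    print(missing_identifiers(old, new))
    print(missing_identifiers(old, new, {"topics/transit": "topics/transportation"}))
```

A check like this could run on every proposed change, so a removal or rename is flagged before it is merged.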

gregboyer • Aug 06 '19 07:08

For projects submitted outside the guidelines, we could handle this as part of an ongoing validation of all listed projects (maybe weekly). The results could then be used to alert project or brigade leaders that they could improve their project, or to point them to a link where they can propose a new addition to the taxonomy.

Actions for this would be:

  1. A place where new additions or changes to the taxonomy could be proposed / discussed / ratified
  2. The verification task and a corresponding "brigade's project metadata status" page, plus a monthly or quarterly reminder to update any projects not passing validation.
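A minimal sketch of what the verification task could compute, assuming each project is a dict with a `name` and a list of `topics`, and that the taxonomy is a flat list of accepted terms. Those field names and the flat shape are assumptions for illustration only.

```python
def validate_projects(projects, taxonomy_terms):
    """Map each failing project's name to its tags that are not in the taxonomy.

    Projects whose tags all match (or that have no tags) are left out,
    so an empty dict means every project passed.
    """
    known = {term.lower() for term in taxonomy_terms}
    failures = {}
    for project in projects:
        unknown = sorted(
            {tag for tag in project.get("topics", []) if tag.lower() not in known}
        )
        if unknown:
            failures[project["name"]] = unknown
    return failures


if __name__ == "__main__":
    projects = [
        {"name": "shelter-finder", "topics": ["Housing", "blockchain"]},
        {"name": "bus-tracker", "topics": ["transit"]},
    ]
    for name, tags in validate_projects(projects, ["housing", "transit"]).items():
        print(f"{name}: unrecognized tags {tags}")
```

The returned mapping is the kind of data a status page or reminder could be built from: it lists only the projects that need attention and the tags that caused the failure.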

nikolajbaer • Aug 08 '19 16:08

@nikolajbaer @gregboyer What I'm building towards for the proof-of-concept is an entirely open, GitHub-moderated process for maintaining the taxonomies. Last week I "projected" a number of existing taxonomies into the format I'm picturing for ours: https://github.com/codeforamerica/civic-tech-taxonomy/branches/all?utf8=%E2%9C%93&query=sites

The format consists of a separate TOML file for each "record", with keys recursively sorted alphabetically so that the same data always produces the same file. The path of each file is its identifier. As far as I could determine, this format gives us the most effective platform for an open process in which both humans and machines can engage with the taxonomies.
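To illustrate why sorted keys matter, here is a rough sketch of a deterministic writer. It is not the project's actual tooling: it is a hand-rolled serializer that only handles strings, numbers, booleans, lists of those, and nested tables, and it assumes every key is a valid bare TOML key. The function names and the sample record are made up.

```python
def sort_keys(value):
    """Return a copy of value with every nested dict's keys in alphabetical order."""
    if isinstance(value, dict):
        return {key: sort_keys(value[key]) for key in sorted(value)}
    if isinstance(value, list):
        return [sort_keys(item) for item in value]
    return value


def format_value(value):
    # Covers only the value types this sketch needs; real TOML has more.
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, str):
        escaped = value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
        return f'"{escaped}"'
    if isinstance(value, list):
        return "[" + ", ".join(format_value(item) for item in value) + "]"
    raise TypeError(f"unsupported value type: {type(value).__name__}")


def dump_record(record, prefix=""):
    """Serialize one record as TOML text with a stable key order."""
    record = sort_keys(record)
    lines = []
    tables = []
    for key, value in record.items():
        if isinstance(value, dict):
            tables.append((key, value))  # tables are written after plain keys
        else:
            lines.append(f"{key} = {format_value(value)}")
    for key, value in tables:
        name = f"{prefix}{key}"
        lines.append("")
        lines.append(f"[{name}]")
        lines.append(dump_record(value, prefix=f"{name}.").rstrip("\n"))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    a = {"name": "Housing", "synonyms": ["shelter"], "meta": {"source": "crawler", "added": 2019}}
    b = {"meta": {"added": 2019, "source": "crawler"}, "synonyms": ["shelter"], "name": "Housing"}
    print(dump_record(a))
    print(dump_record(a) == dump_record(b))
```

Because the two dicts differ only in key order, they serialize to the same text, so a bot and a human editing the same record would not produce spurious diffs in a Pull Request.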

One thing I really like about this approach is that if tools start to be built to get configured directly with what repo#branch to pull the taxonomy from, then folks can easily switch to or experiment on forked taxonomies, or even start out managing their current taxonomy as a fork that gradually merges.

We will begin developing a master taxonomy in a new branch in a similar format, and then a mix of humans and bots/tools/scripts can open Pull Requests to propose/discuss/ratify changes.

Then we can set up a CI process to publish taxonomy updates to various useful formats upon merge.

I described this a bit more over in codeforamerica/brigade-project-index#9

themightychris • Aug 08 '19 18:08

To build on Chris' idea above, the taxonomy would simply be managed by the people with the ability to merge into the particular project. We can start with a draft of some guidelines for communication, turnaround time, etc., and maybe have 3-5 people who can vote and review issues. They should also document how and why they make decisions.

gregboyer • Sep 16 '19 00:09

Colin reviewed all of the MVP stories and tagged each one with bullets listing which taxonomies/metadata it requires. We can create a draft that meets those requirements and go from there.

gregboyer • Sep 16 '19 01:09

The first decision point is whether we want a top-down or bottom-up taxonomy. Top-down means we (who?) decide the buckets and the synonyms within them. Bottom-up means we (who?) categorize all the tags retrieved by the project-index-crawler.

As of June 2020 we have a mix: I have added 100s of synonyms, mostly within the buckets that I found here. I'm happy with this approach, but it means we need to a) continue to categorize (put in buckets) 100s more "crawler" tags and b) figure out a maintenance model.
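A small sketch of how the mixed approach could be worked through in practice, assuming buckets are stored as a mapping of bucket name to a list of synonyms and crawler tags arrive as plain strings. The data shape and names here are hypothetical.

```python
def categorize(tags, buckets):
    """Split crawler tags into those matching a bucket and those still unsorted.

    buckets maps a bucket name to its list of synonyms; matching is
    case-insensitive and a bucket's own name counts as a match.
    """
    lookup = {}
    for bucket, synonyms in buckets.items():
        lookup[bucket.lower()] = bucket
        for synonym in synonyms:
            lookup[synonym.lower()] = bucket

    categorized = {}
    uncategorized = []
    for tag in tags:
        bucket = lookup.get(tag.lower())
        if bucket is None:
            uncategorized.append(tag)  # still needs a human decision
        else:
            categorized.setdefault(bucket, []).append(tag)
    return categorized, uncategorized


if __name__ == "__main__":
    buckets = {"Housing": ["shelter", "homelessness"], "Transit": ["bus", "rail"]}
    done, todo = categorize(["Shelter", "bus", "civic-hacking"], buckets)
    print(done)
    print(todo)
```

The uncategorized list is effectively the remaining backlog from point a), and tracking its size over time could inform whatever maintenance model is chosen for point b).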

giosce • Jun 08 '21 22:06