frictionlessdata.io
Table Schema Catalog
Following discussions with @rufuspollock, I'm proposing the creation of a new Frictionless Data product: the Table Schema Catalog.
Context
Situation: Poor quality of open data
Question: How to improve the quality of open data?
- Problem: Standardization of common datasets
  - Solution: Table Schemas
  - Example: A schema for the names of babies born in a city in a given year (the most silly example I could think of, but a real one nonetheless).
- Problem: Checking the quality of datasets
  - Solution: Goodtables
  - Example: Validata, an adaptation of Goodtables for French open data.
- Problem: Sharing open data standards
  - Solution: Schema Catalog
  - Example: SCDL, Schema.data.gouv.fr, Schemas.frictionlessdata.io
Explanation
Here's the rationale in the form of a (draft) blogpost: https://hackmd.io/@johanricher/SJmb-RazS
An extract summing up the purpose:
For schemas to become standards, they must be easily found and usable. They must be shared. We propose to open a new chapter for Table Schemas with Schemas.frictionlessdata.io as the place to catalog them.
I'm opening this discussion with the community to brainstorm this idea and hopefully iterate on it until it's good enough to start work.
[summary] Meeting Johan + Monika
What we have
- Table schema to markdown: https://framagit.org/opendataschema/table-schema-to-markdown -> can it be used to create frontmatter from the schema metadata?
- A proposed specification for the catalog file: https://framagit.org/opendataschema/catalog/blob/master/schema-catalog.json (used by OpenDataFrance for their own Table Schema catalog: https://git.opendatafrance.net/scdl/catalog/blob/master/catalog.json)
What we need
- A file to index all the schema repos for this catalog (catalog.json?). This file would sit in a repo so that users can open PRs to propose schema additions (under github.com/frictionlessdata/….)
- A way to create a page for each schema, from this catalog.json index file (a script using table schema to markdown? with continuous integration triggered with a commit to the catalog.json file)
- A way to search on the website for a schema: full-text, filter/facets, using all the metadata/frontmatter
- Can we use frontmatter and a static site generator to do that? Vuepress can do full text search but can it do more?
- Vuepress and others use https://docsearch.algolia.com/, which should be able to do what’s needed (faceting). But how complex would it be to implement for our needs?
- If we can avoid a whole backend it’s better - SCMS better
- Not too important but should be decided at some point: name (Table Schema Catalog?), repo slug (catalog?).
- Home (index page with links to all schemas), page for each schema, containing:
Next steps
- [ ] Schedule a meeting with Rufus and Irio next week
- [ ] Sync on catalog.json with Rufus and Irio and probably other people
- [ ] Deeper research on full-text, filter/facets search (is it possible with Vuepress, how far can we go with the customization)
- [ ] Plan/Find the best way to create a page for each schema, from this catalog.json index file (Table schema to markdown; CI)
- [ ] Start stubbing the website.
- [ ] At some point: Put other FD board members in the loop before we’re too advanced
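A minimal sketch of the "page for each schema" item under "What we need", assuming a static site generator that reads YAML frontmatter. The catalog entry keys (`name`, `schema_url`) and the schema metadata keys (`title`, `description`, `keywords`) are hypothetical placeholders, not the agreed catalog.json format, and this is not how table-schema-to-markdown works internally.

```python
import json


def schema_page(entry, schema):
    """Build a Markdown page with YAML frontmatter for one catalog entry.

    `entry` is one (hypothetical) item of catalog.json; `schema` is the
    already-downloaded Table Schema descriptor it points to.
    """
    frontmatter = {
        "title": schema.get("title", entry["name"]),
        "description": schema.get("description", ""),
        "keywords": schema.get("keywords", []),
        "source": entry["schema_url"],
    }
    lines = ["---"]
    for key, value in frontmatter.items():
        # JSON strings and arrays are also valid YAML, so json.dumps
        # is enough to serialize these simple values safely.
        lines.append(f"{key}: {json.dumps(value, ensure_ascii=False)}")
    lines += ["---", "", f"# {frontmatter['title']}", "", frontmatter["description"], ""]
    return "\n".join(lines)


# Hypothetical entry and schema, inlined so the sketch runs offline.
entry = {
    "name": "baby-names",
    "schema_url": "https://example.org/baby-names/schema.json",
}
schema = {
    "title": "Baby names",
    "description": "Names of babies born in a city in a given year.",
    "keywords": ["civil registry"],
    "fields": [{"name": "first_name", "type": "string"}],
}

page = schema_page(entry, schema)
print(page)
```

A CI job triggered by a commit to catalog.json could loop over the entries, fetch each schema, and write one such file per schema. The frontmatter keys would then be available to whatever search or filtering the site generator supports.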
@johanricher i think this is great 👍 (and something we've wanted for a while - cf https://github.com/frictionlessdata/schemas (not sure why it got deprecated)).
Could you provide a slightly more detailed spec of how schemas.f15d.io would be designed, e.g. its URL structure? Do you imagine something like this?
/ # home page
/:owner/:schema-name # human readable description of the schema
/:owner/:schema-name.json # json file for the schema
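To make the three routes above concrete, here is a small sketch of how a path could be resolved. The function name `resolve` and the returned labels are made up for illustration, and the owner and schema names in the usage lines are hypothetical.

```python
def resolve(path):
    """Map a request path to (kind, owner, schema_name).

    Implements only the three routes proposed above:
      /                          -> home page
      /:owner/:schema-name       -> human-readable schema page
      /:owner/:schema-name.json  -> raw JSON schema
    """
    parts = [p for p in path.split("/") if p]
    if not parts:
        return ("home", None, None)
    if len(parts) != 2:
        return ("not_found", None, None)
    owner, name = parts
    if name.endswith(".json"):
        stem = name[: -len(".json")]
        if not stem:
            return ("not_found", None, None)
        return ("schema_json", owner, stem)
    return ("schema_page", owner, name)


print(resolve("/"))
print(resolve("/some-owner/baby-names"))
print(resolve("/some-owner/baby-names.json"))
```

One consequence of this layout is that a schema name ending in `.json` could not be told apart from the raw-file route, so the catalog would probably need to forbid such names.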
@johanricher ping ^^^ 😉
That's a good point @rufuspollock, let's talk about the sitemap and the actual workflow of this catalog!
Here are some ideas and hypotheses:
- The source of truth for each schema would stay in its owner's repo (example). Always name it schema.json and use git tags for releases.
- I would recommend schema.org, SPDX and JSON Schema Store as good inspirations for a successful catalog with open governance (i.e. put the data on Github and accept PRs).
- I wouldn't put usernames in the URL.
- Homepage could be schemas.frictionlessdata.io or frictionlessdata.io/schemas.
The URL structure would just be homepage/schema-name for the schema page (see below) then homepage/schema-name/version/schema.json to get the raw version.
Important: a /search page :mag: using faceting and filters based on the schema's metadata (see below). Finding a schema based on the country or field of application would be awesome.
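As a rough idea of what faceting and filtering on schema metadata means in practice, here is a sketch over an in-memory list. The metadata keys (`countries`, `keywords`) and the sample schemas are assumptions for illustration. A real site would more likely delegate this to a search service or a static index.

```python
from collections import Counter


def facet_counts(schemas, facet):
    """Count how many schemas carry each value of a list-valued facet."""
    counts = Counter()
    for schema in schemas:
        for value in schema.get(facet, []):
            counts[value] += 1
    return counts


def filter_schemas(schemas, **selected):
    """Keep schemas whose metadata contains every selected facet value."""
    return [
        schema
        for schema in schemas
        if all(value in schema.get(facet, []) for facet, value in selected.items())
    ]


# Hypothetical catalog metadata.
schemas = [
    {"name": "baby-names", "countries": ["FR"], "keywords": ["civil registry"]},
    {"name": "public-procurement", "countries": ["FR", "BE"], "keywords": ["finance"]},
    {"name": "charging-stations", "countries": ["BE"], "keywords": ["transport"]},
]

print(facet_counts(schemas, "countries"))
print([s["name"] for s in filter_schemas(schemas, countries="FR")])
```

The facet counts are what a search page would display next to each filter, and `filter_schemas` is what applying one or more filters would do.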
Also:
- The usual /about page :nerd_face:
- Maybe an API? :thinking:
A schema page :page_facing_up: would:
- Lay out its profile, using metadata from the schema itself:
- Description
- Keywords / tags
- Version(s)
- Author(s)
- Image
- Country(ies) where the schema is used
- etc.
- Offer calls to actions / integration with outside tools:
- "Validate a file against this schema" (Goodtables / Validata)
- "Create a file compliant to this schema" (CSV-GG, tsfaker)
- "Download a template file compliant to this schema" (table-schema-resource-template)
- Present the "data model" of the schema, i.e. columns (example using table-schema-to-markdown)
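The "data model" part of a schema page could be rendered along these lines. This is only a sketch of the idea, not the output format of table-schema-to-markdown. It relies on the Table Schema `fields` array, with `type` defaulting to `string` and `constraints.required` marking mandatory columns. The sample schema is made up.

```python
def fields_table(schema):
    """Render the fields of a Table Schema descriptor as a Markdown table."""
    rows = [
        "| Name | Type | Required | Description |",
        "| --- | --- | --- | --- |",
    ]
    for field in schema["fields"]:
        required = field.get("constraints", {}).get("required", False)
        rows.append(
            "| {name} | {type} | {required} | {description} |".format(
                name=field["name"],
                # Table Schema treats a missing type as "string".
                type=field.get("type", "string"),
                required="yes" if required else "no",
                description=field.get("description", ""),
            )
        )
    return "\n".join(rows)


# Hypothetical schema for illustration.
schema = {
    "fields": [
        {
            "name": "year",
            "type": "integer",
            "description": "Year of birth",
            "constraints": {"required": True},
        },
        {"name": "first_name", "description": "Given name"},
    ]
}

table = fields_table(schema)
print(table)
```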
@johanricher this is really good 👏 👏 Think this is pretty much good to go. Do you want to start stubbing something out? I can boot a repo for you and make you admin!
A few suggestions / comments:
- User / org names in the URL: there are several advantages here
- Socially we have some sense of who the owner is (and the owner gets cred)
- Technically we get to mirror github etc so we don't have to worry about name collisions
- UX: we get an owner home page where i can browse all their schemas
- (Future) would it be useful to cache a copy of the schema
- (Future) listing users of a schema
I agree that mirroring the Github/Gitlab repo URL, which people are used to, can indeed be useful and give a sense of familiarity. In some way, the schema page can be seen as just an interface on top of the repo, reading the schema and offering features to make it useful.
As for the next steps, I think I've come as far as I can alone. :smile: I can administer the repo and write user stories, stuff like that, but this would need actual developers going forward. People from my own company could help but the bandwidth is kinda limited at the moment (at least until we're able to recruit again).
No hurry on this but if we want to make progress in the next few weeks we might need to do some kind of outreach to see if other people want to contribute, what do you think?
@johanricher I think we could make a start without devs. Could you start a spreadsheet in gdocs as a mini database of the existing schemas we know of, with links etc.?
@johanricher any response here - we actually have someone who could help out here /cc @monikappv and it would be great to get something moving.
@johanricher pinging again 😉 Do you have some time to work on this in the next few weeks?
How time flies! :eyes:
Here's what I've got for now: https://docs.google.com/spreadsheets/d/1HUHg-ITdu-aqdJyIjZXpobZZRXdCOxaxsrRLJcrcaFw/edit?usp=sharing
@johanricher great! Could you do a sync up some time soon (e.g. this week) with @monikappv (and maybe me) to plan this out and next steps?
@monikappv Can you send me some times that suit you this week?
@rufuspollock In the meantime, do you know of any lists of popular Table Schemas that I could start adding to this proto-catalog?
@johanricher @rufuspollock How about tomorrow (20.02) or Friday (21.02) around 10-11 am CET?
1 hour tomorrow 10am is good. :+1: Can you send me an invite? Thanks @monikappv!
@johanricher sent you an invitation. Hope it's ok that I scheduled the meeting for 10:30 CET - I saw that @rufuspollock has a meeting at 10 CET. Not sure if he will be able to join us tomorrow, but just in case..
@monikappv @johanricher can you summarize the results of your meeting, the next steps, and the timeline? At the very least, what about getting a blog post up detailing what we plan to do - even if we have not done it yet 😄
@rufuspollock summary is added to the issue description. P.S. thumbs up for writing a blog post :+1:
@johanricher (and cc @monikappv ) i don't think you need much syncing with me or @Irio to get this moving. Can you start on your next steps (and if there is anything you need please say).
I'll have to find where I left off. Last time we talked with @monikappv was almost 2 months ago. She added the summary of that talk on the first post here. Right now, I'm not sure how I could pick up any of those "next steps".
@johanricher could you start collecting a list of these schemas in a google doc spreadsheet and we go from that? wdyt?
Also a blog post introducing the need and idea.
Here's what I've got for now: https://docs.google.com/spreadsheets/d/1HUHg-ITdu-aqdJyIjZXpobZZRXdCOxaxsrRLJcrcaFw/edit?usp=sharing
As for the blogpost, my first draft could be reworked of course, but it would depend on the angle. Where would it be posted? Would it be to present a work in progress? Would we have something else to show for it by then? or would the intent be to do some kind of outreach to see if other people want to contribute, as suggested before?
@johanricher the intent of the blog post would be to announce the idea and that we are starting to work on it. I think getting it up asap on the new site would be great 😄 - if you want someone to review @lauragift21 can do that and then we can get a PR in.
Great, thanks!
Hi @lauragift21 Would you like to read this draft blog post and give me some quick notes? We can use whatever medium you'd like (Google Docs, Github...) to iterate and improve it until it's ready to publish on the Frictionless Data Blog.
Hi @johanricher Sure, happy to take a look and send a review by Monday.
The post looks good to me. I suggested a few grammatical edits in the document; you can check those out. @johanricher
Thanks for your speed of response @lauragift21!
@rufuspollock What do you think about adding an invitation to give feedback somewhere? A new Github issue or thread on the OKFN forum, perhaps?
@johanricher sounds good in terms of adding an invitation for feedback. I think the easiest would be an "open" Google Doc, or posting a comment here in the GitHub forum.
Sorry, I meant: once the post is published, what do we offer interested people as a way to give us feedback?
As for the blog post itself, corrections and/or comments are already possible on the draft here.
@johanricher yes, I meant for when the blog post is published - a Google Doc would be fine, or a link to here.