
Yaml as well as JSON for Data Package descriptor files

Open • rufuspollock opened this issue 8 years ago • 15 comments

Idea: allow Data Package descriptor files to be written in YAML as well as JSON.

Why: YAML is easier for ordinary people to create and read, whereas JSON is easy to get wrong.

Why not: it adds complexity for all tool implementors, as they need to support an additional format.
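To make the implementor cost concrete, here is a minimal sketch of a descriptor loader that picks a parser from the file extension. The function name `parse_descriptor` is hypothetical, and the YAML branch assumes the third-party PyYAML package (`yaml.safe_load`); JSON needs only the standard library.

```python
import json
from pathlib import PurePath


def parse_descriptor(filename, text):
    """Parse descriptor text, choosing the parser from the file extension.

    Hypothetical helper for illustration only. The YAML branch assumes the
    third-party PyYAML package, which every such tool would have to depend on.
    """
    suffix = PurePath(filename).suffix.lower()
    if suffix == ".json":
        return json.loads(text)
    if suffix in (".yaml", ".yml"):
        import yaml  # assumption: PyYAML is installed; imported only when needed
        return yaml.safe_load(text)
    raise ValueError(f"unsupported descriptor format: {suffix!r}")


print(parse_descriptor("datapackage.json", '{"name": "example", "resources": []}'))
```

Each further format (TOML and so on) would add another branch and, most likely, another dependency.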

I'm creating this for discussion. Very tentative idea atm.

Subissues:

  • https://github.com/frictionlessdata/specs/issues/663

rufuspollock, Sep 11 '16

I'm neutral on this.

We programmers like to think YAML is easier for ordinary folk, but in my experience the significance of whitespace in YAML is actually a killer for them: it is just a different kind of problem from the ones JSON has.

However, I personally think YAML is a fine format, and supporting it as a first-class citizen seems a reasonable choice, except in the browser, where additional dependencies actually matter (after YAML we add TOML, and so on, whatever the favoured serialisation format of the day is, and suddenly we have bloat).

pwalsh, Sep 11 '16

Agreeing here that both YAML and JSON are easy to get wrong. Probably worth keeping an eye on the popularity of CSVY as a method of applying JSON Table Schema as YAML frontmatter to a CSV file.
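For reference, a CSVY file puts YAML front matter above ordinary CSV rows. The sketch below assumes the header is delimited by lines containing only `---` and separates the two parts using only the Python standard library. The function `split_csvy` is hypothetical and returns the YAML as raw text, since parsing it would need a YAML library; the sample header is illustrative only.

```python
import csv
import io


def split_csvy(text):
    """Split CSVY text into (yaml_header_text, csv_rows).

    Assumes the front matter opens and closes with lines containing only '---'.
    Text without front matter is treated as plain CSV with an empty header.
    """
    lines = text.splitlines(keepends=True)
    if not lines or lines[0].strip() != "---":
        return "", list(csv.reader(io.StringIO(text)))
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            header = "".join(lines[1:i])       # raw YAML, left unparsed here
            body = "".join(lines[i + 1:])      # remaining lines are CSV
            return header, list(csv.reader(io.StringIO(body)))
    raise ValueError("unterminated YAML front matter")


sample = """---
schema:
  fields:
    - name: id
      type: integer
---
id,label
1,a
"""
header, rows = split_csvy(sample)
print(header)
print(rows)
```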

danfowler, Sep 26 '16

WONTFIX.

OK. If YAML is not really easier, I'm going to close this as WONTFIX for now. It imposes additional costs on implementors without making things much easier for publishers.

Definitely open to reconsider if raised in future.

rufuspollock, Sep 27 '16

I use YAML internally when writing the metadata for my data package because it makes for more readable code. For the output, I then convert it to JSON, as that is the standard you guys specified.

I think it makes sense to stick with one format for the standard. I'm neutral on whether that should be YAML or JSON, as I'm not familiar with the specific pros and cons.

jgmill, Dec 19 '16

@muehlenpfordt to support your point, @akariv made datapackage-pipelines, which allows a pipeline creator to describe the Table Schema in YAML.

danfowler, Feb 10 '17

Another point to be made in this closed issue is that datapackage.json is also expressible in this Metatab format.

danfowler, Feb 23 '17

I'm reopening this, as I think YAML support would be really nice and simple: YAML is now very familiar (e.g. from Jekyll) and is a lot easier to write than JSON IME.

rufuspollock, Jul 09 '18

@pwalsh @akariv what do you think about going for this as an option going forward?

rufuspollock, Mar 20 '19

I don't think a valid data package should ever lack a datapackage.json, but maybe we could make _datapackage.json a location convention for generated package descriptors, and have tooling for generating them.

This way, developers can use whatever source of truth they like (YAML, a GraphQL schema, classes) and use or write generators for their specific use case. We can always include these sources in packages for documentation purposes.
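A minimal sketch of the generator idea, assuming Python dataclasses as the source of truth and `_datapackage.json` as the proposed output location. The class and function names (`FieldSpec`, `ResourceSpec`, `generate_descriptor`, `write_generated`) are hypothetical and cover only a tiny subset of descriptor properties.

```python
import json
from dataclasses import asdict, dataclass, field
from pathlib import Path

GENERATED_NAME = "_datapackage.json"  # location convention proposed in the comment


@dataclass
class FieldSpec:
    name: str
    type: str


@dataclass
class ResourceSpec:
    name: str
    path: str
    fields: list = field(default_factory=list)

    def to_descriptor(self):
        # Inline Table Schema built from the declared fields.
        return {
            "name": self.name,
            "path": self.path,
            "schema": {"fields": [asdict(f) for f in self.fields]},
        }


def generate_descriptor(name, resources):
    """Render descriptor JSON text from the Python objects."""
    descriptor = {"name": name, "resources": [r.to_descriptor() for r in resources]}
    return json.dumps(descriptor, indent=2) + "\n"


def write_generated(directory, text):
    """Write generated descriptor text to <directory>/_datapackage.json."""
    target = Path(directory) / GENERATED_NAME
    target.write_text(text, encoding="utf-8")
    return target


text = generate_descriptor(
    "demo", [ResourceSpec("data", "data.csv", [FieldSpec("id", "integer")])]
)
print(text)
```

The generated file stays plain JSON, so consumers need no extra parser; only the publisher's tooling knows about the source format.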

micimize, Aug 23 '19

YAML definitely looks like it is becoming the default for human-writable but machine-parsable config (e.g. look at CI tooling). I think it is time we supported YAML, perhaps even as the default.

rufuspollock, Feb 16 '20

My 2 cents: as a developer I like yaml very much as a human readable config format, but I agree with @pwalsh's comment in https://github.com/frictionlessdata/specs/issues/292#issuecomment-246169480: the importance of whitespace is not intuitive. Other than familiarity with a format, I'm not sure it offers that much benefit to the publishers, while placing quite a burden on implementors, with potentially more requests coming:

  1. Being able to mix JSON (e.g. datapackage.json) with YAML (e.g. schema.yaml)
  2. Supporting TOML, XML, ...
  3. Being able to write extensions to specs (which are expressed as JSON schemas) in YAML

Since Data Packages are a container format for publishing and archiving data, I think it is good to keep a long term perspective in mind and be restrictive/conservative when it comes to specs and only support JSON.

peterdesmet, Dec 13 '21

I would have argued that YAML is easier for a non-programmer to read, but in any case I've now made a habit of also including a Markdown and/or PDF rendering. I do find YAML easier to maintain in two specific cases: complex pattern constraints, and long package/resource/field descriptions that span more than one line.

pattern: https?:\/\/.+
description: |-
  Drilling method:

  - mechanical
  - thermal

{
  "pattern": "https?:\\/\\/.+",
  "description": "Drilling method:\n\n- mechanical\n- thermal"
}
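The extra work in the JSON version is escaping. This small check uses only Python's standard `json` module to show that serialising the intended values yields exactly the doubled backslashes and `\n` sequences that a JSON author would otherwise have to type by hand. The variable names are illustrative.

```python
import json

# Values as the publisher means them; the raw string keeps the regex backslashes literal.
pattern = r"https?:\/\/.+"
description = "Drilling method:\n\n- mechanical\n- thermal"

# json.dumps doubles each backslash and turns each newline into the two characters \n.
encoded = json.dumps({"pattern": pattern, "description": description}, indent=2)
print(encoded)  # prints the same JSON object as shown in the comment above
```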

ezwelty, Apr 10 '24

If I am understanding correctly, this was closed as WONTFIX in https://github.com/frictionlessdata/datapackage-v2-draft/pull/50

I'd just add my 2c: from my experience over the last half dozen years, using YAML is very attractive. I understand the issue of placing a burden on implementors; perhaps the burden could be lowered by, e.g., forbidding "mixing and matching".
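One way to read the "no mixing and matching" suggestion is as a validation rule. A minimal sketch, assuming the rule means that `schema` and `dialect` references given as file paths must use the same format (JSON or YAML) as the descriptor itself. The function names and the rule's exact scope are hypothetical.

```python
from pathlib import PurePosixPath

FORMAT_FAMILIES = {".json": "json", ".yaml": "yaml", ".yml": "yaml"}


def format_of(path):
    """Return 'json' or 'yaml' for a file path, or None if the extension is unrecognised."""
    return FORMAT_FAMILIES.get(PurePosixPath(path).suffix.lower())


def mixed_format_references(descriptor_path, descriptor):
    """List referenced files whose format differs from the descriptor's own.

    Only string values of each resource's 'schema' and 'dialect' are checked;
    inline objects are skipped, and unrecognised extensions are flagged too.
    """
    expected = format_of(descriptor_path)
    problems = []
    for resource in descriptor.get("resources", []):
        for key in ("schema", "dialect"):
            ref = resource.get(key)
            if isinstance(ref, str) and format_of(ref) != expected:
                problems.append(ref)
    return problems


print(mixed_format_references(
    "datapackage.yaml",
    {"resources": [{"schema": "schema.json"}, {"schema": {"fields": []}}]},
))
```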

So I'm flagging this for consideration in v2.1 or similar: this wouldn't be a breaking change, so could it be something tools opt into gradually? (Or perhaps we could have some mechanism for optional extensions that can be tried out for a while to see how it goes.)

rufuspollock, Jun 30 '24