
Proposal: YAML-like structure syntax

Open coffeescriptbot opened this issue 7 years ago • 30 comments

From @YamiOdymel on 2016-12-10 10:16

After reading the three issues mentioned below, I still don't see why we can't use YAML-like syntax. I also feel that {, } and [, ] are the most un-CoffeeScript syntax (not indentation-based enough).

So instead of having (CoffeeScript)

kids = sister:
  name   : 'Ida'
  age    : 9
  parents: [
    {
      name    : 'Caris'
      relation: 'xxx'
    }
    {
      name    : 'Mike'
      relation: 'xxx'
    }
  ]

can we have YAML-like syntax like this?

kids = sister:
  name   : 'Ida'
  age    : 9
  parents:
    - name    : 'Caris'
      relation: 'xxx'
    - name    : 'Mike'
      relation: 'xxx'

and both compile to

var kids = 
{
  sister:
  {
    name   : "Ida",
    age    : 9,
    parents:
    [
        { name: 'Caris', relation: 'xxx' },
        { name: 'Mike' , relation: 'xxx' }
    ]
  }
};

More Examples

Types

CoffeeScript

types = [
  -1
  '2'
  0.3
  ->
    alert 'foo, bar'
  helloFunc
]

YAML-like (the CoffeeScript one might be better than this)

types =
  - -1
  - "2"
  - 0.3
  - ->
    alert "foo, bar"
  - helloFunc

JavaScript

var types =
[
  -1,
  "2",
  0.3,
  function()
  {
    alert("foo, bar")
  },
  helloFunc
]

Large structure

CoffeeScript

objs = [ {
  username: 'YamiOdymel'
  nickname: 'yamiodymel'
  avatar  :
    small: 'http://www.example.com/example.png-small'
    medium: 'http://www.example.com/example.png-medium'
    large: 'http://www.example.com/example.png-large'
  hobbies:
    most: [
      'tech'
      'sport'
      'eco'
    ]
    medium: [
      'animals'
      'music'
    ]
    lowest: [ 'science' ]
} ]

YAML-like

objs =
  - username: 'YamiOdymel'
    nickname: 'yamiodymel'
    avatar:
      small : 'http://www.example.com/example.png-small'
      medium: 'http://www.example.com/example.png-medium'
      large : 'http://www.example.com/example.png-large'
    hobbies:
      most:
        - 'tech'
        - 'sport'
        - 'eco'
      medium:
        - 'animals'
        - 'music'
      lowest:
        - 'science'

JavaScript

objs =
[
  {
    username: 'YamiOdymel',
    nickname: 'yamiodymel',
    avatar:
    {
      small: 'http://www.example.com/example.png-small',
      medium: 'http://www.example.com/example.png-medium',
      large: 'http://www.example.com/example.png-large'
    },
    hobbies:
    {
      most  : ['tech'   , 'sport', 'eco'],
      medium: ['animals', 'music'],
      lowest: ['science']
    }
  }
]

Other ideas

From Terser array syntax

array = []
  {}
    key: value
    bla: da
  {}
    key: value2
    bla: doom
foo :[]
  'hooray'
  'no closing character'

  • Better support for YAML-like syntax
  • Enhancement: possible solution to the Array-Literal-Without-Brackets problem
  • Terser array syntax

coffeescriptbot avatar Feb 19 '18 11:02 coffeescriptbot

From @edemaine on 2016-12-10 12:10

I have on several occasions written CS like

arrayOfObjects = [
  name: 'me' 
  email: '[email protected]'
,
  name: 'you'
  email: '[email protected]'
]

which can get a little hard to read, and sometimes it's hard to find the right outdent level to put the comma (e.g. if this is wrapped in a function call). Having optional - syntax for making array items clearer would be nice for these cases, if it's disambiguatable in the parser.

(This is a YAML feature, but there's still a lot of other stuff in YAML not supported here, e.g. | to write literals. So I wouldn't necessarily call this YAML support, but it's still a nice feature.)

coffeescriptbot avatar Feb 19 '18 11:02 coffeescriptbot

From @jituanlin on 2017-03-23 10:35

Nice feature! I support it.

coffeescriptbot avatar Feb 19 '18 11:02 coffeescriptbot

This would be a huge improvement.

One of the main reasons I use CoffeeScript over ES6 is readability. A YAML-like extension to the object notation would be amazing.

aminland avatar Feb 24 '18 17:02 aminland

This is one of the oldest ones... See #645 for the "YAML" side as well as #2259, and also #1872, #3018... Even #2642, #1190, #1579 are related. I'm OK with leaving this one open, but... I think we've seen enough of it.

vendethiel avatar Feb 25 '18 22:02 vendethiel

If people keep making the request, maybe it's not such a bad idea? Adding new capabilities and syntax sugars is one of the best ways to help with ensuring the community grows and thrives...

With respect to solving the recurring issue of denoting arrays of objects, perhaps we can take an existing quirk (possibly a bug) in coffeescript and extend it to our advantage: At present, if you write x = [a:'a' ; b:'b'] on a single line, you get x = [{ a: 'a' }, { b: 'b' }]

But if it's not on a single line:

x = [ 
  a:'a' ; 
  b:'b'
] 

currently compiles to x = [{ a: 'a', b: 'b' }];

So perhaps an easy way to address this recurring request would be to allow the quirk that causes the semicolon hack on a single line to work in the multiline case?

aminland avatar Feb 25 '18 23:02 aminland

Just adding my 2¢. See also #4600 and #4618. I don't think YAML-like syntax will improve readability.

Indention matters.

kids:
  sister:
    name: 'Ida'
    age: 9
  parents:
    - name: 'Caris'
      relation: 'xxx'
     - name:  'Mike'
       relation: 'xxx'

# bad indentation of a sequence entry at line 7, column 6: - name : 'Mike' ^

More keystrokes. YAML (9 x -) vs CS (6 = 3 x [ + 3 x ]) in given example.

objs:
  hobbies:
    most:
      - 'tech'
      - 'sport'
      - 'eco'
    medium:
      - 'animals'
      - 'music'
      - 'cooking'
    lowest:
      - 'science'
      - 'math'
      - 'reading'
# Note that indention of the array items is not strict.
objs =
  hobbies:
    most: [
        'tech'
          'sport'
      'eco'
    ]
    medium: [
      'animals'
         'music'
        'cooking'
    ]
    lowest: [ 
       'science'
         'math'
        'reading' 
    ]

Spread properties in YAML-like syntax?

child = {parents..., environment..., school...}

YAML or YAML-like syntax has its advantages, but not that many, IMHO. Besides, if a list is getting longer and thus less readable, it's usually a sign that the approach to the problem should change: split data into smaller chunks, generators, import/export module, external YAML file and parse when needed (e.g. cached version or on the fly)...

zdenko avatar Mar 02 '18 12:03 zdenko

Note that indention of the array items is not strict.

that's definitely not a feature.

vendethiel avatar Mar 02 '18 13:03 vendethiel

@vendethiel probably not, but current rules in grammar allow this.
I'm not familiar with the history, but I presumed there is probably a reason behind this. Also, my intention was not to emphasize this as a feature, just wanted to point out the difference.

(I think you copied the whole line from the comment, which now became a title because of #, and it looks like one of us is shouting :smile:)

zdenko avatar Mar 02 '18 15:03 zdenko

I really like that proposal. I use the unindented comma for now, but having YAML-like arrays would be nice. But don't forget that CoffeeScript allows tab characters for indentation, while YAML doesn't (in most places). I don't know how CoffeeScript's parser works, so I don't know if it's an issue, but I wanted to point that out, just in case. Even if you don't use tabs, I think tabs explain the meaning of indentation quite nicely in this case.

I looked at the YAML 1.2 spec. From what I understood, the - character is part of the indentation in YAML, and the spec allows this:

-
  name: Mark McGwire
  hr:   65
  avg:  0.278
-
  name: Sammy Sosa
  hr:   63
  avg:  0.288

But placing - in the same line as name makes - a part of indentation. It's really just taking advantage of tab stops (which is quite ironic since YAML doesn't use tabs) like in the Horstmann style.

Semantically a map containing a sequence (which in CoffeeScript is an object containing an array) looks like this:

object:
  -
    array item
  -
    array item

Which can be (and usually is) converted to this:

object:
  - array item
  - array item

But YAML also allows omitting that indentation level for sequences, so this is also valid and pretty common:

object:
- array item
- array item

So that means if we want this in CoffeeScript, we would have to make it work with tabs, and the only reasonable way I see is using tab stops like this (___| is a 4-character-wide tab; it shrinks after a leading - because of tab stops, so everything looks aligned):

kids = sister:
    name: 'Ida'
    age: 9
    parents:
        -   name: 'Caris'
            relation: 'xxx'
        -   name: 'Mike'
            relation: 'xxx'

kids = sister:
___|name: 'Ida'
___|age: 9
___|parents:
___|___|-__|name: 'Caris'
___|___|___|relation: 'xxx'
___|___|-__|name: 'Mike'
___|___|___|relation: 'xxx'

And if we allow omitting the indentation level for arrays, like YAML does, this would be legit too:

kids = sister:
    name: 'Ida'
    age: 9
    parents:
    -   name: 'Caris'
        relation: 'xxx'
    -   name: 'Mike'
        relation: 'xxx'

kids = sister:
___|name: 'Ida'
___|age: 9
___|parents:
___|-__|name: 'Caris'
___|___|relation: 'xxx'
___|-__|name: 'Mike'
___|___|relation: 'xxx'

That would work for any tab width greater than 1, it's semantic, and it doesn't mix spaces and tabs. Unfortunately it won't look good if your tab is just 1 character wide (does anyone use those?), but it will still parse properly, since tab width doesn't change the actual file content. Not using tab stops is not an option here: using spaces to align the object keys would still abuse tab stops, create a real whitespace mess, and could cause parsing problems, since even GitHub's syntax coloring breaks on this example (___| is a tab, . is a space):

kids = sister:
___|name: 'Ida'
___|age: 9
___|parents:
___|___|-.name: 'Caris'
___|___|..relation: 'xxx'
___|___|-.name: 'Mike'
___|___|..relation:
___|___|.._|- test:
___|___|.._|.._|- 4

So I think using tab stops like in Horstmann style is the way to go. And of course it won't affect you if you use spaces for indentation anyway.
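To make the tab-stop argument concrete, here is a minimal JavaScript sketch. `expandTabs` and `keyColumn` are hypothetical helpers written for this illustration (they are not part of CoffeeScript or of any YAML library), and the sketch assumes tab stops fall on every multiple of the tab width.

```javascript
// Expand tab characters to spaces, assuming tab stops at every multiple of `width`.
// Hypothetical helper for illustration only.
function expandTabs(line, width = 4) {
  let out = '';
  for (const ch of line) {
    if (ch === '\t') {
      // Advance to the next tab stop; this is always at least one column.
      out += ' '.repeat(width - (out.length % width));
    } else {
      out += ch;
    }
  }
  return out;
}

// Column at which `key` starts once the tabs are expanded.
const keyColumn = (line, key, width) => expandTabs(line, width).indexOf(key);

// "<tab><tab>-<tab>name" versus "<tab><tab><tab>relation", as in the example above.
const itemLine = "\t\t-\tname: 'Caris'";
const nextLine = "\t\t\trelation: 'xxx'";

for (const width of [1, 2, 4, 8]) {
  console.log(width, keyColumn(itemLine, 'name', width), keyColumn(nextLine, 'relation', width));
}
```

Under these assumptions the two keys land in the same column for widths 2, 4 and 8, and only drift apart for a 1-character tab, which matches the caveat above.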

Bonus: we could also allow these, which are not allowed in YAML:

kids =
  - name: 'Ida', age: 9
  - name: 'Sai', age: 15
  - name: 'Yami', age: 23

YAML requires either using curly brackets for those objects, or placing every key in a separate line.

lunakurame avatar Mar 02 '18 15:03 lunakurame

More keystrokes. YAML (9 x -) vs CS (6 = 3 x [ + 3 x ]) in given example.

But YAML also doesn't have useless lines which contain only the ] character, so the whole code uses less vertical space. And yeah, indentation matters in YAML, but it allows inconsistent indentation too, as long as every - of an array is aligned:

objs:
  hobbies:
    most:
      - 'tech'
      - 'sport'
      - 'eco'
    medium:
           - 'animals'
           - 'music'
           - 'cooking'
    lowest:
        - 'science'
        -       'math'
        -    'reading'
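The alignment rule described above can be sketched as a small check in JavaScript. `dashesAligned` is a hypothetical helper, and it simplifies YAML's actual rule: it only compares the column of the leading - on each entry line and ignores nesting.

```javascript
// Return true when every line that starts a sequence entry has its `-` in the same column.
// Hypothetical helper; a simplification of YAML's indentation rules, not a parser.
function dashesAligned(lines) {
  const columns = lines
    .filter((line) => /^\s*-(\s|$)/.test(line)) // keep only "- item" lines
    .map((line) => line.indexOf('-'));
  return columns.every((col) => col === columns[0]);
}

// Entries whose dashes line up, even though the values do not.
const lowest = [
  "        - 'science'",
  "        -       'math'",
  "        -    'reading'",
];

// The earlier counter-example: the second dash is one column further right.
const misaligned = [
  "    - name: 'Caris'",
  "      relation: 'xxx'",
  "     - name:  'Mike'",
];

console.log(dashesAligned(lowest));
console.log(dashesAligned(misaligned));
```

The first list passes and the second does not, which is the distinction between "inconsistent but accepted" and "bad indentation of a sequence entry" made in the two comments.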

lunakurame avatar Mar 02 '18 15:03 lunakurame

I'm not familiar with the history, but I presumed there is probably a reason behind this.

A Long Time Ago :tm: , any amount of dedent counted as a dedent. See #3305 for an actual explanation.

vendethiel avatar Mar 02 '18 15:03 vendethiel

indentation matters in YAML, but it allows inconsistent indentation too

I stand corrected.

zdenko avatar Mar 02 '18 15:03 zdenko

Wow, what an active thread. Can we maybe narrow the focus a bit? The desire for YAML or YAML-like syntax, per the OP, is to have some more natural way to express arrays of objects. If YAML syntax specifically won’t work, for the various reasons discussed above, is there another alternative that’s better than current syntax?

GeoffreyBooth avatar Mar 02 '18 17:03 GeoffreyBooth

What's the thought on taking advantage of the semicolon bug I mentioned earlier?

aminland avatar Mar 02 '18 20:03 aminland

@aminland I find the semicolon behavior nonobvious / hard to read. It seems pretty sketchy to rely on that behavior, and a severe overloading of what semicolon should mean (concatenating lines, or doing one thing then the other).

But I like the original idea of this issue, which is to allow specifying arrays via aligned -s as an alternative to wrapping in brackets, to more cleanly handle objects nested within lists.

Here's a real-life example from some Meteor code of mine, where I'm constructing a MongoDB query:

query = (username) ->
  $or: [
    published: $ne: false
    deleted: $ne: true
    private: $ne: true
  ,
    "authors.#{escape username}": $exists: true
  ,
    title: ///@#{username}///
  ,
    body: ///@#{username}///
  ]

I would much prefer to read/write this code:

query = (username) ->
  $or:
    - published: $ne: false
      deleted: $ne: true
      private: $ne: true
    - "authors.#{escape username}": $exists: true
    - title: ///@#{username}///
    - body: ///@#{username}///

edemaine avatar Mar 02 '18 22:03 edemaine

I agree it's an ugly character, but it does make sense semantically. If I'm in the context of any block, a semicolon signifies the end of each item in said block.

I only really suggested it since it's already a bug and fixing said bug would likely break some people's code...

aminland avatar Mar 07 '18 22:03 aminland

@aminland Do you have something against the YAML hyphen notation? I'm pretty sure the example above would be pretty unreadable with your proposed semicolon hack.

edemaine avatar Mar 07 '18 22:03 edemaine

Just that it would be very difficult to get right within coffeescript.

e.g. the following is currently valid syntax and means something completely different:

a = 
  - b
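In JavaScript terms, the two readings of that snippet look roughly like this. This is only an illustration: the first reading assumes the current compiler treats the leading - as unary minus on a continuation line, and the second is what a YAML-like list syntax would presumably produce.

```javascript
const b = 5;

// Reading 1: `-` is unary minus, so the assignment is a negation.
const asNegation = -b;

// Reading 2 (hypothetical): `-` introduces a list item, so the assignment is a one-element array.
const asList = [b];

console.log(asNegation, asList);
```

Any hyphen-based list syntax would need a rule for choosing between these two results without breaking existing code.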

aminland avatar Mar 08 '18 00:03 aminland

While I agree that expressing arrays of objects can be kind of clunky, I'm not sure it's enough of a pain to warrant adding to CoffeeScript.

One solution I've used to make configuration objects a little more readable is just to use a dict and strip off the values.

So this:

kids = sister:
  name   : 'Ida'
  age    : 9
  parents: [
    {
      name    : 'Caris'
      relation: 'Mother'
    }
    {
      name    : 'Mike'
      relation: 'Step-Father'
    }
    {
      name    : 'Jane'
      relation: 'Step-Mother'
    }
    {
      name    : 'Tom'
      relation: 'Father'
    }
  ]

becomes:

kids = sister:
  name   : 'Ida'
  age    : 9
  parents: _.values
    mom1:
      name    : 'Caris'
      relation: 'Mother'
    dad1:
      name    : 'Mike'
      relation: 'Step-Father'
    mom2:
      name    : 'Jane'
      relation: 'Step-Mother'
    dad2:
      name    : 'Tom'
      relation: 'Father'

(The key names don't matter.)

wesvetter avatar Mar 19 '18 20:03 wesvetter

That _.values is an interesting idea, but it has some flaws:

  • if you add multiple elements with the same key, only the last one will be in the final array
  • doesn't work with CSON since it's not pure data
  • takes the same vertical space as this, which doesn't have the rest of those problems and has less clutter (no keys):
   kids = sister:
     name   : 'Ida'
     age    : 9
     parents: [
         name    : 'Caris'
         relation: 'Mother'
       ,
         name    : 'Mike'
         relation: 'Step-Father'
       ,
         name    : 'Jane'
         relation: 'Step-Mother'
       ,
         name    : 'Tom'
         relation: 'Father'
     ]
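The first flaw can be seen in plain JavaScript, using `Object.values` as a stand-in for `_.values`. The key names below are made up for the illustration.

```javascript
// Object.values stands in for _.values here; key names are hypothetical.
const parents = Object.values({
  mom: { name: 'Caris', relation: 'Mother' },
  dad: { name: 'Mike', relation: 'Step-Father' },
  mom: { name: 'Jane', relation: 'Step-Mother' }, // same key as the first entry
});

console.log(parents.length);
console.log(parents.map((p) => p.name));
```

Only two entries survive: the second `mom` overwrites the first, so 'Caris' silently disappears from the array.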
    

lunakurame avatar Apr 03 '18 10:04 lunakurame

I thought of this issue when I started reading https://m.signalvnoise.com/on-writing-software-well-aee3780767a6.

If anyone can link to some production code which demonstrates the problem, that will move the discussion forward dramatically.

Meanwhile, here's my argument against this feature: if I wanted to embed constant structures such as the examples above in my code, here's what I think I'd do:

{load: y} = (require './my-yaml').loadSync

query = (username) -> y """
  $or:
    - published: $ne: false
      deleted: $ne: true
      private: $ne: true
    - "authors.#{escape username}": $exists: true
    - title: ///@#{username}///
    - body: ///@#{username}///
"""

rdeforest avatar Apr 04 '18 19:04 rdeforest

https://github.com/edemaine/coauthor/blob/master/lib/messages.coffee is production code with the example above (and lots of others). It's not terrible as is, but would be nicer with YAML support. But I don't know how to solve @aminland's point about ambiguity with unary minus...

@rdeforest I don't think your code properly processes the /// as CoffeeScript regexes... needs some more #{...} escapes.

edemaine avatar Apr 04 '18 19:04 edemaine

I meant for the regexps to be processed by "./my-yaml". The input to y() would be something like

  $or:
    - published: $ne: false
      deleted: $ne: true
      private: $ne: true
    - "authors.rdeforest": $exists: true
    - title: ///@rdeforest///
    - body: ///@rdeforest///

But I'm also the kind of weirdo who writes this kind of thing:

publishedPublic   = ->
  published: $ne: false
  deleted:   $ne: true
  private:   $ne: true

writtenBy         = (name) -> "authors.#{escape name}": $exists: true

fieldContainsUser = (field) -> (s) -> [field]: ///@#{s}///
titleContainsUser = fieldContainsUser 'title'
bodyContainsUser  = fieldContainsUser 'body'

query = (username) ->
  $or: [ publishedPublic
         writtenBy
         titleContainsUser
         bodyContainsUser
       ] .map (predicate) -> predicate username

rdeforest avatar Apr 04 '18 19:04 rdeforest

What still sets CoffeeScript apart from modern ES2018 JavaScript is its readability. Since YAML has become so popular, I think the specified syntax would be greatly appreciated by newcomers. There have been 12 months of silence in this thread; any news?


As a workaround, I am currently piping all my coffee files through a custom converter before passing them to the actual CS compiler (and the syntax highlighting plugin, respectively), so I can use YAML syntax in CS. Not thoroughly tested, but it does the job in my codebase:

https://github.com/phil294/MEVN-base/blob/059e21d58db49e0244361e5ab667f3138584b05a/web/build/custom-loaders/coffee-loader.coffee#L20

Which will transform something like

x:
	-	a: 1
	-	b: 2
	-	c:
			-	'one'
			-	'two'

into

x: [
	a: 1
,
	b: 2
,
	c: [
		'one'
	,
		'two'
	]
]

Or, if you prefer, as a non-readable Javascript script:

  • tab indentation:
let theCode = '...';
let match; // declared explicitly so the loop also runs in strict mode
while (match = theCode.match(/([\w\W]*^(\t+)\w+):\n((?:\2\t-\t.+\n(?:\2\t\t\t.+\n)*(?:\2\t\t.+\n(?:\2\t\t\t.+\n)*)*)+)([\w\W]*)/m)) {
  const [_, before, indent, arraybody, after] = match;
  const arraybody_transformed = arraybody.replace(/^\t/gm, '').replace(/^(\t*)-/, '$1').replace(/^(\t*)-/gm, '$1,\n$1');
  theCode = `${before}: [\n${arraybody_transformed}${indent}]\n${after}`;
}
  • space indentation (configurable amount):
const indentSize = 4;
let theCode = '...';
let match; // declared explicitly so the loop also runs in strict mode
while (match = theCode.match(RegExp(`([\\w\\W]*^( +)\\w+):\\n((?:\\2 {${indentSize}}- {${indentSize - 1}}.+\\n(?:\\2 {${indentSize * 3}}.+\\n)*(?:\\2 {${indentSize * 2}}.+\\n(?:\\2 {${indentSize * 3}}.+\\n)*)*)+)([\\w\\W]*)`, "m"))) {
  const [_, before, indent, arraybody, after] = match;
  const arraybody_transformed = arraybody.replace(RegExp(`^ {${indentSize}}`, "gm"), '').replace(RegExp(`^( *)- {${indentSize - 1}}`), `$1${' '.repeat(indentSize)}`).replace(RegExp(`^( *)- {${indentSize - 1}}`, "gm"), `$1,\n$1${' '.repeat(indentSize)}`);
  theCode = `${before}: [\n${arraybody_transformed}${indent}]\n${after}`;
}

Edit: there is a newer version which also works without the x: part in the example. If someone needs this in the JS form like above, please leave a note.

A proper fork of the CS lexer would be better, but that would have taken me about 500 times longer.

So maybe this helps out anyone similarly impatient.

Needless to say, I would love to see the proposed syntax implemented.

phil294 avatar Mar 16 '19 11:03 phil294

Until this feature becomes a part of the language, it is possible to get something very similar through yaml-to-js.macro

lorefnon avatar Apr 29 '19 19:04 lorefnon

Another interesting alternative is the approach in livescript.

Implicit lists created with an indented block. They need at least two items for it to work.

When implicitly listing, you can use an asterisk * to disambiguate implicit structures such as implicit objects and implicit lists. The asterisk does not denote an item of the list, but merely sets aside an implicit structure so that it is not muddled with the other ones being listed.

fcostarodrigo avatar Aug 02 '19 08:08 fcostarodrigo

The livescript approach was not used on purpose.

vendethiel avatar Aug 02 '19 11:08 vendethiel

About the problem:

a = 
  - b

This wouldn't be a problem if we used * instead of -. There is no unary * operator.

I read the previous issues and I think the only problem raised about creating arrays with * is that it is too specific: it only helps with arrays of objects. https://github.com/jashkenas/coffeescript/issues/645#issuecomment-620902

But I still think this would be a good addition to the language.

Examples:

matrix =
  * [1, 2, 3]
  * [4, 5, 6]
  * [7, 8, 9]

users =
  * name: 'John'
    age: 18
  * name: 'Mary'
    age: 21

fcostarodrigo avatar Jun 09 '20 11:06 fcostarodrigo

Strongly in favor of introducing either YAML-style or asterisk syntax. For two reasons:

  1. It's entirely consistent with the spirit of the language, which otherwise takes full advantage of whitespace to eliminate syntactical debris that otherwise adds nothing semantically.

  2. Perhaps more important, we have a pretty compelling scenario where this adds value. Namely, function composition.

One of the nice things about doing composition in CoffeeScript is that you can write things like this:

compose [
  foo
  bar
  baz
]

We do a lot of this. And it turns out to be quite nice, most of the time. But it also turns out that sometimes it makes sense to have nested composition. Now, of course, we could use variadics:

compose foo,
  bar
  baz

but there are good reasons to avoid variadics, especially in functional programming, what with all the currying and passing around lists of functions. But if we use arrays, of course, we need to worry about closing brackets and our code at times becomes quite Lisp-like, with three or four closing brackets at the end.

As a result, we're spending increasing amounts of time just dealing with bracket matching. Which again, feels very much against the whole spirit of the language.

CoffeeScript shines in the context of functional programming, except for the brackets. I think that's a neat role for CoffeeScript to play in the JavaScript ecosystem and I'd like to encourage it. Of course, we may be the only people doing this. But our hope is that more people will move to this style, once they see how compelling it can be. But it's not quite as compelling right now because of those darned brackets.
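For readers who have not seen this style, here is a minimal JavaScript sketch of array-based composition. The `compose` below is a hypothetical stand-in (the thread does not show the actual helper), and it assumes right-to-left application over unary functions.

```javascript
// Minimal right-to-left compose over an array of unary functions.
// Hypothetical stand-in; the real helper and its direction are not shown in the thread.
const compose = (fns) => (x) => fns.reduceRight((acc, fn) => fn(acc), x);

const inc = (n) => n + 1;
const double = (n) => n * 2;
const square = (n) => n * n;

// Nested composition: every level adds another closing bracket at the end.
const pipeline = compose([
  square,
  compose([
    double,
    inc,
  ]),
]);

console.log(pipeline(3)); // square(double(inc(3)))
```

With two levels the tail is already `]),` followed by `]);`. Deeper nesting piles up more of these, which is the bracket-matching cost described above.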

dyoder avatar Aug 03 '20 23:08 dyoder

I'd really like this feature. JSON has always looked rather ugly to me and is pretty easy to break, whereas YAML isn't as smooth and versatile as the proposed syntax. I'd also like to be able to use the new array syntax in CSON files.

ghost avatar Jul 28 '21 00:07 ghost