less.js icon indicating copy to clipboard operation
less.js copied to clipboard

Plan for comment nodes

Open lukeapage opened this issue 9 years ago • 8 comments

short term

filter nodes passed to functions

long term

  1. move all constructor args to be {}

a) do not use the constructor.. use Element.create(args) and this.clone(newArgs)

where the implementations might look like

create: function(args) {
    foreach(var arg in args) {
        this[arg] = args[arg];
    }
    this.setup();
}
clone: function(args) {
    foreach(var arg in this) {
        this[arg] = args[arg];
    }
    this.create(args);
}

move code from the constructor that sets default properties e.g. isn't just this.a = a; into setup.

E.g.

function constructor() {
}
constructor.prototype.setup = function() {
    this.a = this.a === undefined ? true : this.a;
}

main point and benefit is that we can add properties to a node without breaking things and we can start to push location and whitespace information in at the base class level.

Alternative would be to make all nodes non prototypal objects - actually by moving functions to be static on the type I think we could move to this as a stage after this if we wanted..

b) This allows us to implement in parser a function to create nodes.

createNode: function(type, args) {
    return type.create(args);
}

c)

This allows us to attach whitespace/comment/location information for each node

d)

We need a way for our Node base class to output comments automatically ? So one way would be rename genCSS to genCSSX and have genCSS implemented in Node, calling genCSSX after having output all pre-comments and post-comments

Alternatives

  • add preComments / postComments to every constructor that needs it - problem is node constructors are unwieldy
  • keep comments as nodes and keep solution to filter for functions. - problem is that visitors have to work-around having comments anywhere, is a bit hacky
  • absorb comments and do not output them - how many css hacks will this effect?

Any more ideas welcome. Also need to decide if I/we do this if it constitutes a breaking change such that we need less v3

lukeapage avatar Feb 25 '15 19:02 lukeapage

For what it's worth, I'd go for proposed solution:

absorb comments and do not output them - how many css hacks will this effect?

Keeps the compiler a heck of a lot simpler if you can just outright eliminate comment tokens during parsing and never have them as part of the AST to begin with.

If you really, really need CSS hacks based on comments to work, then you can still use escaped literals like ~"/**/" with @{} token interpolation to weave them into selectors, property names, etc.

It's not like there are really that many uses for hacks based on comments left outside the realm of supporting dead or nearly dead browsers.

rjgotten avatar Mar 05 '15 22:03 rjgotten

Thanks for raising the issue @lukeapage and opening the discussion. Let me start with this:

keep comments as nodes and keep solution to filter for functions. - problem is that visitors have to work-around having comments anywhere, is a bit hacky

This may not be true. I think we can go halfway between this and NO comment support. That is: keep them as nodes but do not allow comments absolutely anywhere. A detected comment in a disallowed location would throw a parsing error. I think this is reasonable since we're processing one language into another. Allowing comments does not necessarily translate to allowing comments as permissively as CSS. We already have much stricter language rules than CSS in other places.

Many use comments to document the resulting CSS, or to prepend comment blocks that are specially interpreted in certain environments, so I think a full removal would be a huge breaking change. Especially since Less authors have been trained from day one to use /**/ for commenting CSS and // for commenting the Less source.

That said, introducing breaking changes that put stricter rules on comments, as I said, is reasonable. Some examples might be:

  • Comments cannot exist between a property / value declaration.
  • Comments cannot exist mid-selector
  • Comments cannot exist within a function declaration

In short: I think basically the rules should be comments can be either end of line, or must be on their own line (or whatever you think is appropriate and makes regex's and node generation simple). Anything else is unexpected input. As @rjgotten suggested, an author who REALLY wants them elsewhere could used an escaped literal, as long as an escaped literal is allowed in that location.

So that would be my suggestion. Restrict comments to make the core library more maintainable, but just permissive enough to still make comments quite useful.

As to this:

Also need to decide if I/we do this if it constitutes a breaking change such that we need less v3

I think a major version increment is reasonable and expected by Semver rules. Restricting comment usage in any form would mean a lot of existing Less stylesheets would likely not compile, so yes, v3.

matthew-dean avatar Mar 05 '15 23:03 matthew-dean

I know this is a very old thread, but what I'm angling for right now in Less 4.x is to hugely simplify comments by absorbing them as part of white-space where white-space is allowed (and normalizing that white-space for value/expression comparisons), and discarding comments that are between nodes without whitespace, in order to:

  1. make AST creation very simple (doesn't need to manage comments)
  2. make tree traversal easy and predictable (comments don't need to be filtered)

As a simple example:

/**
 * This is preserved as part of a rule's "pre" node property
 */
.rule {
  prop/* this will be discarded*/: value;
}

The reason why the second one is discarded is to easily recognize whitespace in places where whitespace is meaningful, such as in selectors:

// the following adds a lot of additional logic if it is not discarded during parsing,
// because '#selector' and '.rule' don't really have a node separating them
#selector/* comment */.rule { }

vs.

// In the AST, '#selector' and '.rule' are separated by a single WS (whitespace)
// node, making selector matching logic quite easy, as the "value of" (primitive value)
// of WS will always be a single ' ', even if it's full text value (.toString() value)
// is ' /* comment */ '
#selector /* comment */ .rule { }

matthew-dean avatar Sep 30 '19 21:09 matthew-dean

prop/* this will be discarded*/: value;

The only concern is that the users will complain :) (Just like now they complain not even just for lack or presence of comments but for their exact location). Aside of that I see no problems. (Though is that no-white-space-comments are that different from a comment with white-space before or after? Can't a no-white-space-comment just inject a dummy white-space automatically? (since the comment works as a delimiter anyway, would not it be semantically equivalent?))

seven-phases-max avatar Sep 30 '19 22:09 seven-phases-max

@seven-phases-max

(Though is that no-white-space-comments are that different from a comment with white-space before or after? Can't a no-white-space-comment just inject a dummy white-space automatically? (since the comment works as a delimiter anyway, would not it be semantically equivalent?))

The problem is that if you make non-whitespace-surrounding comments inserted as a Node vs whitespace that includes a comment, you have to make special allowances everywhere during parsing that a comment rule is allowed along with X rule, so you increase the logic of rules (... which may not be a huge amount.... Hmm... I'll look at how much of an issue it is.) Alternatively, if whitespace + comment is tokenized as whitespace, then anywhere in parsing where whitespace is allowed doesn't change the lexing stage.

Mostly, what I'm trying to do right now is not only simplify parsing but simplify the structure so that AST evaluation is also orders of magnitude simpler.

I think you're right that maybe specifically grabbing comments would not be a huge burden. TBH, I'm playing with a few different ideas while refactoring, and I haven't quite gotten to comments.

To this specifically:

(Just like now they complain not even just for lack or presence of comments but for their exact location).

Ironically, I plan on making comment locations more accurate by preserving more of the original content / structure of the stylesheet. Right now, Less essentially tosses most of white-space / comment locations, and then attempts to rebuild them. I was looking at how TypeScript approaches this, and I'm doing a similar approach. Basically, comments are grabbed as kind of a "header" (or "footer" of a node, in some situations), and if your node isn't output, the comments associated with them go away.

Another way to say this: my current goal is that if you were to take a valid CSS file and run it through Less, then the output would be 99.9% (if not 100%) identical to the source, if there were no Less structures / operations within it. Less sooooorta aims in that direction, but it's not exactly on par, and there are places where Less does some basic "reformatting" and as a result, changes the meaning. Those are edge cases, to be sure, but there's also these legacy cases with specific handling for comments, etc, and as you say, there's been logic over the years to properly place or re-output (or filter) comments, or try to normalize values for matching, etc. I think most of that can be avoided by capturing almost all source and then, if Less doesn't use it (and it's a normal CSS value), just put it back the way it came in. As I said, Less mostly does this, like with preserving color formats (if they aren't part of an operation), but not necessarily in a consistent way, and historically has had bits of logic added to different AST nodes for specific issues.

Anyway, I'll rethink discarding certain comments; it may be easier than I thought.

Here's an example though of a comment that for sure would / should be discarded:

prop: 1 /* comment */ + 1;

IMO output SHOULD be:

prop: 2;

Output is currently:

prop: 2 /* comment */;

... which I think is silly. I wrote that comment within a Less operation, and that operation has been collapsed to a single value. The operation and its associated nodes (such as the two original integers) no longer exist. TypeScript takes a more sane approach: if your comment is a prelude to something that doesn't output (or within something that collapses), like a type, then it disappears. There's no sane reason why, if someone wants the comment to stay with the computed value, they couldn't place it before the operation. I don't think the Less compiler needs to do back-flips managing odd comment placement, which leads to a lot of boilerplate overhead to manage said comments. It's okay to have a few rules around comments.

matthew-dean avatar Sep 30 '19 23:09 matthew-dean

prop: 1 /* comment */ + 1;

I see now. I just thought you want to discard "no-white-space-comments" only, but keep comments like above. For the rest, yes, sure I'm understand how awful the "comment as node as within a statement" problem is.

seven-phases-max avatar Oct 01 '19 03:10 seven-phases-max

@matthew-dean Since my original comment to absorb comments in the lexer/parser, I've realized that there actually is a very valid use-case for comments in Less to remain in the AST: API documentation generation.

If comments are in the AST, then a visitor plugin can be written to extract them with reference to their surrounding context. E.g. it can be coded to be aware that a documentation comment is placed before a mixin declaration and can create an implicit association with it.

Many documentation tools work that way. JSDoc; SassDoc; Whatever Microsoft calls the XML-based system that C# uses; etc.

If you remove comments from the AST, such tools will have to rely on their own hacky half-compliant parsers - which is a right pain.

rjgotten avatar Oct 02 '19 10:10 rjgotten

@rjgotten - Yeah, I'm glad I put this out there, because I wasn't sure where / how to keep comments, but already from @seven-phases-max's feedback, I was shifting gears a bit to keep them in the AST until visitors run (and remove line comments).

I plan at some point to put a whole Less 4.x proposal (probably in less-meta), but I've been sort of testing out different ideas and writing AST tests (which Less hasn't had up until now, at least directly), to validate and iterate on those ideas. So I've tried a few different ideas around storing / not storing white-space, storing / not storing comments, to figure out:

  1. What's the best / easiest to parse, not just in performance but maintenance of the parser.
  2. What's best / easiest to serialize from the AST, and makes the AST easier to reason about and maintain.

In general, my expectation / goal (and already where I'm at) is that the Less codebase will be much easier to manage, maintain, and understand, but there will be some necessary trade-offs as a result, since there are features that were...I don't wanna say hacks but essentially by avoiding changing any of the AST structure, had to construct logic puzzles like the code @lukeapage mentioned. So I'm refactoring / rewriting a lot of that. Yet, at the same time, I'm trying to keep the necessary or logical changes to a minimum.

But anyway, to your point @rjgotten, I won't remove comments from the AST.

matthew-dean avatar Oct 02 '19 18:10 matthew-dean