
Abstract Syntax Forestry

An EmberConf 2020 workshop from Mainmatter to teach you the basics of Abstract Syntax Trees.

Before we begin, there are three things you'll need to have installed on your computer to participate in this workshop:

a browser
Node.js 10 or above
yarn

Hello everyone, and welcome to this workshop, which I have titled "Abstract Syntax Forestry".

Since not everyone doing this workshop is likely to be a native English speaker who immediately gets the wordplay, let's start with a quick intro to what the title means.

First of all, what does "Forestry" mean?

Any idea?

According to Wikipedia:

Forestry is the science and craft of creating, managing, using, conserving, and repairing forests, woodlands, and associated resources for human and environmental benefits.

or in other words:

Forestry means: working with trees!

Now... what is a tree?

This is a tree!

But it's not the kind of tree we want to talk about in this workshop...

There is a certain type of data structure that is also called a tree because, depending on how you look at it, it roughly resembles a real tree. Some other terminology is borrowed from this metaphor as well: the tree data structure has leaf elements, branches, and a root element.

About that... what defines a tree data structure?

First of all, a tree is a graph data structure, but a specific kind of graph. Its defining features are that a tree has exactly one root element (at the top here), and that element can have 0, 1, 2, 1000, or even more child elements. Each of those child elements can in turn have an arbitrary number of child elements.

But a child element can only ever have one parent element!

If you need something else to compare against: it is very roughly like a recursive belongsTo relationship in Ember Data or any other relational database/ORM kind of system.
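To make the "exactly one parent" rule concrete, here is a minimal sketch of a tree node in JavaScript. The TreeNode class and its appendChild() method are made up for this illustration and are not part of any library used in this workshop. The sketch only guards against a second parent; it does not check for cycles.

```javascript
// A minimal tree node: any number of children, but at most one parent.
// (Hypothetical class for illustration; it does not detect cycles.)
class TreeNode {
  constructor(name) {
    this.name = name;
    this.parent = null; // only the root keeps `null` here
    this.children = [];
  }

  appendChild(child) {
    // enforce the defining rule of a tree: one parent per child
    if (child.parent !== null) {
      throw new Error(`"${child.name}" already has a parent`);
    }
    child.parent = this;
    this.children.push(child);
    return child;
  }
}

let root = new TreeNode('root');
let branch = root.appendChild(new TreeNode('branch'));
let leaf = branch.appendChild(new TreeNode('leaf'));

// attaching the same leaf to a second parent is rejected
try {
  root.appendChild(leaf);
} catch (error) {
  console.log(error.message); // prints: "leaf" already has a parent
}
```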

Now that we have a rough understanding of what a tree is in theory, let's get into a few real examples of where the tree data structure is being used.

Any ideas?

(yes, I realize that in this format you've likely already had a glimpse of the next slide 😅)

One example of a tree structure that we use all the time is: JSON.

JSON has a very limited number of data types, and arrays and objects are the only ones that can have children. An array can contain 0, 1, 1000, or more elements, and an object can contain 0, 1, 1000, or more keys with their associated values.

And then there is the constraint of having a single root element. If you look at a JSON file, you can see that it always has exactly one top-level value, usually an object or an array, and if you try to put a second one in the file, it will result in a syntax error.

That means, similar to how trees are a specific kind of graph data structure, we can consider JSON to be a specific kind of tree data structure.
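You can check both properties with JavaScript's built-in JSON.parse(): a document with one root value parses fine, a document with two top-level values is rejected, and because arrays and objects are the only containers, a short recursive function can walk the whole tree. The countNodes() helper below is made up for this illustration.

```javascript
// Count every value in a parsed JSON tree. Only arrays and objects
// can have children; every other value is a leaf.
function countNodes(value) {
  let count = 1; // the value itself
  if (Array.isArray(value)) {
    for (let item of value) count += countNodes(item);
  } else if (value !== null && typeof value === 'object') {
    for (let key of Object.keys(value)) count += countNodes(value[key]);
  }
  return count;
}

// one root element: parses fine
let tree = JSON.parse('{"title": "Hello", "tags": ["ast", "ember"]}');
console.log(countNodes(tree)); // prints 5 (object, "Hello", array, "ast", "ember")

// two root elements: rejected with a SyntaxError
let secondRootFailed = false;
try {
  JSON.parse('{"a": 1} {"b": 2}');
} catch (error) {
  secondRootFailed = error instanceof SyntaxError;
}
console.log(secondRootFailed); // prints true
```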

Another real-world example is HTML. In HTML you have the root <html> element, and that can have multiple children, though in reality it only supports <head> and <body>. But those then can also have an arbitrary number of child elements.

You can see on the right side of the above slide how the tree structure of the HTML file on the left side could be represented.

If you feel reminded of the DOM (document object model) in the browser, that is no coincidence. The browser also parses the HTML file into such a tree structure and then exposes it to JavaScript as the DOM.

So far we've only been talking about trees. But what about the other two words in the workshop title? What does "abstract syntax" mean?

In a very basic way, an "Abstract Syntax Tree" is a JSON object that represents the code structure of a file.
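To give you a feeling for what that looks like, here is a heavily simplified, hand-written approximation of an AST for the Handlebars template <h1>{{title}}</h1>. The node type names match the ones we'll meet later in this workshop, but real parser output contains more fields (for example location data), so treat this as an illustration rather than actual parser output.

```javascript
// Simplified approximation of an AST for the template `<h1>{{title}}</h1>`.
// Real parser output has more fields (e.g. `loc`); they are left out here.
let ast = {
  type: 'Template',
  body: [
    {
      type: 'ElementNode',
      tag: 'h1',
      attributes: [],
      modifiers: [],
      children: [
        {
          type: 'MustacheStatement',
          path: { type: 'PathExpression', original: 'title' },
          params: [],
          hash: { type: 'Hash', pairs: [] },
        },
      ],
    },
  ],
};

// because it is plain data, it survives a round trip through JSON
let copy = JSON.parse(JSON.stringify(ast));
console.log(copy.body[0].children[0].path.original); // prints: title
```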

As you can see in the slide, I've put asterisks (*) on the words "JSON" and "code", because...

While most parsers in the JavaScript ecosystem produce JSON-based ASTs, there are quite a few parsers in other ecosystems that don't use JSON. Since this workshop assumes that most of you are working primarily with JavaScript we will focus on JSON-based ASTs here for now.

Please note that the general concept will still be roughly the same, whether the parser produces JSON, or not.

The other asterisk was on "code". That's because, while ASTs are mostly used for code, there are also parsers for other kinds of files that produce ASTs.

Take our earlier example of HTML. HTML is closely related to XML, and XML is also used for regular data files. Many HTML parsers that produce ASTs can handle XML as well, so you can easily convert an XML data file into an AST.

But enough of all this theoretical talk, let's look at an example!

There is a great tool on the internet that I would like to introduce you to, and it's called AST Explorer.

You can find it at https://astexplorer.net.

If you've never opened it before it will start you on the default view, with an example JavaScript file on the left side, and the corresponding syntax tree on the right side.

You'll notice that when you click on something in the left panel, it automatically focuses the corresponding syntax tree element in the right panel, and similarly, if you hover over something in the right panel, it highlights the corresponding characters on the left. (If it does not do that, check whether the "Autofocus" checkbox on the right side of the screen is enabled.)

Where it gets interesting is when you move your cursor over the "JavaScript" button in the menu bar at the top. Here, you should see a dropdown menu of all the available languages that the AST Explorer understands. If you look closely, you can see that it also supports Handlebars, which is what we will be focusing on for this first bit of the workshop.

If you click on "Handlebars" you will see that the snippet in the left panel has changed to a small example template, and the right panel is now also showing something different: the Handlebars syntax tree.

In case you're wondering "where is the JSON that we talked about?", have a look at the "Tree | JSON" tab bar on the right side of the screen. If you click on "JSON" you can see the raw JSON data, while the "Tree" view shows a slightly more ergonomic view of the same data.

Let's start with the first exercise!

I invite you to click around a bit in the AST Explorer and when you feel comfortable try to answer the three questions above.

If you don't know what an element modifier is, I recommend first having a look at the Ember.js Guides, which explain what they are and what they can be used for.

To get started you can modify the example template on the left side of the screen to match the snippet below:

<div class="entry">
  <h1>{{title}}</h1>
  <div class="body">
    {{body}}
  </div>
  <button type="button" {{on "click" this.onNext}}>Next</button>
</div>

Solution 1

What is the node type of an element modifier in Handlebars templates?

To solve this first exercise, we need to click on the on in {{on "click" this.onNext}}, which will focus a PathExpression element in the AST. You can see that this PathExpression belongs to the path attribute of its parent element, and that parent element has the type ElementModifierStatement.

And that is already the answer to our first question, "What is the node type of an element modifier in Handlebars templates?":

The node type of an element modifier is ElementModifierStatement

Solution 2

What other attributes does a modifier have?

For this question we keep our focus on the ElementModifierStatement node in the AST. We've already seen that the element has a type (ElementModifierStatement) and a path (a PathExpression with on), but there are also some other attributes here:

params holds a list of all the positional parameters ("click" and this.onNext in this case).
hash holds information about named parameters. For example, the on modifier that we're using here supports a named parameter called passive, which can be used like this: {{on "click" this.onNext passive=true}}

Depending on the state of the "Hide location data" checkbox in the AST Explorer you can also see a loc attribute. This attribute tells us where in the template file this AST node starts and ends. This information is for example used by the AST Explorer to highlight the correct characters when we hover over the nodes in the panel on the right side.
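Put together, the ElementModifierStatement for {{on "click" this.onNext passive=true}} has roughly the shape below. This is a simplified, hand-written approximation (location data and a few other fields are left out), and describeModifier() and paramToString() are made-up helpers that show how positional and named parameters can be read from such a node.

```javascript
// Simplified approximation of the node for {{on "click" this.onNext passive=true}}
let modifier = {
  type: 'ElementModifierStatement',
  path: { type: 'PathExpression', original: 'on' },
  params: [
    { type: 'StringLiteral', value: 'click' },
    { type: 'PathExpression', original: 'this.onNext' },
  ],
  hash: {
    type: 'Hash',
    pairs: [
      { type: 'HashPair', key: 'passive', value: { type: 'BooleanLiteral', value: true } },
    ],
  },
};

// Hypothetical helper: turn a (simplified) parameter node into a readable string
function paramToString(param) {
  if (param.type === 'PathExpression') return param.original;
  if (param.type === 'StringLiteral') return JSON.stringify(param.value);
  return String(param.value); // e.g. BooleanLiteral or NumberLiteral
}

// Hypothetical helper: summarize the positional (`params`) and named (`hash`) parameters
function describeModifier(node) {
  return {
    name: node.path.original,
    positional: node.params.map(paramToString),
    named: node.hash.pairs.map((pair) => `${pair.key}=${paramToString(pair.value)}`),
  };
}

console.log(describeModifier(modifier));
```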

Solution 3

How are modifiers assigned to their parent node?

To answer this last question of the exercise we need to look at the parent node of our ElementModifierStatement. You can see that the ElementModifierStatement node is part of an array, which makes sense, because an element in Handlebars can have multiple modifiers at the same time.

That array of element modifiers is assigned to the modifiers key of the ElementNode parent element, and that is already the answer that we're looking for: modifiers.
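In code, that means you can get from an element to the names of all of its modifiers by mapping over its modifiers array. The element below is again a simplified, hand-written approximation, and autofocus is a hypothetical second modifier that is only there to show that the array can hold more than one entry.

```javascript
// Simplified approximation of the ElementNode for
// <button type="button" {{on "click" this.onNext}} {{autofocus}}>Next</button>
// (`autofocus` is a hypothetical modifier; params/hash are omitted for brevity)
let button = {
  type: 'ElementNode',
  tag: 'button',
  attributes: [
    { type: 'AttrNode', name: 'type', value: { type: 'TextNode', chars: 'button' } },
  ],
  modifiers: [
    { type: 'ElementModifierStatement', path: { type: 'PathExpression', original: 'on' } },
    { type: 'ElementModifierStatement', path: { type: 'PathExpression', original: 'autofocus' } },
  ],
  children: [{ type: 'TextNode', chars: 'Next' }],
};

// Modifiers live in their own array, separate from attributes and children
function modifierNames(element) {
  return element.modifiers.map((modifier) => modifier.path.original);
}

console.log(modifierNames(button)); // prints [ 'on', 'autofocus' ]
```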

Alright, after this first exercise let's see what all this is actually useful for.

Any ideas?

Asking this without you seeing the answer already on the next slide would obviously work much better, but let's pretend that someone in the room had raised their hand and answered one or more of these things... 😉

There are generally three categories of tools that we can build using ASTs.

The first category is code analysis tools: tools that parse code files into ASTs and then analyze the code via the AST. Some tools might analyze the code directly as characters, but it is usually waaay easier if you can work with the code in a more structured way.
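A small example of why the structured approach helps: counting <div> tags by searching the raw text also matches a <div> inside an HTML comment and the custom element <div-wrapper>, while counting ElementNodes whose tag is exactly div does not. The tree below is a simplified, hand-written approximation of what a parser might produce, and countTag() is a made-up helper.

```javascript
let template = '<div><!-- old <div> --><div-wrapper></div-wrapper></div>';

// Naive text search: counts every "<div" in the raw characters
let textCount = (template.match(/<div/g) || []).length;

// Simplified, hand-written approximation of the AST for the same template
let ast = {
  type: 'Template',
  body: [
    {
      type: 'ElementNode',
      tag: 'div',
      children: [
        { type: 'CommentStatement', value: ' old <div> ' },
        { type: 'ElementNode', tag: 'div-wrapper', children: [] },
      ],
    },
  ],
};

// Structured search: only element nodes whose tag is exactly the one we want
function countTag(node, tag) {
  let count = node.type === 'ElementNode' && node.tag === tag ? 1 : 0;
  for (let child of node.body || node.children || []) {
    count += countTag(child, tag);
  }
  return count;
}

console.log(textCount);            // prints 3
console.log(countTag(ast, 'div')); // prints 1
```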

The second category is compilers. These kinds of tools take your code, parse it, convert it into something different that a machine can execute, and then write it out again.

Don't worry if this sounds complicated or abstract, we'll get to some examples in just a few seconds... depending on how fast you can read... 😄

The third category is refactoring tools, sometimes also referred to as codemods. These tools read your code, adjust it in some way, and write it out again.

At this point you might be thinking: "wait, the second and third category sound almost the same... what's the difference?"

And you are absolutely correct that the lines are a little blurry here.

The main difference between the two categories is that compilation tools write output that is optimized for machines to read, often with a lot of optimizations that make the resulting code quite unreadable.

Refactoring tools, however, try to keep the output as close to the input as possible, so that developers can still read the code easily.

I promised you examples, so let's go through all three of these categories and find a few examples.

As you can see the slide above is blank again, which is usually the trigger for you to come up with a few answers yourself before we continue.

...

Alright, let's see what our example list has on it.

The most famous code analysis tool in the JavaScript ecosystem right now is ESLint.

Something similar also exists to analyze CSS and SASS files and it is called stylelint.

And then we also have a bunch of code analysis tools from the Ember.js ecosystem specifically, like ember-template-lint, the Ember Language Server, and ember-intl-analyzer.

The next category is "Compilation".

One famous compiler is gcc, which can be used to turn C and C++ code into machine code that can run on your computer.

But we want to focus on the JavaScript ecosystem here...

You probably know it under the name "transpiler" instead of "compiler", but Babel fits our definition quite well. It's mostly called a transpiler because it happens to compile the input language, JavaScript, to the same output language, JavaScript.

Another example that is again more CSS focused is SASS, which compiles SASS files to CSS files.

PostCSS goes in a similar direction, but here you could also call it a transpiler since it compiles CSS into CSS.

Another example is UglifyJS, which compiles JavaScript into minified JavaScript. The code after the compilation still does the same thing, but it's optimized for machines to execute, and no longer for humans to read or edit.

And looking more into the Ember.js ecosystem, we have the Glimmer template compiler, which turns our Handlebars templates into code that runs directly in the browser.

If we look a little closer here, we can see that a lot of these compilers are actually built out of a number of smaller plugins that can be enabled or disabled as we want. This is similar to how you can turn on and off certain rules in your ESLint or ember-template-lint config files.
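Conceptually, such a plugin-based compiler applies a list of small transforms to the same AST, one after the other, and a config decides which ones run. The sketch below is a toy version of that idea, not how Babel or the Glimmer template compiler are actually implemented; the node shapes, the two plugins and compile() are all made up for illustration, and the plugins only look at top-level nodes to keep it short.

```javascript
// Two hypothetical plugins, each modifying the (top-level) AST in place
const plugins = {
  'drop-comments': (ast) => {
    ast.body = ast.body.filter((node) => node.type !== 'CommentStatement');
  },
  'collapse-whitespace': (ast) => {
    for (let node of ast.body) {
      if (node.type === 'TextNode') node.chars = node.chars.replace(/[ \n\r\t]+/g, ' ');
    }
  },
};

// Run only the enabled plugins, in the order they appear in the config
// (an unknown plugin name would throw a TypeError here)
function compile(ast, config) {
  for (let [name, enabled] of Object.entries(config)) {
    if (enabled) plugins[name](ast);
  }
  return ast;
}

let ast = {
  type: 'Template',
  body: [
    { type: 'CommentStatement', value: ' TODO ' },
    { type: 'TextNode', chars: 'Hello,\n    world' },
  ],
};

// similar to an ESLint config: turn individual plugins on or off
compile(ast, { 'drop-comments': true, 'collapse-whitespace': false });
console.log(ast.body.length); // prints 1
```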

Finally, the last category: Refactoring

If you haven't built a codemod before, it's unlikely that you know these tools, so let's jump right to the list of examples.

If you want to build a codemod that modifies JavaScript files, then the best solution is currently jscodeshift, which is built on top of recast and adds a more jQuery-like API.

For Handlebars templates the awesome Robert Jackson has built something similar, which is called ember-template-recast. Something like jscodeshift does not yet exist for it, but I've seen some early prototypes and the future looks bright! 😊

And then the last tool on this list is something that we've already seen before: PostCSS

PostCSS is in some sense a special case because it largely preserves the input formatting by default, but is also regularly used as a transpiler. We won't focus on PostCSS in this workshop, but if you ever want or need to write a codemod that touches CSS, then I recommend having a look at PostCSS.

Alright, now that we have all of that out of the way, let's see how we can apply what we just learned. The rest of this workshop will mostly be examples and exercises. The examples we'll go through together, and the exercises you're supposed to solve by yourself, but if you run into any blocking issues, feel free to ask questions or have a look at the solution.

Also, now would be a good time to take a short coffee or toilet break, and after that we'll dive into the chapter "Code Analysis with @glimmer/syntax"!

(and if you haven't done it yet, this small break would also be a great time to run yarn install in the root folder of this repository 😉)

In this first example exercise the goal is to count how often each HTML tag (like <h1> or <div>) and component name (like <LinkTo>) is used in a Handlebars template.

To make things a little easier I've prepared a small Node.js script for us that reads all of the template files in an example app and makes it easier for us to analyze the templates.

You can find the example app in the 02-count-tags folder, and the script is at count-tags.js. If you run it now, you will see something like:

$ node count-tags.js
Map {}

This is unsurprising, because we haven't written any code yet and nothing is filling the tagCounter map that we're outputting at the end of the script.

Let's look at the solution together. I've included it as copy-pasteable code below, but I would recommend that you try to type it yourself so that the following exercises will be easier to follow.

Solution

const fs = require('fs');
const globby = require('globby');
const glimmer = require('@glimmer/syntax');

let tagCounter = new Map();

// find all template files in the `app/` folder
let templatePaths = globby.sync('app/**/*.hbs', {
  cwd: __dirname,
  absolute: true,
});

for (let templatePath of templatePaths) {
  // read the file content
  let template = fs.readFileSync(templatePath, 'utf8');

  // parse the file content into an AST
  let root = glimmer.preprocess(template);

  // use `traverse()` to "visit" all of the nodes in the AST
  glimmer.traverse(root, {
    ElementNode(node) {
      // read the tag name from the `ElementNode`
      let { tag } = node;

      // read the current count for that tag name
      let previousCount = tagCounter.get(tag) || 0;

      // increase the counter by 1
      tagCounter.set(tag, previousCount + 1);
    }
  });
}

// output the raw results
console.log(tagCounter);

There are a few things here to understand, so take your time before you continue, because everything we work on from now on builds on these basics.

First of all, you can see that we import a package called @glimmer/syntax. This package lives in the glimmer-vm project and can be used to parse Handlebars files into JSON-based syntax trees.

The next interesting line is where we call glimmer.preprocess(template). This parses the file and returns a JSON object, or it will throw an error if the template has a syntax error.

And please don't ask me why this function is called preprocess() instead of something more straightforward like parse(). I honestly don't know... 😅

Anyway, if, instead of continuing with the solution, you use console.log(root); now, you will see that it looks roughly similar to the AST we've seen before in the AST Explorer. But as you can also see, working with the raw JSON-based AST can be a bit hard because it is often quite large. I highly recommend keeping the AST Explorer open in a browser tab whenever you work on anything AST-related.

For this example we can use the default snippet in the AST Explorer for now, which we can restore by hovering over the "Snippet" menu item, and then selecting "New" from the menu. If you're not in "Handlebars" mode, make sure you select it from the languages menu before doing the above.

If you now click on one of the <div> elements or the <h1> you can see that all HTML tags are represented by ElementNodes. And all of these ElementNodes have an attribute tag, which is the tag name of the HTML element. The same is also true for component invocations that use Angle Bracket Syntax like <LinkTo @route="index">Home</LinkTo>.

If we want to count how often a tag is being used that means we need to look at all of the ElementNodes in the AST and count how often the same tag attribute appears.

But there is a problem... You can see that one of the <div> elements is nested in the other one. And the AST reflects that, by also having a nested structure. Now, we could write a bit of code that goes through every child element of the root node, then every child element of those, and so on, but, luckily, the Glimmer developers have already done that for us.

Let me introduce you to the traverse() function in the @glimmer/syntax package. The way this works is by giving two arguments to the function: first, the root element of the AST, and second, a JavaScript object. That JavaScript object can contain callback functions for any or all of the AST node types that we've encountered so far.

As a quick example, try the following code:

glimmer.traverse(root, {
  ElementNode(node) {
    console.log(node);
  },
  AttrNode(node) {
    console.log(node);
  },
});

You can see that only AST nodes with the types ElementNode and AttrNode are printed on the console.
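If you're curious how a visitor object like this can work under the hood, here is a much simplified toy version of the idea: walk the tree recursively and, for every node, call the visitor function whose key matches node.type, visiting parents before their children. This is not how @glimmer/syntax actually implements traverse() (the real one is more careful about which properties it follows); toyTraverse() and the hand-written AST are assumptions for illustration only.

```javascript
// Toy visitor-based traversal: call visitor[node.type] if it exists,
// then recurse into every property that holds AST nodes.
function toyTraverse(node, visitor) {
  if (typeof visitor[node.type] === 'function') {
    visitor[node.type](node);
  }
  for (let value of Object.values(node)) {
    let candidates = Array.isArray(value) ? value : [value];
    for (let child of candidates) {
      // anything with a string `type` property is treated as an AST node
      if (child && typeof child === 'object' && typeof child.type === 'string') {
        toyTraverse(child, visitor);
      }
    }
  }
}

// Simplified approximation of the AST for `<div class="entry"><h1>{{title}}</h1></div>`
let root = {
  type: 'Template',
  body: [
    {
      type: 'ElementNode',
      tag: 'div',
      attributes: [
        { type: 'AttrNode', name: 'class', value: { type: 'TextNode', chars: 'entry' } },
      ],
      modifiers: [],
      children: [
        {
          type: 'ElementNode',
          tag: 'h1',
          attributes: [],
          modifiers: [],
          children: [
            { type: 'MustacheStatement', path: { type: 'PathExpression', original: 'title' }, params: [] },
          ],
        },
      ],
    },
  ],
};

let visited = [];
toyTraverse(root, {
  ElementNode(node) { visited.push(`ElementNode <${node.tag}>`); },
  AttrNode(node) { visited.push(`AttrNode ${node.name}`); },
});
console.log(visited);
```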

For our specific example we only care about ElementNodes though. Inside of the callback function we extract the tag attribute from the node, look at the tagCounter map to figure out the previous count for this tag name and then increase it by one. Finally, we print the content of tagCounter to the console.

If everything worked you should see that there are currently 240 <div> tags in this app, and e.g. <LinkTo> is used 95 times. 🎉

This concludes our first example exercise. You should try the next one on your own first, and only once you've solved it, or are really stuck, click on "Solution".

The goal of this exercise is to use console.log() to warn about the confusing pattern of using an unless condition with an else block.

As usual I have prepared a small script for us that can be used as a starting point. This time the code is in the 03-no-unless-else folder in the find-unless-else.js file.

If you need an initial hint: type the code snippet from the slide into the AST Explorer first, and play around with it a little to figure out how we can determine whether the unless condition has an else block or not. Once you have an idea, try to use what we learned in the last example to finish the script file.

At the end, the output should look something like this:

$ node find-unless-else.js 
Found unless/else in app/components/crate-toml-copy.hbs:6:8
Found unless/else in app/templates/crate/owners.hbs:60:8

Solution

By playing around in the AST Explorer, we can see that the defining thing about the unless block is that it is a BlockStatement whose path attribute is a PathExpression with an original attribute of unless.

So far, so good. Now we are able to find unless blocks in our templates, but how can we figure out if they have an else block too?

If we play around some more in the AST Explorer by removing the else block from the snippet, we can see that the inverse attribute of the BlockStatement changes. When there is an else block we can see a Block element for the inverse attribute, otherwise the inverse attribute is null.

To recap, we need to find all BlockStatements, that have a PathExpression with original: 'unless', and have an inverse that is not null.

How can we find all BlockStatements in the template? By using the traverse() function, similar to the previous example exercise:

glimmer.traverse(root, {
  BlockStatement(node) {
    // TODO check for the other requirements and warn if they match
  },
});

Once we see a BlockStatement the callback function above will be called and we now need to check if this BlockStatement is an unless block and if it has an else block too:

if (
  // first we make sure that `node.path` is really a `PathExpression`
  // since there are a few edge cases where it might not be
  node.path.type === 'PathExpression' &&

  // then we check if this is an `unless` block
  node.path.original === 'unless' &&

  // and finally, we check if the block has an `else` block too
  node.inverse
) {
  // TODO print warning
}

I'll leave it up to you how exactly to print the warning. Below is an example that I came up with that includes the filename, line number and also the column at which the unless block is found:

const fs = require('fs');
const globby = require('globby');
const glimmer = require('@glimmer/syntax');

// find all template files in the `app/` folder
let templatePaths = globby.sync('app/**/*.hbs', {
  cwd: __dirname,
  absolute: true,
});

for (let templatePath of templatePaths) {
  // read the file content
  let template = fs.readFileSync(templatePath, 'utf8');

  // parse the file content into an AST
  let root = glimmer.preprocess(template);

  // use `traverse()` to "visit" all of the nodes in the AST
  glimmer.traverse(root, {
    BlockStatement(node) {
      if (
        // first we make sure that `node.path` is really a `PathExpression`
        // since there are a few edge cases where it might not be
        node.path.type === 'PathExpression' &&

        // then we check if this is an `unless` block
        node.path.original === 'unless' &&

        // and finally, we check if the block has an `else` block too
        node.inverse
      ) {
        // if so, we print a warning to the console
        console.log(`Found unless/else in ${templatePath}:${node.loc.start.line}:${node.loc.start.column}`);
      }
    },
  });
}

Optional Extra Exercise

In the 03b-no-unless-else-lint-rule folder I've included an optional extra exercise. This was meant for people doing the in-person workshop who would otherwise get bored while waiting for some of the slower participants to finish the exercise.

In this extra exercise the goal is to write a custom lint rule for ember-template-lint, that basically does the same thing as our basic Node.js script from the previous exercise: find unless/else issues and warn about them.

You can see in the .template-lintrc.js file that we define a custom template-lint plugin and import the custom/no-unless-else rule from the lib/template-lint-rules/no-unless-else.js file. If you look at that file, you can see that it imports the Rule class from the ember-template-lint package and then exports a new subclass of it with an empty visitor() method.

The term "visitor" stands for the object with the callback functions that we usually pass to the traverse() function. In this case ember-template-lint calls the traverse() function for us, and we only have to return a suitable visitor object from the visitor() method of the rule.

If you need more help, I recommend reading through the Plugin documentation of ember-template-lint, which also explains what you have to do to tell ember-template-lint that you want to display a warning somewhere.

And finally, the results of this exercise should be the same as for the previous exercise:

$ yarn -s lint:hbs
app/components/crate-toml-copy.hbs
  6:8  error  Found unless/else  custom/no-unless-else

app/templates/crate/owners.hbs
  60:8  error  Found unless/else  custom/no-unless-else

✖ 2 problems (2 errors, 0 warnings)

Optional Extra Exercise Solution

const Rule = require('ember-template-lint').Rule;

module.exports = class extends Rule {
  visitor() {
    return {
      BlockStatement(node) {
        if (
          // first we make sure that `node.path` is really a `PathExpression`
          // since there are a few edge cases where it might not be
          node.path.type === 'PathExpression' &&

          // then we check if this is an `unless` block
          node.path.original === 'unless' &&

          // and finally, we check if the block has an `else` block too
          node.inverse
        ) {
          // if so, report a template-lint warning
          this.log({
            message: 'Found unless/else',
            line: node.loc && node.loc.start.line,
            column: node.loc && node.loc.start.column,
            source: this.sourceForNode(node)
          });
        }
      },
    };
  }
};

We've now covered two examples of how to analyze Handlebars templates using Node.js scripts and the @glimmer/syntax package. So far, we've only read, parsed and analyzed code, but we haven't been writing anything.

This next chapter is about compilation, where we read code, transform it, and then write it out again.

Let's dive right into the first example!

We all like small asset sizes so that our users have to download fewer bytes. How can we achieve that? By sending as few bytes as we can. One way of doing that might be to collapse unnecessary whitespace characters in our templates.

Please note that this is an artificial example and in real-world apps this has certain caveats. If you want to use something like this, then I recommend looking at the ember-hbs-minifier addon, but make sure to properly test and QA your app after you've installed it! 😉

Let's get back to our task. We want to replace all the collapsible whitespace in the template with single space characters. For the purposes of this exercise we'll define "whitespace" as space characters ( ), newlines/line feeds (\n), carriage returns (\r) and tab characters (\t).

Since this is a workshop about syntax trees and not about regular expressions I'll show you how to replace such whitespace right away:

let newText = oldText.replace(/[ \n\r\t]+/g, ' ');

In case you're wondering, the g in the regular expression stands for "global" and means that it will replace all occurrences, and not just the first match that it finds.
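To see the regular expression in isolation, here is a quick check you can run with Node.js; the collapse() function name is made up for this example. It also shows the effect of the g flag: without it, only the first run of whitespace is replaced.

```javascript
// Collapse every run of spaces, newlines, carriage returns and tabs
// into a single space character
function collapse(text) {
  return text.replace(/[ \n\r\t]+/g, ' ');
}

let indented = '\n  <h1>\r\n\t\tHello   world\n  </h1>\n';

console.log(JSON.stringify(collapse(indented)));
// prints " <h1> Hello world </h1> "

// without the `g` flag, only the first match is replaced
console.log(JSON.stringify(indented.replace(/[ \n\r\t]+/, ' ')));
// prints " <h1>\r\n\t\tHello   world\n  </h1>\n"
```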

Since recompiling an Ember app every time we change our compilation plugin would be very time consuming, we will do this example exercise directly in the AST Explorer.

At the top of the page in the menu bar you can find a "Transform" toggle button, and if you hover over it, you can select "glimmer". This will open up two new panels at the bottom of the page.

The bottom left panel is the editor in which we will develop our custom compiler plugin. On the bottom right is the "output" after the compiler plugin has been applied.

You can see on the bottom left panel that the AST Explorer includes an example compiler plugin which reverses the tag names of all the ElementNodes. This is obviously not a useful real-world plugin but it demonstrates what kind of code is expected from us to work with the template compiler API.

But, we have yet to figure out what AST nodes to look at in this exercise!

If you click on any of the indentation whitespace in the template, you should see a TextNode highlighted in the upper right panel. Those TextNode elements all have a chars attribute, and, as the name suggests, it contains the text characters of this element.

This means our goal is to apply the regular expression above to all of the TextNodes in the template, and specifically the chars attributes in those TextNodes.

As you can see in the example compiler plugin in the AST Explorer, it has a visitor object again. In this case it is a regular property on the returned object, but it works exactly the same as the visitors that we have written before. So let's write another one:

module.exports = function() {
  return {
    name: 'ast-transform',

    visitor: {
      TextNode(node) {
        // collapse runs of spaces, newlines, carriage returns and tabs
        node.chars = node.chars.replace(/[ \n\r\t]+/g, ' ');
      }
    }
  };
};

If you put the snippet above in the lower left panel of the AST Explorer you can see how the output in the lower right panel changes, and is now only a single line of code, divided by space characters. Success!! 🎉

Alright, now it's your turn again. You may be using ember-test-selectors in your apps at work, but have you ever wondered how it works? In this exercise we will build a very basic version of it.

The goal is to remove all element attributes that start with data-test-.

Similar to the previous exercise, I recommend doing this exercise directly in the "Transform" mode of the AST Explorer. And if you feel stuck and are not sure how to remove a node from the AST, have a look at the parent element and see if you can figure out a way using that element instead.

Solution

Let's first copy the code example from the slide into the AST Explorer:

<SomeComponent data-test-foo="bar" />

If you now click on data-test-foo you can see that it is part of an AttrNode element in the AST. We don't care that much about the value of this AttrNode, but we are very interested in the name.

It seems like we need to look for AttrNodes with a name that starts with data-test-. At this point I have to tell you that there are (at least) two ways of solving this, and I will show you both of them.

If you managed to solve the previous exercises you probably know by now how to write a visitor that looks for AttrNodes, but how do you delete an AST node once you've found it? In some cases we can return null; from the visitor callback function, and that will cause the traverse() function to remove the AST node. For your future endeavours it might also be good to know that you can not only return null but also completely new AST nodes, in which case the current node is replaced with the new node.
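To build an intuition for what happens to those return values, here is a toy sketch of how a traversal could apply them to an array of sibling nodes: returning nothing keeps the node, returning null removes it, and returning a node replaces it. This is a simplified illustration and not the actual implementation in @glimmer/syntax; applyVisitor() and the node shapes are assumptions.

```javascript
// Toy version: run a visitor callback on every node in an array and
// interpret its return value as described above.
function applyVisitor(nodes, callback) {
  let result = [];
  for (let node of nodes) {
    let returned = callback(node);
    if (returned === undefined) {
      result.push(node); // nothing returned: keep the node unchanged
    } else if (returned !== null) {
      result.push(returned); // a node returned: replace the old one
    }
    // `null` returned: drop the node
  }
  return result;
}

// Simplified attributes of <SomeComponent data-test-foo="bar" class="x" />
let attributes = [
  { type: 'AttrNode', name: 'data-test-foo', value: { type: 'TextNode', chars: 'bar' } },
  { type: 'AttrNode', name: 'class', value: { type: 'TextNode', chars: 'x' } },
];

let stripped = applyVisitor(attributes, (node) => {
  if (node.name.startsWith('data-test-')) {
    return null;
  }
});

console.log(stripped.map((node) => node.name)); // prints [ 'class' ]
```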

Back to our task, the first solution looks like this:

module.exports = function() {
  return {
    name: 'ast-transform',

    visitor: {
      AttrNode(node) {
        if (node.name.startsWith('data-test-')) {
          return null;
        }
      }
    }
  };
};

If you didn't know about this return null; trick before, it can be hard to find, so here is another solution that uses a different approach:

module.exports = function() {
  return {
    name: 'ast-transform',

    visitor: {
      ElementNode(node) {
        node.attributes = node.attributes
          .filter(it => !it.name.startsWith('data-test-'));
      }
    }
  };
};

In this case we look for the ElementNodes in the AST, filter out all of their attributes (AttrNodes) whose name starts with data-test-, and assign the filtered list back to attributes.

Both of these solutions work equally well. The most important part when writing such compiler plugins is testing, to make sure that a new compiler version doesn't suddenly break your plugin without you noticing.

Optional Extra Exercise

Just like before, there is an optional extra exercise here that will give you an opportunity to figure out how to integrate such compiler plugins with the Ember CLI build process.

In the 05b-strip-test-selectors folder, if you open the ember-cli-build.js file, you can see at the bottom of the file that I've already prepared a rough skeleton of what an integration can look like. We're creating a regular class with a transform() method and that method gets the root node as the first argument. Inside that method we also have access to this.syntax which is roughly the same content as if you did require('@glimmer/syntax').

After that class definition we use app.registry.add() to register an htmlbars-ast-plugin, and with that, we have a compiler plugin integrated in the Ember CLI build pipeline.

The implementation of the plugin can be roughly similar to the previous exercise so I'll leave that up to you to figure out... 😉

With "Code Analysis" and "Compilation" done, you've probably already guessed what comes next... it's "Refactoring"!

In this chapter we'll be looking at how to write basic template codemods using ember-template-recast, and as usual, we'll start with an example exercise in which I'll guide you through all of the steps.

For this example we'll pretend that our app has a MenuItem component, but we're not happy about its API. It is using a @caption argument to determine what the caption should be, but we'd like it to be more flexible, so we've created a NewMenuItem component with a different API.

The goal of this example exercise is to write a codemod that automatically converts from component invocations like <MenuItem @caption="foo" /> to <NewMenuItem>foo</NewMenuItem>.

As I mentioned, this time we're going to use ember-template-recast instead of @glimmer/syntax. One small advantage that is immediately visible: there is a parse() function! So let's start with that, parsing the files into an AST.

You've probably already figured out that this exercise will be done in the folder 06-component-migration, and in particular in the migrate-components.js file.

You can see that we already have an ember-template-recast import at the top of the file, so, as mentioned before, we will start by parsing the template files inside the for loop:

for (let templatePath of templatePaths) {
  // read the file content
  let template = fs.readFileSync(templatePath, 'utf8');

  // parse the file content into an AST
  let root = recast.parse(template);
}

If you console.log() the root variable, or if you look at its content with a debugger, you will see that the AST from ember-template-recast looks very familiar. That's because ember-template-recast internally uses @glimmer/syntax and the AST is based on the Glimmer AST too.

The magic happens within another function though: the print() function.

We can use print() to convert a syntax tree back into text/characters/bytes that we can write back into the template files:

for (let templatePath of templatePaths) {
  // read the file content
  let template = fs.readFileSync(templatePath, 'utf8');

  // parse the file content into an AST
  let root = recast.parse(template);

  // TODO modify the AST

  // convert the AST back into text
  let newTemplate = recast.print(root);

  // if necessary, write the changes back to the original file
  if (newTemplate !== template) {
    fs.writeFileSync(templatePath, newTemplate, 'utf8')
  }
}

If you run the script in the current form, nothing should change because ember-template-recast is aware that you haven't modified the original AST and thus will return the original source text.

Time to modify the AST!

First, we'll start by adjusting the tag name from MenuItem to NewMenuItem. Just like @glimmer/syntax, ember-template-recast also has a traverse() function that we can use to find all the ElementNodes in the template and then adjust them:

recast.traverse(root, {
  ElementNode(node) {
    // filter out non-MenuItem elements
    if (node.tag !== 'MenuItem') return;

    // change the tag name to `NewMenuItem`
    node.tag = 'NewMenuItem';
  },
});

If you run the script now and then look at the changed files in git, you can see that it has already changed some of the files. Please note that each time before you run the script again, you will have to reset those files to their original state. Otherwise the script won't find any MenuItem component invocations anymore.

Now that the tag name is adjusted we will also need to convert the @caption argument. But to convert it we first need to find it in the AST. We know from a previous exercise that arguments like @caption can be found in the attributes key of an ElementNode, so let's see if we can find() a matching node in there:

let captionAttr = node.attributes.find(it => it.name === '@caption');

If you look at the corresponding AttrNode in the AST Explorer, you can see that its value in this case is a TextNode, the same type of node that we've already seen in the whitespace collapsing exercise. That means a TextNode is a valid node to include in the children array of an ElementNode.

All of this leads to the conclusion that we can probably just take the existing TextNode from the value attribute of the AttrNode, and move it into the children array of the ElementNode... let's try that and see if it works:

// find the `@caption` attribute
let captionAttr = node.attributes.find(it => it.name === '@caption');
if (captionAttr) {
  // move the `@caption` value into the element's `children`
  node.children = [captionAttr.value];
}

If you try to run this, you will probably notice that we forgot something. We only added the caption to the children array, but we did not remove it from the list of attributes, so now we have it twice. 😅

Let's fix this! We already know two ways to remove an attribute from a component invocation now, so let's do it again:

// remove `@caption` attribute
node.attributes = node.attributes.filter(it => it.name !== '@caption');

And now we're finally at a point where running the codemod produces roughly the results that we were looking for... 🎉

Included below is the full solution snippet if you couldn't piece it together from the snippets above.

Full Solution

const fs = require('fs');
const globby = require('globby');
const recast = require('ember-template-recast');

// find all template files in the `app/` folder
let templatePaths = globby.sync('app/**/*.hbs', {
  cwd: __dirname,
  absolute: true,
});

for (let templatePath of templatePaths) {
  // read the file content
  let template = fs.readFileSync(templatePath, 'utf8');

  // parse the file content into an AST
  let root = recast.parse(template);

  // use `traverse()` to "visit" all of the nodes in the AST
  recast.traverse(root, {
    ElementNode(node) {
      // filter out non-MenuItem elements
      if (node.tag !== 'MenuItem') return;

      // change the tag name to `NewMenuItem`
      node.tag = 'NewMenuItem';

      // find the `@caption` attribute
      let captionAttr = node.attributes.find(it => it.name === '@caption');
      if (captionAttr) {
        // move the `@caption` value into the element's `children`
        node.children = [captionAttr.value];
      }

      // remove `@caption` attribute
      node.attributes = node.attributes.filter(it => it.name !== '@caption');
    }
  });

  // convert the AST back into text
  let newTemplate = recast.print(root);

  // if necessary, write the changes back to the original file
  if (newTemplate !== template) {
    fs.writeFileSync(templatePath, newTemplate, 'utf8')
  }
}
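One reason the result is only "roughly" right: the @caption value is not always a TextNode. For something like @caption="Hello {{this.name}}" the AttrNode value is a ConcatStatement, which only makes sense as an attribute value. One possible refinement, sketched below on plain hand-built objects (no ember-template-recast involved), is to spread the ConcatStatement's `parts` into the children instead. `valueToChildren` and `migrateMenuItem` are hypothetical helper names, and the node shapes are simplified.

```javascript
// Hypothetical helper: turn an AttrNode value into a list of child nodes.
// A ConcatStatement (e.g. @caption="Hello {{this.name}}") only makes sense
// as an attribute value, so we spread its `parts` (TextNodes and
// MustacheStatements) into the children instead.
function valueToChildren(value) {
  if (value.type === 'ConcatStatement') return [...value.parts];
  return [value]; // TextNode or MustacheStatement
}

// Same logic as the ElementNode visitor above, applied to plain objects.
function migrateMenuItem(node) {
  if (node.tag !== 'MenuItem') return;
  node.tag = 'NewMenuItem';

  let captionAttr = node.attributes.find(it => it.name === '@caption');
  if (captionAttr) {
    node.children = valueToChildren(captionAttr.value);
  }
  node.attributes = node.attributes.filter(it => it.name !== '@caption');
}

// Simplified stand-in for <MenuItem @caption="Hello {{this.name}}" />
let node = {
  type: 'ElementNode',
  tag: 'MenuItem',
  attributes: [{
    type: 'AttrNode',
    name: '@caption',
    value: {
      type: 'ConcatStatement',
      parts: [
        { type: 'TextNode', chars: 'Hello ' },
        { type: 'MustacheStatement', path: { type: 'PathExpression', original: 'this.name' } },
      ],
    },
  }],
  children: [],
};

migrateMenuItem(node);
console.log(node.tag, node.children.map(child => child.type));
// NewMenuItem [ 'TextNode', 'MustacheStatement' ]
```

Inside the real codemod, the same idea would mean replacing `node.children = [captionAttr.value];` with a call to a helper like `valueToChildren()`; check the printed output carefully, since ember-template-recast may still format it in unexpected ways.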

What follows now is probably the most advanced exercise of the workshop, so don't feel bad if it takes you a while to find a good solution or if you're stuck and you can't find one at all. It's absolutely fine to give up and look at the solution in this case, but make sure to read through it and try to understand it.

Also, a small warning: ember-template-recast is still relatively new and in some cases can produce unintuitive or wrong results. If you hit one of those cases, try to see if you can achieve the desired result in a different way, and report the issue to the ember-template-recast bug tracker.

Alright, with all of that out of the way, let's start with this exercise!

You hopefully still remember our issue with unless conditions and else blocks from before, right?

In this exercise we will try to build a codemod that automatically fixes those issues, by converting the unless condition into an if condition, and swapping the contents of the condition blocks.

The folder for this exercise is 07-fix-unless-else, and in it you will find an example app and a fix-unless-else.js script that looks basically the same as the one from our previous example exercise.

Good luck!

Solution

As in the previous exercise about unless/else, we'll start by writing a visitor that finds the relevant unless blocks, i.e. those that have an inverse (else) block:

recast.traverse(root, {
  BlockStatement(node) {
    if (
      // first we make sure that `node.path` is really a `PathExpression`
      // since there are a few edge cases where it might not be
      node.path.type === 'PathExpression' &&

      // then we check if this is an `unless` block
      node.path.original === 'unless' &&

      // and finally, we check if the block has an `else` block too
      node.inverse
    ) {
      // TODO transform this node
    }
  }
});

Next, we'll change the unless to an if by updating the original property of the BlockStatement's path:

node.path.original = 'if';

And now comes the more complicated part, swapping the block contents. Let's try this somewhat intuitive approach first:

let program = node.program;
let inverse = node.inverse;

node.program = inverse;
node.inverse = program;

If you run this, you will notice that it almost worked... but not quite. Instead of swapping the contents, the content of the first block is now used for both blocks... 🤔

This is what I meant when I mentioned earlier that ember-template-recast still has a few issues. But luckily there is a workaround: Instead of swapping the program and inverse blocks directly, we will only swap their body arrays:

let programBody = node.program.body;
let inverseBody = node.inverse.body;

node.program.body = inverseBody;
node.inverse.body = programBody;
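As an aside, the swap with two temporary variables can also be written with array destructuring. The sketch below applies it to a simplified, hand-built `{{#unless}}...{{else}}...{{/unless}}` node (plain objects, no ember-template-recast, and the condition `params` are left out), so it only demonstrates the JavaScript side of the swap; how ember-template-recast prints the result is still subject to the caveat above.

```javascript
// Simplified stand-in for {{#unless this.isAdmin}}A{{else}}B{{/unless}}
let node = {
  type: 'BlockStatement',
  path: { type: 'PathExpression', original: 'unless' },
  program: { type: 'Block', body: [{ type: 'TextNode', chars: 'A' }] },
  inverse: { type: 'Block', body: [{ type: 'TextNode', chars: 'B' }] },
};

// swap only the `body` arrays, keeping the `program`/`inverse` objects in place
[node.program.body, node.inverse.body] = [node.inverse.body, node.program.body];

// change the block statement from `unless` to `if`
node.path.original = 'if';

console.log(node.path.original, node.program.body[0].chars, node.inverse.body[0].chars);
// if B A
```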

Remember, before you run the codemod again, reset the example app files to their original state! And make sure to not reset the codemod script too, because otherwise you will lose your code changes! 😱

What follows is the full solution to this exercise:

const fs = require('fs');
const globby = require('globby');
const recast = require('ember-template-recast');

// find all template files in the `app/` folder
let templatePaths = globby.sync('app/**/*.hbs', {
  cwd: __dirname,
  absolute: true,
});

for (let templatePath of templatePaths) {
  // read the file content
  let template = fs.readFileSync(templatePath, 'utf8');

  // parse the file content into an AST
  let root = recast.parse(template);

  // use `traverse()` to "visit" all of the nodes in the AST
  recast.traverse(root, {
    BlockStatement(node) {
      if (
        // first we make sure that `node.path` is really a `PathExpression`
        // since there are a few edge cases where it might not be
        node.path.type === 'PathExpression' &&

        // then we check if this is an `unless` block
        node.path.original === 'unless' &&

        // and finally, we check if the block has an `else` block too
        node.inverse
      ) {
        let { program, inverse } = node;
        let programBody = program.body;
        let inverseBody = inverse.body;

        // swap the bodies of the `program` and `inverse` blocks
        node.program.body = inverseBody;
        node.inverse.body = programBody;

        // change the block statement from `unless` to `if`
        node.path.original = 'if';
      }
    }
  });

  // convert the AST back into text
  let newTemplate = recast.print(root);

  // if necessary, write the changes back to the original file
  if (newTemplate !== template) {
    fs.writeFileSync(templatePath, newTemplate, 'utf8')
  }
}

And with this part done we will leave the Handlebars ecosystem and have a quick look at some JavaScript-related tools! ✨

In particular, we will write two custom rules for ESLint in the following exercises.

Let's jump right in!

console.log() is nice as a debugging tool, but in Ember.js apps, most of the time if there is a console.log() in the code it is a sign that a developer forgot to remove some of that debugging code... 😉

ESLint has a no-console rule that we can use to warn about that, but our goal here is to learn how to write such rules, so we'll ignore that it exists for now.

Unsurprisingly, this example exercise will be done in the 08-no-console-log folder. In the .eslintrc.js file you can see that I've already configured a custom-no-console-log rule. But where does the code for that rule come from?

If you look in the package.json file, there is a lint:js npm script defined there, and it is using the --rulesdir option of ESLint. This option lets you specify a path from which additional rule definitions will be loaded. In this case we will put our custom ESLint rules in the lib/eslint-rules folder.

Let's open the lib/eslint-rules/custom-no-console-log.js file and see what we have in there...

module.exports = {
  create: function(context) {
    // TODO write your implementation here
  }
};

Currently, this file only exports a JavaScript object, with a single method on it, which is called create(). What is that method supposed to do?

As with the Handlebars AST, a lot of other tools, including ESLint, use the "Visitor" concept. ESLint expects custom rules to return a visitor object from the create() method, so let's do that:

return {
  // wait... what do we put here?
};
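To build some intuition for what ESLint does with that object, here is a heavily simplified model: a hypothetical `walk()` function that visits every node of an ESTree-style object depth-first and calls the visitor method whose key matches `node.type`. ESLint's real traversal is more involved (it also supports selectors and `:exit` handlers, for example), so treat this purely as an illustration.

```javascript
// Hypothetical, simplified model of visitor dispatch (NOT ESLint's code).
function walk(node, visitor) {
  if (!node || typeof node.type !== 'string') return;

  // call the visitor method that matches this node's type, if any
  if (typeof visitor[node.type] === 'function') visitor[node.type](node);

  // descend into every property that holds a node or a list of nodes
  for (const [key, value] of Object.entries(node)) {
    if (key === 'parent') continue; // avoid walking back up the tree
    if (Array.isArray(value)) value.forEach(child => walk(child, visitor));
    else if (value && typeof value === 'object') walk(value, visitor);
  }
}

// ESTree-style AST for `console.log();`, written by hand
const program = {
  type: 'Program',
  body: [{
    type: 'ExpressionStatement',
    expression: {
      type: 'CallExpression',
      callee: {
        type: 'MemberExpression',
        computed: false,
        object: { type: 'Identifier', name: 'console' },
        property: { type: 'Identifier', name: 'log' },
      },
      arguments: [],
    },
  }],
};

const visited = [];
walk(program, {
  CallExpression(node) { visited.push(node.type); },
  Identifier(node) { visited.push(node.name); },
});
console.log(visited); // [ 'CallExpression', 'console', 'log' ]
```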

Before we can continue, we need to figure out what an AST for a JavaScript file looks like! Luckily, AST Explorer can help us a lot again.

Let's switch the AST Explorer from Handlebars to the JavaScript language, and then we're going to enable the "ESLint v4" transform from the menu bar.

Transform? Oh, yes, we can also write ESLint rules directly in the AST Explorer!

Whether you prefer to write the code for this and the next exercise directly in the AST Explorer or stay in your editor is up to you. Both work fine and in basically the same way.

Let's continue to figure out what console.log() actually looks like in the ESLint AST. For that, we will type console.log() in the top left panel and then click on it to highlight it in the AST panel on the top right corner of the screen.

If you did that, you will probably have landed on an Identifier AST node, either console or log. Their shared parent node is a MemberExpression, which corresponds to the console.log characters in the file. And if we go up another level, we reach a CallExpression with a callee (the MemberExpression) and an arguments list, which is probably empty unless you typed something like console.log('foo').

Okaaaay, that is a lot of new stuff. But also a bit of familiar stuff. The AST has different nodes, those nodes have types, and depending on those types the nodes also have additional other attributes.

Let's recap what we're looking for: we want to warn about a CallExpression, with a callee that is a MemberExpression, that has an object that is an Identifier with name console, and a property which is also an Identifier but with the name log.

Alright, let's start with the CallExpression, and for now we'll just warn about any CallExpression:

return {
  CallExpression(node) {
    context.report({
      node,
      message: 'Unexpected console.log() expression',
    });
  }
};

If you try this in the AST Explorer you will notice in the bottom right corner that ESLint starts to print out warnings as comments:

// Unexpected console.log() expression (at 8:1)
   console.log();
// ^

If you prefer to stay in the editor then you can run yarn -s lint:js to run ESLint and you should also see quite a few warnings now.

Now the next step is to reduce the false positive warnings until we only warn about the real console.log() statements. First, we'll filter out CallExpressions whose callee is not a MemberExpression, to avoid warning about calls like alert('hello world!'):

let { callee } = node;
if (callee.type !== 'MemberExpression') return;

Next, we need to check the object and property of the MemberExpression:

let { object, property } = callee;
if (object.type !== 'Identifier' || object.name !== 'console') return;
if (property.type !== 'Identifier' || property.name !== 'log') return;

and... that's it!

module.exports = {
  create: function(context) {
    return {
      CallExpression(node) {
        let { callee } = node;
        if (callee.type !== 'MemberExpression') return;

        let { object, property } = callee;
        if (object.type !== 'Identifier' || object.name !== 'console') return;
        if (property.type !== 'Identifier' || property.name !== 'log') return;

        context.report({
          node,
          message: 'Unexpected console.log() expression',
        });
      }
    };
  }
};
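If you want a quick sanity check outside of ESLint, the rule object can be exercised with a fake `context` that just records what `report()` receives. This is only a sketch: the fake context and the hand-written ESTree nodes are assumptions for illustration, and for real tests ESLint ships a `RuleTester` class that runs rules against actual source code.

```javascript
// The rule from above, inlined so the snippet is self-contained.
const rule = {
  create: function(context) {
    return {
      CallExpression(node) {
        let { callee } = node;
        if (callee.type !== 'MemberExpression') return;

        let { object, property } = callee;
        if (object.type !== 'Identifier' || object.name !== 'console') return;
        if (property.type !== 'Identifier' || property.name !== 'log') return;

        context.report({
          node,
          message: 'Unexpected console.log() expression',
        });
      }
    };
  }
};

// Hypothetical fake context that only records reports
const reports = [];
const visitor = rule.create({ report: descriptor => reports.push(descriptor) });

// hand-written ESTree-style node builders
const id = name => ({ type: 'Identifier', name });
const member = (obj, prop) =>
  ({ type: 'MemberExpression', computed: false, object: id(obj), property: id(prop) });
const call = callee => ({ type: 'CallExpression', callee, arguments: [] });

visitor.CallExpression(call(member('console', 'log')));   // reported
visitor.CallExpression(call(member('console', 'warn')));  // ignored
visitor.CallExpression(call(id('alert')));                // ignored

console.log(reports.map(r => r.message)); // [ 'Unexpected console.log() expression' ]
```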

This is the full implementation that is needed to build a basic ESLint rule that warns about console.log() usage.

If you want, you can compare this to the real-world no-console rule in ESLint, which is quite a bit more sophisticated and covers a few more edge cases, but for us beginners this is quite sufficient for now!
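One edge case our simple rule misses is computed member access such as `console['log']()`. In ESTree, that is still a `MemberExpression`, but with `computed: true` and a `Literal` as its `property`. A possible extension of the property check, sketched as a hypothetical `isLogProperty()` helper, could look like this:

```javascript
// Hypothetical helper: matches both `console.log` and `console['log']`.
function isLogProperty(memberExpression) {
  let { property, computed } = memberExpression;
  if (!computed) {
    // console.log
    return property.type === 'Identifier' && property.name === 'log';
  }
  // console['log']
  return property.type === 'Literal' && property.value === 'log';
}

const consoleId = { type: 'Identifier', name: 'console' };

console.log(isLogProperty({ type: 'MemberExpression', computed: false, object: consoleId,
  property: { type: 'Identifier', name: 'log' } })); // true

console.log(isLogProperty({ type: 'MemberExpression', computed: true, object: consoleId,
  property: { type: 'Literal', value: 'log' } })); // true

// console[log], where `log` is a variable and not the string 'log'
console.log(isLogProperty({ type: 'MemberExpression', computed: true, object: consoleId,
  property: { type: 'Identifier', name: 'log' } })); // false
```

In the rule, the `property` line would then become `if (!isLogProperty(callee)) return;`. Checking `computed` also avoids a false positive of the simple version, which treats `console[log]()` (with `log` being a variable) the same as `console.log()`.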

We're almost done. This is the last exercise that I've prepared for this workshop!

For this last one we will write an Ember-specific ESLint rule that warns about unnecessary service injection arguments.

What is an "unnecessary service injection argument"?

Well, if you inject a service like in the following example:

Component.extend({
  search: service('search'),
})

The 'search' argument is actually unnecessary, because it matches the search key, and the service() function will automatically default to the property key if no argument is given to it.

Anyway, the goal is to write an ESLint rule to catch this pattern and warn about it.

In the 09-no-unnecessary-injection-argument folder you can, once again, find a lib/eslint-rules subfolder, with a no-unnecessary-injection-argument.js file in it. The .eslintrc.js file is already adjusted to load the correct rule so either you can again use yarn -s lint:js, or you can develop the rule in the AST Explorer again.

Finally, before we start, the ESLint team has created some great docs on how to write custom ESLint rules, so if you get stuck or want to deepen your knowledge after the workshop, you should visit https://eslint.org/docs/developer-guide/working-with-rules.

Solution

You may be tempted to start with the CallExpression that represents service('search'), but in this case it will probably be easier to look at the parent node, which is a Property.

The Property has a key, which is an Identifier with the name search, and a value, which is the CallExpression above.

Let's also take a closer look at that CallExpression. The callee is also an Identifier, but with the name service, and we have one element in the arguments list of the CallExpression: a Literal with a value of search.

Recapping all of these conditions in one sentence would get pretty long, so let's tackle this issue step by step again. We'll start by warning about all Property nodes:

return {
  Property(node) {
    context.report({
      node,
      message: 'Unnecessary injection argument',
    });
  }
};

Next, we'll check that the key and value children have the correct node types:

let { key, value } = node;
if (key.type !== 'Identifier') return;
if (value.type !== 'CallExpression') return;

If that is the case, we should take a closer look at the value node. Let's check that its callee is an Identifier with a name of either service or inject:

let { callee } = value;
if (callee.type !== 'Identifier') return;
if (!['inject', 'service'].includes(callee.name)) return;

In a real-world rule it would make sense to also check where service is defined and if it was indeed imported from @ember/service, but for our purposes this is good enough for now!

That is not all we need to check from the value node though, we also need to take a look at the arguments, and specifically the first one:

let arg = value.arguments[0];
if (!arg) return;
if (arg.type !== 'Literal') return;
if (arg.value !== key.name) return;

Here, we take the first argument, check if it exists, check if it is a Literal, and then check if the Literal value matches the name of the property key.

If all of that matches we can finally instruct ESLint to warn about this issue:

context.report({
  node: arg,
  message: 'Unnecessary injection argument',
});

... aaaaand that's it!
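Put together, the snippets above form a rule object roughly like the one below. The quick check at the bottom uses a hypothetical fake `context` and hand-written `Property` nodes purely for illustration; for real tests, ESLint's `RuleTester` runs a rule against actual source code.

```javascript
// The snippets from above, assembled into one rule object.
const rule = {
  create: function(context) {
    return {
      Property(node) {
        let { key, value } = node;
        if (key.type !== 'Identifier') return;
        if (value.type !== 'CallExpression') return;

        let { callee } = value;
        if (callee.type !== 'Identifier') return;
        if (!['inject', 'service'].includes(callee.name)) return;

        let arg = value.arguments[0];
        if (!arg) return;
        if (arg.type !== 'Literal') return;
        if (arg.value !== key.name) return;

        context.report({
          node: arg,
          message: 'Unnecessary injection argument',
        });
      }
    };
  }
};

// Hypothetical fake context and hand-written ESTree-style `Property` nodes
const reports = [];
const visitor = rule.create({ report: descriptor => reports.push(descriptor) });
const property = (keyName, calleeName, args) => ({
  type: 'Property',
  key: { type: 'Identifier', name: keyName },
  value: {
    type: 'CallExpression',
    callee: { type: 'Identifier', name: calleeName },
    arguments: args.map(value => ({ type: 'Literal', value })),
  },
});

visitor.Property(property('search', 'service', ['search']));  // reported
visitor.Property(property('search', 'service', ['finder']));  // different name, ignored
visitor.Property(property('search', 'service', []));          // no argument, ignored

console.log(reports.length); // 1
```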

Congratulations, you are now "certified" abstract syntax forestry workers!

Please keep in mind that this workshop only covered the basics and that there is a lot more material out there to explore. I explicitly focused on Ember templates a lot, because it is way easier to find documentation and blog posts about ESLint and some of the other more popular tools.

If this workshop got you interested in learning more about these topics, or if you would like to contribute to the wider Ember.js ecosystem, I invite you to take a look at the eslint-plugin-ember and ember-template-lint projects on GitHub. Other than that, there are also the #topic-codemods and #e-template-lint channels on the Ember.js community Discord server.

Thank you for participating and I hope you still enjoyed this somewhat unplanned digital version of the workshop!
