
Feature request: Add a preprocess step to the plugin API

Open c1rrus opened this issue 1 year ago • 5 comments

I've been reading the Terrazzo plugin API docs to figure out how to migrate some custom Cobalt plugins I made for our design system at work. Most of my plugins pre-process the design token data before it gets handed to a "normal" plugin to be output as CSS, JS, or whatever.

With Cobalt I used a hacky approach where you can pass another plugin into the config of my plugin. My plugin manipulates the parsed token data in some way and then runs the plugin specified in the config with that modified token data. I suspect Cobalt's plugin API wasn't intended to be (ab)used in that way, but it worked :-)

I don't think Terrazzo's new 2-step process would let me do that. What I'm after isn't a transform() step to generate values to later be output. I want to modify the token data before it gets passed into transform().

Maybe there could be a preProcess() hook or something like that? It could take the same read-only token data and return new token data that replaces it and is used later on.
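Something like this, purely to illustrate the shape I have in mind (the plugin structure, hook name, and token-data shape here are all hypothetical, not an existing API):

```javascript
// Hypothetical plugin shape: preProcess() is the proposed hook, not a real
// Terrazzo API. It receives the parsed token data (sketched here as a flat
// id -> token map) and returns a replacement object that later steps
// (transform(), build()) would then use.
function myPreprocessPlugin() {
  return {
    name: 'my-preprocess-plugin',
    preProcess(tokens) {
      // Work on a copy so the original read-only data is left untouched.
      const next = structuredClone(tokens);
      for (const [id, token] of Object.entries(next)) {
        // Example manipulation: retype tokens by naming convention.
        if (token.$type === 'string' && id.startsWith('font.family.')) {
          token.$type = 'fontFamily';
        }
      }
      return next; // replaces the token data for everything downstream
    },
  };
}
```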

Not quite sure how this would work with the AST stuff (I don't really have any experience with that), so I'm sure this proposal will require more thought and refinement. But hopefully you get what I'm after.

c1rrus avatar Dec 20 '24 01:12 c1rrus

Yeah that makes sense! I’d be open to adding a preProcess hook or something. But just to make sure I’m understanding, can you give an example of what you start with, and how you preprocess it before the plugin gets it? I don’t mind giving people more control—the whole idea of having a pluggable system is being able to do what you want! But I also want to make things as easy as possible for folks, because others may be doing the same thing and there may be a shortcut we could take (maybe).

Sidenote: AST things were a necessary evil to show helpful errors in output—now it can point to the exact line and character a parsing error was encountered. In most cases you can just bypass it and work with JSON directly if you prefer—it’s all the same.

drwpow avatar Dec 21 '24 05:12 drwpow

Thought about this a little more and I think this may be something that’s supported by the JS API, especially since you’re doing a lot of work in-memory. A major change in 2.0 was now 100% of the work happens in the parser, including plugin transformation (as opposed to 1.0, where most of it happened in the CLI for no reason at all; just poor planning). So you could do something like:

import { build, defineConfig, parse } from '@terrazzo/parser';
import fs from 'node:fs';

const cwd = new URL(import.meta.url);
const dist = new URL('./dist/', cwd);
const rawTokensPath = new URL('./my-tokens.json', cwd);
const rawTokens = JSON.parse(fs.readFileSync(rawTokensPath, 'utf8'));
const transformedTokens = structuredClone(rawTokens); /* do some transforms */

const config = defineConfig({
  outDir: dist,
  plugins: [/* my plugins */],
}, { cwd });

const { tokens, sources } = await parse([{
  filename: rawTokensPath,
  src: transformedTokens, // This can be an object, as long as it’s serializable JSON
}], { config });
const result = await build({ tokens, sources }, { config });
for (const file of result.outputFiles) {
  fs.writeFileSync(new URL(file.filename, dist), file.contents, 'utf8');
}

It’s a little more verbose, but lets you control all the tokens that need parsing before any of the plugins hit them.

If this doesn’t fit your use case, I’d love it if you could walk me through it sometime and maybe we could figure out a good solution. The main hiccup is that Terrazzo can display exact source line error messages, which is cool, but it means it’s a little “locked in” from the time it reads the files until the plugins first get them, to preserve all those source mappings. I’m open to modifying that but want to be careful how I go about it.

drwpow avatar Jan 30 '25 16:01 drwpow

Thanks for that example @drwpow! As it happens, I've ended up doing something a bit like that in preparation for migrating from Cobalt to Terrazzo. I basically wrote a script that reads a "raw" DTCG file, does some manipulations to it, and then saves it out as a new, "preprocessed" DTCG file. And that's the file I then feed into Cobalt.

My script was able to replace most of the custom plug-ins I had made (which, TBH, were always a bit ropey anyway), so now that I'm no longer doing anything exotic with Cobalt, moving to Terrazzo should be fairly easy.

That being said, my script ended up becoming a bit of a beast, as I need to (partially) parse the DTCG data to do the stuff I need to do. I had been tinkering with some DTCG parsing code as a side project anyway, so I ended up using that in my script.

However, I think some kind of preProcess() hook could still be useful. Some of my use cases might have been easier to achieve that way, as I could have leveraged Terrazzo's DTCG parser rather than having to make my own. So, while I have a solution that works for me now, it's possible others might benefit from that kind of functionality.

Use-case 1: Working around limitations / errors in source DTCG data

We use Tokens Brücke to export Figma variables as a DTCG file, to then be processed by Cobalt UI / Terrazzo. It's a lovely tool, but it does sometimes produce not-quite-spec-compliant DTCG data. (In fairness, that's sometimes due to limitations in Figma, and sometimes perhaps due to gaps in the DTCG spec.)

For example, its mapping of Figma's variable types to DTCG types isn't always right in our specific case. We have some Figma variables for font family names (which are then used in typography styles). In Figma they're string variables, so Tokens Brücke exports those as tokens with "$type": "string".

However, we know, based on the token name, which of our "string" tokens are actually meant to be "fontFamily" ones. So our script identifies those tokens and updates their $type accordingly. That way, when Cobalt / Terrazzo parses the data, it interprets those tokens correctly.

There are a few other, similar fix-ups we do too.
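As a sketch, that fix-up boils down to a recursive walk of the raw DTCG tree (the name-matching rule shown here is simplified for illustration; our real script uses our actual naming conventions):

```javascript
// Walk a raw DTCG tree and retype "string" tokens whose path suggests they
// are really font family tokens. The rule used here (any token under a group
// named "fontFamily") is an illustrative assumption, not a fixed convention.
function retypeFontFamilyTokens(node, path = []) {
  if (node === null || typeof node !== 'object') return;
  for (const [key, value] of Object.entries(node)) {
    if (key.startsWith('$')) continue; // skip DTCG properties ($type, $value, …)
    const childPath = [...path, key];
    if (value && typeof value === 'object') {
      if (value.$type === 'string' && childPath.includes('fontFamily')) {
        value.$type = 'fontFamily'; // retype in place
      }
      retypeFontFamilyTokens(value, childPath); // recurse into groups
    }
  }
}
```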

Use-case 2: Removing "private" tokens when using multiple modes

In our design system, our design tokens are conceptually organised into primitive and semantic tiers (we don't use a component tier). For example, our primitives contain things like color ramps (e.g. color / blue / 500) and our semantic ones are references to those (e.g. color / background / default will reference some primitive color token).

We use Figma's variable modes to do things like light & dark color scheme, different sub-brands, responsive font sizes and more. In some cases, a semantic color token's resolved value depends on multiple modes - e.g. light or dark color scheme and the choice of sub-brand. Since in Figma each mode is tied to a single collection, the actual structure is a little more complex.

E.g. color / background / default might be in a collection that has light & dark modes. So its respective light & dark values are references to different variables in another collection that has brand-A and brand-B modes. Each of those variables then points to a variable in yet another collection that has an actual value:

color / background / default
 │
 ├─[light]─> color / brand / background / light
 │            │
 │            ├─[brand A]─> color / red / 100
 │            │
 │            └─[brand B]─> color / blue / 100
 │
 └─[dark]──> color / brand / background / dark
              │
              ├─[brand A]─> color / red / 900
              │
              └─[brand B]─> color / blue / 900

However, we intentionally only publish our semantic tier tokens. In Figma, all our primitives are hidden from publishing. In code we want to resolve any references to "private" tokens and then remove those private tokens from the final output. That becomes tricky in multi-mode scenarios like the above one though.

The initial DTCG (using the mode extension) export from Figma would look something like this:

{
  "color": {
    "background": {
      "default": {
        "$type": "color",
        "$value": "{color.brand.background.light}",
        "$extensions": {
          "mode": {
            "Light": "{color.brand.background.light}",
            "Dark": "{color.brand.background.dark}"
          }
        }
      }
    },
    "brand": {
      "background": {
        "light": {
          "$type": "color",
          "$value": "{color.red.100}",
          "$extensions": {
            "mode": {
              "Brand A": "{color.red.100}",
              "Brand B": "{color.blue.100}"
            }
          }
        },
        "dark": {
          "$type": "color",
          "$value": "{color.red.900}",
          "$extensions": {
            "mode": {
              "Brand A": "{color.red.900}",
              "Brand B": "{color.blue.900}"
            }
          }
        }
      }
    },
    "red": {
      "100": { "$type": "color", "$value": "..." },
      "900": { "$type": "color", "$value": "..." }
    },
    "blue": {
      "100": { "$type": "color", "$value": "..." },
      "900": { "$type": "color", "$value": "..." }
    }
  }
}

And we want to end up with only the color.background.default token with (new) modes for all 4 possible values it could resolve to:

{
  "color": {
    "background": {
      "default": {
        "$type": "color",
        "$value": "... (red.100's value)",
        "$extensions": {
          "mode": {
            "Light:Brand A": "... (red.100's value)",
            "Light:Brand B": "... (blue.100's value)",
            "Dark:Brand A": "... (red.900's value)",
            "Dark:Brand B": "... (blue.900's value)"
          }
        }
      }
    }
  }
}

...and that's what we then feed into Cobalt to generate the corresponding CSS / TS / etc. code.
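For what it's worth, the reference-chasing core of that flattening can be sketched like this (heavily simplified from our real script; no cycle detection, error handling, or handling of non-alias $values):

```javascript
// Look up a DTCG "{dot.path}" alias in the token tree.
function getToken(tokens, ref) {
  const path = ref.slice(1, -1).split('.'); // "{color.red.100}" -> ["color","red","100"]
  return path.reduce((node, key) => node?.[key], tokens);
}

// Follow a chain of aliased tokens, fanning out the mode names from each
// $extensions.mode object and joining them with ":" ("Light:Brand A"), until
// a leaf token with a concrete value is reached.
function flattenModes(tokens, ref, modePrefix = '') {
  const token = getToken(tokens, ref);
  const modes = token.$extensions?.mode;
  if (!modes) {
    // Leaf token: one fully-qualified mode entry with its concrete value.
    return { [modePrefix]: token.$value };
  }
  const out = {};
  for (const [mode, aliased] of Object.entries(modes)) {
    const key = modePrefix ? `${modePrefix}:${mode}` : mode;
    Object.assign(out, flattenModes(tokens, aliased, key));
  }
  return out;
}
```

Running that on the example above with `flattenModes(tokens, '{color.background.default}')` produces the four combined "Light:Brand A" … "Dark:Brand B" entries.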

c1rrus avatar Jan 31 '25 14:01 c1rrus

Oh! That makes a lot of sense, thank you. Yeah, you're right, this is tricky: you do have the source of truth in one sense (Figma). And, bugs/weirdness in TokensBrücke aside (that's a separate issue), I can see how it's desirable not to store a long-lived competing source of truth where you modify a few things. But you don't necessarily want to build your own DTCG parser just to generate tokens to feed into another DTCG parser (though I love hearing that you are building your own parser! That's a great exercise).

I’m on board with the preprocess step now; I can see this being useful for folks. Thanks for providing a clear example.

As an aside, one of my biggest learnings coming from JS bundlers was realizing that there you have many inputs and many outputs, but with tokens you have one input and many outputs. The “preprocessor” step in JS bundlers isn’t usually needed because, with so many inputs, you just combine it with the transform step; in our case I’m seeing how that “one input” key difference is also reflected here.

drwpow avatar Jan 31 '25 16:01 drwpow

I just want to say I've had the same two use cases that James has shared. I've needed to map token name patterns to token types (which in SD is the typical use case for matchers), and also to pre-filter tokens to remove private ones. The latter is an especially common pattern when you do multi-mode, because you end up having collections for responsiveness modes and others for color modes / themes.

With my current client, on top of that, we have shared token sets in Tokens Studio, and then product-specific ones. We always use the shared tokens, resolve them and then override with the per-product tokens. This matches how Tokens Studio lets us override tokens and ensures that the token processor doesn't wrongly report name collisions in our token sets.
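For context, that override step is essentially a deep merge where the product-specific set wins; a simplified sketch (our real setup goes through Tokens Studio's own resolution):

```javascript
// Deep-merge two DTCG token sets, with the product-specific set overriding
// the shared base. Feeding a single merged tree into the token processor
// avoids spurious name-collision reports. Arrays and scalar $-properties
// are replaced wholesale; only groups/objects are merged recursively.
function mergeTokenSets(base, override) {
  const out = { ...base };
  for (const [key, value] of Object.entries(override)) {
    const existing = out[key];
    if (
      value && typeof value === 'object' && !Array.isArray(value) &&
      existing && typeof existing === 'object' && !Array.isArray(existing)
    ) {
      out[key] = mergeTokenSets(existing, value); // recurse into groups
    } else {
      out[key] = value; // override wins
    }
  }
  return out;
}
```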

Sidnioulz avatar Feb 01 '25 15:02 Sidnioulz

Added 🙂. Docs here: https://terrazzo.app/docs/cli/api/js/#transform-api. Would love feedback on the API! Even though it’s “released,” I did so mostly for evaluation. Happy to make breaking changes to this API to improve it.

I tried to strike a balance between providing utility via the visitors (e.g. maybe you want to operate only on dimension tokens, but they don’t all manually redeclare $type: "dimension") and letting you just work with JSON and do silly things if needed (you can dynamically inject or delete tokens from groups). It should provide a lot of power without making you do manual JSON drudgery.
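To illustrate the visitor idea conceptually (this is not the actual API shape; see the docs link above for that), a type-scoped visitor is just a traversal that tracks the $type inherited from enclosing groups:

```javascript
// Conceptual sketch only: walk a DTCG tree, tracking the $type inherited
// from enclosing groups, and invoke the callback for every token whose
// effective type matches, even if the token doesn't redeclare $type itself.
function visitTokensOfType(node, type, visit, inheritedType) {
  if (node === null || typeof node !== 'object') return;
  const effectiveType = node.$type ?? inheritedType;
  if ('$value' in node) {
    // A token: visit it if its effective (possibly inherited) type matches.
    if (effectiveType === type) visit(node);
    return;
  }
  // A group: recurse into children, passing the group's type down.
  for (const [key, child] of Object.entries(node)) {
    if (!key.startsWith('$')) visitTokensOfType(child, type, visit, effectiveType);
  }
}
```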

The current tests have some good examples of what it’s capable of: https://github.com/terrazzoapp/terrazzo/blob/main/packages/parser/test/parse.test.ts#L3116

Closing this just as a “this exists,” but please provide feedback in new threads if you try this API out and want to change something, just to keep discussion focused 🙏

drwpow avatar Jun 02 '25 09:06 drwpow