dataform icon indicating copy to clipboard operation
dataform copied to clipboard

"ReferenceError: ref is not defined" when using ref function in SQLX config block

Open quassy opened this issue 7 months ago • 5 comments

-- filtered_postcodes.sqlx
config {
  {
    type: "view",
    assertions: {
      uniqueKey: ['postcode'],
      rowConditions: [
        `${ref("check_postcode")}(postcode)`,
        `postcode IN (SELECT postcode FROM ${ref("known_postcodes")}`
      ],
    },
}

SELECT *,
FROM ${ref("customer_postcodes")}
WHERE sales > 10000

When trying to use the ref() function in the config block, for example to add a simple rowCondition assertion, and then running dataform compile I get a compilation error:

Compilation errors:
  definitions/filtered_postcodes.sqlx: ReferenceError: ref is not defined

I would expect I could use the references in the config block just as I can in the SQLX body, to reference custom functions (like check_postcode which checks for valid values) or tables (for example so I can check that “foreign keys” exist in the source table) for simple data tests.

(The reference should also add a dependency of the current file on the references in the config block, so in the example filtered_postcodes should not only depend on customer_postcodes but also check_postcode and imported_postcodes.)

quassy avatar May 22 '25 17:05 quassy

How check_postcode is defined? If it is a JS function there is a different syntax to use it, please follow https://cloud.google.com/dataform/docs/javascript-in-dataform

Ceridan avatar May 27 '25 09:05 Ceridan

-- check_postcode.sqlx
config {
  type: "operations",
  hasOutput: true
}

CREATE OR REPLACE FUNCTION ${self()} (postcode STRING)
RETURNS BOOL
AS (
    -- German postal codes are 5 digits
    REGEXP_CONTAINS(postcode, '^[0-9]{5}$')
);

It's a user-defined BigQuery function in SQL. This is of course a very simple example, we have much more complex check functions that we implement in SQL and want to reuse in various actions/models. The issue is in general with referencing SQL actions/models inside the config block and not only in the SQL part.

quassy avatar May 27 '25 10:05 quassy

Technically you could use functions for resolving dependencies in a config block like this

config {
  {
    type: "view",
    assertions: {
      uniqueKey: ['postcode'],
      rowConditions: [
        dataform.resolve("check_postcode") + `(postcode)`,
        `postcode IN (SELECT postcode FROM ${ref("known_postcodes")}`
      ],
    },
}

But javascript in the config block is executed before actions in the graph are registered, so you won't be able to resolve other dependencies there.

kolina avatar May 31 '25 11:05 kolina

So I additionally need to manually declare the dependency with dependencies: ['check_postcode'] and it should work right?

quassy avatar Jun 03 '25 07:06 quassy

Addition: It's also not possible to use self() in the config block. So if you want to check the latest results of an incremental model that's not easily possible.

config {
  type: "incremental",
  description: "Tracking of row counts over time before and after transformation.",
  assertions: {
    rowConditions: [
      `updated_at != (SELECT MAX(updated_at) FROM ${self()}) OR count_before = count_after`,
   ]
  }
}

quassy avatar Jun 11 '25 13:06 quassy