ibis feat: add `first_value` and `last_value` metrics with and "ignore nulls" option for use in a window

Is your feature request related to a problem?

I would like to implement an efficient, pandas-compatible bfill and ffill in BigQuery DataFrames without custom operators.

What is the motivation behind your request?

To implement bfill and ffill in BigQuery DataFrames (see: https://github.com/ibis-project/ibis/issues/3652 for an issue requesting bfill and ffill in ibis), we add two new operations to Ibis: FirstNonNullValue and LastNonNullValue.

This is then used like FirstNonNullValue(column).over(window), which when the window is ibis.range_window(preceding=0, following=limit), implements a pandas-like bfill with limit parameter support.

Likewise, use LastNonNullValue(column).over(window) with a window of ibis.range_window(preceding=limit, following=0) to implement a pandas-like ffill with limit parameter support.

Describe the solution you'd like

I propose the following two new "metrics", which require a window to operate:

value.first(ignore_nulls=True).over(window)
value.last(ignore_nulls=True).over(window)

Related: IGNORE NULLS in general is tracked here: https://github.com/ibis-project/ibis/issues/7649

Previously, with string-based compilation, this was accomplished with a custom BigQuery operator like:

https://github.com/googleapis/python-bigquery-dataframes/blob/74170dabd323f1b08ad76241e37ff9f2a5b67ab5/third_party/bigframes_vendored/ibis/backends/bigquery/registry.py#L15-L17

In SQLGlot, it does appear there is a First expression with at least some IGNORE NULLS support. My work is in-progress to see if this operator is compatible with the BigQuery dialect.

What version of ibis are you running?

8.x, but working on an upgrade to 9.x right now.

What backend(s) are you using, if any?

BigQuery

Code of Conduct

[X] I agree to follow this project's Code of Conduct

Jul 08 '24 15:07 tswast

In SQLGlot, it does appear there is a First expression with at least some IGNORE NULLS support. My work is in-progress to see if this operator is compatible with the BigQuery dialect.

I see also FirstValue which according to the comment is specifically for windowed contexts. Neither is parametrized, so I'm not sure about IGNORE NULLS support yet.

Jul 08 '24 16:07 tswast

Filed https://github.com/tobymao/sqlglot/issues/3749 as I am having trouble figuring out how to create the SQLGlot expression. Other examples I found in the compiler use self.agg.avg(...), but I don't think that would account for the IGNORE NULLS part.

Jul 08 '24 21:07 tswast

Update:

    def visit_FirstNonNullValue(self, op, *, arg):
        return sge.IgnoreNulls(this=sge.FirstValue(this=arg))

This works for the bfill implementation in BigQuery DataFrames. I imagine it should work in most SQLGlot backends. Probably makes sense to make IgnoreNulls a parameter in ibis.

Jul 09 '24 15:07 tswast

Most of the team is at SciPy through the end of this week. This makes sense to me, but would love to get thoughts from @cpcloud and the other maintainers.

Jul 12 '24 18:07 deepyaman

We should definitely add this functionality, but likely as a flag on the ops that support, which is more than just first/last, so that we don't end up with a set of classes duplicated only to allow them to set ignore_nulls.

@tswast Any chance someone from the BigFrames team wants to take a crack at it?

Jul 17 '24 17:07 cpcloud

Note that recently we made first/last/collect uniformly ignore nulls in non-window expressions (previously it was backend-dependent). Another option is to make this also true when used in window expressions, which would both improve uniformity and avoid the need for a new flag.

Jul 17 '24 18:07 jcrist

🤔 Perhaps we only expose ignore_nulls on lag/lead which is what an implementation of bfill/ffill would be based on.

Jul 17 '24 18:07 cpcloud

This feature would be useful to Ibis support in Narwhals

Note that for other backends (e.g. duckdb / pyspark) we don't use lag / lead, but last_value / first_value

May 02 '25 17:05 MarcoGorelli

Here's how we implement ffill using last_value - note that ignore_nulls is required for this to work correctly:

import duckdb
import narwhals as nw
import sqlglot

conn = duckdb.connect()
conn.sql("""CREATE TABLE prices (date DATE, price DOUBLE);""")

df = (
    nw.from_native(conn.table("prices"))
    .with_columns(price_filled=nw.col('price').fill_null(strategy='forward').over(order_by='date'))
)
print(sqlglot.transpile(df.to_native().sql_query(), pretty=True)[0])

SELECT
  date,
  price,
  LAST_VALUE(price) IGNORE NULLS OVER (ORDER BY date ASC ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS price_filled
FROM main.prices

Jun 17 '25 12:06 MarcoGorelli