enso
enso copied to clipboard
Add offset_value method and support to Enso
As a user, I want to be able to perform operations that calculate across multiple rows. This enables a wide variety of use cases, such as data cleanup and advanced running calculations.
The offset
method offers a standard method for performing these types of calculations. Databases such as sql server offer methods like lead, lag, and over by, but these are often difficult for new users to understand.
There are two options for this:
Option 1:
By creating offset_value
as a method for referencing a prior value via the set
method, we can enable powerful functional capability directly within set
without needing to create a more complex design.
The proposed API is as follows:
offset_value : Column -> Integer -> Group_By -> Sort -> If_Missing offset_value column rows_offset group_by=Nothing sort=Nothing if_missing=Nothing
This method will optionally take a grouping argument group_by. It will use the existing order of the table otherwise.
The rows_offset
value can be positive or negative and is the relative index reference of the offset value being retrieved. For example, on index 10, a rows_offset
of -1 will return the value of the column at index 9, and a rows_offset
of 1 will return the value of the column at index 11.
Column
allows a dropdown selection of the column that an offset value is being retrieved from.
if_missing
allows a set of options for default behavior if no value is returned, for example if there is no value at the referenced location in the table, such as referencing index-1 from index 0. It has three options:
Null 0 or Empty - if the field is a number, returns 0, if the field is a string returns an empty value Closest value - returns the closest applicable value from the table
rows_offset
will return the value from the other specified row for use in a formula inside of set
Option 2
Alternatively, we can create offset_value
as a standalone function which returns a new column with the offset values. This is a simpler approach technically, and allows a user to leverage the existing formula functions by referencing the desired offsets. It can support sorting and grouping. The primary downside to this approach is that while it is simple, it is verbose, requiring many steps (IE, to check row-1:value and row-2:value, you need to create two offset_value
nodes, then use set
to define the value, and then remove_columns
to get rid of the extraneous values.
This api would look something like this: offset_value : Column -> Integer -> Group_By -> Sort -> Text -> If_Missing offset_value column rows_offset group_by=Nothing sort=Nothing new_name=Nothing if_missing=Nothing
This method optionally takes a grouping argument, a sort argument, and a new_name argument. If these are Nothing, the grouping and sort would take incoming record order, and the new_name would be automatically assigned as "offset_(rows_offset)(column)", IE a column "value" with a rows_offset of -1 would be automatically named to "offset-1_value"
Per meeting 2024-06-28 - Chose to go forward with offset_value as a Table function and a Column function, as well as adding support for expression editor.