ecto icon indicating copy to clipboard operation
ecto copied to clipboard

Proposal: Derived fields

Open greg-rychlewski opened this issue 1 year ago • 15 comments

I was playing around with the idea of introducing derived fields into schemas. I understand this has been proposed before but I wasn't able to see it proposed in exactly this way.

The Gist

A derived field is composed of one or more fields in the same schema (or none at all!). They can be composed the same way a normal select expression is. This could be useful in several situations:

  • Wanting the same database column more than once in your schema. There was a recent issue about this.
  • The same as above but possibly wanting the same column with different types
  • Wanting to compose multiple columns into one field, e.g. first name + last name into full name
  • Wanting to transform an existing column, maybe using coalesce or something like that to remove nulls.

The Proposal

  • allow an additional option onto field/2 macro called derived_as. This takes an MFA that must return a dynamic expression with a single binding
  • the planner will catch these derived fields and expand the dynamic fragment. there are two to catch it:
    • the prewalker (to catch stuff like select: p.derived)
    • collect_fields (to catch stuff like select: p and select: [:derived]

The PR is in a state where it can be reviewed to see if the idea/general direction are desirable. If it is there are some more things to do:

  • improve documentation
  • improve tests
  • improve errors
  • discuss how to handle returning for operations that don't use the planner (insert,update,delete). this part is the least straight forward

greg-rychlewski avatar Aug 13 '23 07:08 greg-rychlewski

This looks interesting @greg-rychlewski! Let me ask you a question: could this implemented with :source and custom types? The idea is to allow :source to also be a list (let's say a non-empty list for now). If you have more than one source, then the dumped value and the value receive on load are tuples.

However, my biggest concern in such a proposal is update and insert, especially if you have overlapping fields.

josevalim avatar Aug 13 '23 08:08 josevalim

Oh I didn't think about it that way. I think something like that could work but would need another option to make it read only so it can be ignored by insert/update/delete like you said.

One other thing is I think it loses some of its power if the transformation is done in Elixir instead of the database. Some examples:

  • you can no longer use the derived field outside of select. so for instance if you want to use this feature to join/filter on a normalized version of one of your fields you couldn't
  • some of the powerful abstractions the database can do will be lost. for example I think some of the functions of PostGIS will be hard for users to replicate in Elixir
  • you will maybe send back many duplicate values over the wire instead of a single value

greg-rychlewski avatar Aug 13 '23 13:08 greg-rychlewski

I see. I wonder if we can then extend Ecto.Type / Ecto.ParameterizedType somehow so they customize why they are selected? I would avoid adding a new abstraction. So if there is a way to marry with an existing feature, it would be perfect. :)

josevalim avatar Aug 13 '23 16:08 josevalim

Do you think it would be strange to marry it with virtual? It would have some advantages like being able to use existing types instead of making custom ones.

It seems to fit the definition in the docs:

:virtual - When true, the field is not persisted to the database.

Though in practice I think of them more as fields not entering the query. But if we could use virtual by passing it an mfa or something I think that would make the feature a bit easier to use.

greg-rychlewski avatar Aug 13 '23 17:08 greg-rychlewski

I see. You are treating them as read_only fields, so you are not concerned with insert/update/delete, while I was thinking you wanted to marry it with insert/update/delete at some point. I think that's a fair view but I would say it would be just a matter of time for someone to also want this same functionality to work for schema operations too (and not only query ones). If we say no, it may be seen as an arbitrary line we are drawing. :)

josevalim avatar Aug 13 '23 18:08 josevalim

I was thinking we will for sure have to support it with returning for insert/updates/deletes. But in terms of inserting the field or updating it I was seeing these as equivalent to calculated fields in SQL i.e. not real columns just expressions made up of other columns. I'm not sure if that makes it any better?

greg-rychlewski avatar Aug 13 '23 22:08 greg-rychlewski

I was seeing these as equivalent to calculated fields in SQL

You are right but I think it is an arbitrarily line. For example, the Calecto library used to have two-three columns for storing timestamps. I would write those as a single json/jsonp field, but I can see someone saying "hey, why not mapping to two-three fields on load/dump".

I would also like to avoid adding query details to the schema, because it is easy to paint ourselves into a corner where we start adding a bunch of computed fields (bloating the schema) for something that is a read-only concern. Ecto has two answers for this:

  1. Nesting: select: %{post: post, computed_field: coalesce(...)}

  2. Virtual fields themselves! select: %{post | computed_field: coalesce(...)}

WDYT?

josevalim avatar Aug 14 '23 11:08 josevalim

I see what you are saying now. I will have to think more about this. Ideally I'd like to have the advantages of this proposal: not having to repeat the same expression in multiple places + being able to use the calculated field outside of select, for instance on join or where, with the point you brought up about the storage.

greg-rychlewski avatar Aug 14 '23 14:08 greg-rychlewski

Great work there! I like the concept, but the schema bloating concern is legit.

I suggest requiring :virtual for :derived_as. It doesn't look as pretty but makes the intentions clearer :)

v0idpwn avatar Aug 14 '23 15:08 v0idpwn

I understand the bloat concern but I'm not sure it is more egregious than other things you can do, both in Ecto and the underlying database:

  • in Ecto associations can have :where, you could define as many associations as you want for one relationship.
  • in the database you could define as many materialized views on your database as you want, causing updates to be very slow
  • in Postgres at least you can define as many generated columns as you want (stored calculated fields)

It is a bit hard for me to say this feature crosses a line that other things don't.

greg-rychlewski avatar Aug 14 '23 15:08 greg-rychlewski

To me the biggest concern is that we are adding a new orthogonal capability to fields, which was not there before. Associations were already special constructs in queries, so adding where was not that much of a stretch. That’s why tying this to parameterized types would be less problematic to me, as it is an extension rather than addition, if that makes any sense. :)

josevalim avatar Aug 14 '23 15:08 josevalim

Makes complete sense. I'll try to see what I can come up with from the type angle.

greg-rychlewski avatar Aug 14 '23 16:08 greg-rychlewski

This looks like a very useful feature that I could make good use of in Flop. There were previous considerations about adding derived fields to Flop, but it seemed a bit hacky to implement. Defining derived fields directly in the Ecto schema would make it much easier to cover certain use cases in Flop (e.g. filtering by a column cast to a certain type).

Just as an additional thought: Flop already makes use of selected_as. If there was a variation of selected_as that injects the actual term instead of referring to it by alias in the final SQL query, Flop could use it in where clauses, which should cover all use cases from a Flop perspective. So basically (naming debatable):

from u in User,
  select: %{full_name: derived_as(fragment("concat(?, ' ', ?)", u.given_name, u.family_name), :full_name},
  where: ilike(derived_as(:full_name), ^search_term)
SELECT CONCAT(u.given_name, ' ', u.family_name) AS full_name
FROM users u
WHERE CONCAT(u.given_name, ' ', u.family_name) ILIKE ?;

This way, there is no bloated schema, but also, the definition needs be added to each query that uses it.

woylie avatar Sep 28 '23 15:09 woylie

In your case, I think query macros would solve it:

defmacro full_name(e) do
  quote do
    fragment("concat(?, ' ', ?)", unquote(u).given_name, unquote(u).family_name)
  end
end

josevalim avatar Nov 14 '23 09:11 josevalim

In your case, I think query macros would solve it:

defmacro full_name(e) do
  quote do
    fragment("concat(?, ' ', ?)", unquote(u).given_name, unquote(u).family_name)
  end
end

Sorry, my post didn't include all the context. Flop turns something like this:

%{"filters" => [%{"field" => "given_name", "op" => "==", "value" => "Ann"}]}

Into this (conceptually, not literally):

defp apply_filter(q, %Flop.Filter{field: field, op: :==, value: value})
  where(q, [u], field(u, ^field) == ^value)
end

This is easy to accomplish with regular schema fields, and Flop also supports fields on named bindings, compound fields, selected_as aliases, and as an escape hatch, custom fields.

What it cannot do is to apply filter operators and values on arbitrary, user-defined fragments or expressions:

where(q, [u], fragment("? ->> ?", u.location, "city") == ^value)

The proposed feature for Flop would allow a user to define the partial expression similar to this (with an imaginary derived_field macro):

@derive {
  Flop.Schema,
  derived_fields: [
    city: Flop.derived_field(fragment("? ->> ?", :location, "city"))
  ]
}

Which would allow the user to pass filter parameters like this:

%{"filters" => [%{"field" => "city", "op" => "==", "value" => "Valdivia"}]}

# or

%{"filters" => [%{"field" => "city", "op" => ">=", "value" => "V"}]}

The user can of course define a city macro similar to your example, but Flop would have to apply that macro dynamically depending on the filter parameters.

woylie avatar Nov 16 '23 05:11 woylie