ecto
ecto copied to clipboard
Proposal: Derived fields
I was playing around with the idea of introducing derived fields into schemas. I understand this has been proposed before but I wasn't able to see it proposed in exactly this way.
The Gist
A derived field is composed of one or more fields in the same schema (or none at all!). They can be composed the same way a normal select expression is. This could be useful in several situations:
- Wanting the same database column more than once in your schema. There was a recent issue about this.
- The same as above but possibly wanting the same column with different types
- Wanting to compose multiple columns into one field, e.g. first name + last name into full name
- Wanting to transform an existing column, maybe using
coalesce
or something like that to remove nulls.
The Proposal
- allow an additional option onto
field/2
macro calledderived_as
. This takes an MFA that must return a dynamic expression with a single binding - the planner will catch these derived fields and expand the dynamic fragment. there are two to catch it:
- the prewalker (to catch stuff like
select: p.derived
) - collect_fields (to catch stuff like
select: p
andselect: [:derived]
- the prewalker (to catch stuff like
The PR is in a state where it can be reviewed to see if the idea/general direction are desirable. If it is there are some more things to do:
- improve documentation
- improve tests
- improve errors
- discuss how to handle
returning
for operations that don't use the planner (insert,update,delete). this part is the least straight forward
This looks interesting @greg-rychlewski! Let me ask you a question: could this implemented with :source
and custom types? The idea is to allow :source
to also be a list (let's say a non-empty list for now). If you have more than one source, then the dumped value and the value receive on load
are tuples.
However, my biggest concern in such a proposal is update and insert, especially if you have overlapping fields.
Oh I didn't think about it that way. I think something like that could work but would need another option to make it read only so it can be ignored by insert/update/delete like you said.
One other thing is I think it loses some of its power if the transformation is done in Elixir instead of the database. Some examples:
- you can no longer use the derived field outside of
select
. so for instance if you want to use this feature to join/filter on a normalized version of one of your fields you couldn't - some of the powerful abstractions the database can do will be lost. for example I think some of the functions of PostGIS will be hard for users to replicate in Elixir
- you will maybe send back many duplicate values over the wire instead of a single value
I see. I wonder if we can then extend Ecto.Type / Ecto.ParameterizedType somehow so they customize why they are selected? I would avoid adding a new abstraction. So if there is a way to marry with an existing feature, it would be perfect. :)
Do you think it would be strange to marry it with virtual
? It would have some advantages like being able to use existing types instead of making custom ones.
It seems to fit the definition in the docs:
:virtual - When true, the field is not persisted to the database.
Though in practice I think of them more as fields not entering the query. But if we could use virtual by passing it an mfa or something I think that would make the feature a bit easier to use.
I see. You are treating them as read_only
fields, so you are not concerned with insert/update/delete, while I was thinking you wanted to marry it with insert/update/delete at some point. I think that's a fair view but I would say it would be just a matter of time for someone to also want this same functionality to work for schema operations too (and not only query ones). If we say no, it may be seen as an arbitrary line we are drawing. :)
I was thinking we will for sure have to support it with returning
for insert/updates/deletes. But in terms of inserting the field or updating it I was seeing these as equivalent to calculated fields in SQL i.e. not real columns just expressions made up of other columns. I'm not sure if that makes it any better?
I was seeing these as equivalent to calculated fields in SQL
You are right but I think it is an arbitrarily line. For example, the Calecto library used to have two-three columns for storing timestamps. I would write those as a single json/jsonp field, but I can see someone saying "hey, why not mapping to two-three fields on load/dump".
I would also like to avoid adding query details to the schema, because it is easy to paint ourselves into a corner where we start adding a bunch of computed fields (bloating the schema) for something that is a read-only concern. Ecto has two answers for this:
-
Nesting:
select: %{post: post, computed_field: coalesce(...)}
-
Virtual fields themselves!
select: %{post | computed_field: coalesce(...)}
WDYT?
I see what you are saying now. I will have to think more about this. Ideally I'd like to have the advantages of this proposal: not having to repeat the same expression in multiple places + being able to use the calculated field outside of select, for instance on join
or where
, with the point you brought up about the storage.
Great work there! I like the concept, but the schema bloating concern is legit.
I suggest requiring :virtual
for :derived_as
. It doesn't look as pretty but makes the intentions clearer :)
I understand the bloat concern but I'm not sure it is more egregious than other things you can do, both in Ecto and the underlying database:
- in Ecto associations can have
:where
, you could define as many associations as you want for one relationship. - in the database you could define as many materialized views on your database as you want, causing updates to be very slow
- in Postgres at least you can define as many generated columns as you want (stored calculated fields)
It is a bit hard for me to say this feature crosses a line that other things don't.
To me the biggest concern is that we are adding a new orthogonal capability to fields, which was not there before. Associations were already special constructs in queries, so adding where was not that much of a stretch. That’s why tying this to parameterized types would be less problematic to me, as it is an extension rather than addition, if that makes any sense. :)
Makes complete sense. I'll try to see what I can come up with from the type angle.
This looks like a very useful feature that I could make good use of in Flop. There were previous considerations about adding derived fields to Flop, but it seemed a bit hacky to implement. Defining derived fields directly in the Ecto schema would make it much easier to cover certain use cases in Flop (e.g. filtering by a column cast to a certain type).
Just as an additional thought: Flop already makes use of selected_as
. If there was a variation of selected_as
that injects the actual term instead of referring to it by alias in the final SQL query, Flop could use it in where
clauses, which should cover all use cases from a Flop perspective. So basically (naming debatable):
from u in User,
select: %{full_name: derived_as(fragment("concat(?, ' ', ?)", u.given_name, u.family_name), :full_name},
where: ilike(derived_as(:full_name), ^search_term)
SELECT CONCAT(u.given_name, ' ', u.family_name) AS full_name
FROM users u
WHERE CONCAT(u.given_name, ' ', u.family_name) ILIKE ?;
This way, there is no bloated schema, but also, the definition needs be added to each query that uses it.
In your case, I think query macros would solve it:
defmacro full_name(e) do
quote do
fragment("concat(?, ' ', ?)", unquote(u).given_name, unquote(u).family_name)
end
end
In your case, I think query macros would solve it:
defmacro full_name(e) do quote do fragment("concat(?, ' ', ?)", unquote(u).given_name, unquote(u).family_name) end end
Sorry, my post didn't include all the context. Flop turns something like this:
%{"filters" => [%{"field" => "given_name", "op" => "==", "value" => "Ann"}]}
Into this (conceptually, not literally):
defp apply_filter(q, %Flop.Filter{field: field, op: :==, value: value})
where(q, [u], field(u, ^field) == ^value)
end
This is easy to accomplish with regular schema fields, and Flop also supports fields on named bindings, compound fields, selected_as aliases, and as an escape hatch, custom fields.
What it cannot do is to apply filter operators and values on arbitrary, user-defined fragments or expressions:
where(q, [u], fragment("? ->> ?", u.location, "city") == ^value)
The proposed feature for Flop would allow a user to define the partial expression similar to this (with an imaginary derived_field
macro):
@derive {
Flop.Schema,
derived_fields: [
city: Flop.derived_field(fragment("? ->> ?", :location, "city"))
]
}
Which would allow the user to pass filter parameters like this:
%{"filters" => [%{"field" => "city", "op" => "==", "value" => "Valdivia"}]}
# or
%{"filters" => [%{"field" => "city", "op" => ">=", "value" => "V"}]}
The user can of course define a city
macro similar to your example, but Flop would have to apply that macro dynamically depending on the filter parameters.