ecto-cursor-based-stream
ecto-cursor-based-stream copied to clipboard
Supporting multiple columns + arbitrary selects
I've recently had to build something quite similar to this (without knowing this library existed) that needs to support two things:
- Multiple column cursor fields
- Arbitrary selects in the query
I think #3 covers the first point, but it isn't as simple as it immediately seems because automating the following is difficult:
- Imagine you have
%MySchema{id_1: ..., id_2: ..., id_3: ...} - Our
cursor_fieldsthus need to be[:id_1, :id_2, :id_3] - When we're fetching data we need to do:
where: (x.id_1, x.id_2, x.id_3) > (^max_cursor.id_1, ^max_cursor.id_2, ^max_cursor.id_3) - You can construct a dynamic tuple for the above comparison quite trivially via
fragment("(?)", splice(^cursor_fields)) - HOWEVER I don't think there's a mechanism to dynamically generate arbitrary tuples of fields belonging to bindings... we resorted to hackily dynamically generating the tuples.. so it works but its hacky and honestly probably not great.
Doing the above also complicates using the library efficiently as you may have an index covering [:id_1, :id_2, :id_3] but if cursor_fields: [:id_3, :id_1, :id_2] is provided, no index will be used by postgres :(
Anyhow, totally solvable perhaps in a more intelligent way than we did it.
The only reason we did this is because I'm pretty sure (id_1, id_2, id_3) > (1, 2, 3) is semantically different from: id_1 > 1 AND id_2 > 2 AND id_3 > 3 but maybe I'm wrong? If these are identical then maybe our hack is unnecessary here.
For the second point though... this is much more problematic. Say I want to stream over:
from x in MySchema,
select: %{id: nil}
This library (and our original implementation) will fail to stream over these records as the specified cursor fields aren't in the results returned from Ecto.Repo.all/2.
We fixed this by dynamically injecting a select_merge: %{__cursor__: {x.id_1, x.id_2, x.id_3}} but again, we had to resort to a lot of hacking in order to actually get this working (building fragment ASTs by hand and manually patching Ecto queries...)
Would love to know if you have any thoughts on the above!! Maybe there exists better/smarter ways to do what we're doing, and maybe there's an opportunity to collaborate on implementing these functions in EctoCursorBasedStream?
Hi @vereis, thanks for you detailed issue. I have released new version 1.2.0 that supports cursor on multiple columns, please give it a try. Regarding arbitrary selects, I find current implementation to be flexible enough to select what you need, as long as you select cursor fields as a map. Do you any examples that aren't possible?
I'll give this a try in the next few days! Thank you :-)
I'll let you know how it goes ❤️