avram icon indicating copy to clipboard operation
avram copied to clipboard

Ensure order is set to avoid non deterministic queries

Open robcole opened this issue 2 years ago • 5 comments

In past projects, we’ve been bit occasionally by implicit code in Rails that queried collections and would USUALLY, but not always, return in the order of ID ASC.

Order is non deterministic in SQL unless specified, so I was thinking that bringing in some config (since we don’t want to break old apps) to optionally hard raise when order is missing might be a really nice feature.

It’s meant to address the same type of concerns that https://github.com/santib/chaotic_order addresses in Rails.

This is something I’d be happy to take a crack at if there’s appetite for this being part of core Avram. @matthewmcgarvey @jwoertink any thoughts?

robcole avatar Jan 03 '22 20:01 robcole

That's an interesting thought. I don't know about forcing a random chaotic order 😂 but the idea of queries requiring some sort of order is kind of interesting. First thing that comes to mind is, doesn't adding an order increase query time?

> select * from contests;
Time: 1.756 ms

> select * from contests order by id asc;
Time: 2.502 ms

This isn't a huge different, but I only have 3 records....

I guess if we were going to do this, we'd need to answer a few things.

  1. There must be an escape hatch for when you can't/don't want the order
  2. We need to consider when an order works, and when it doesn't (i.e. aggregate queries, or joined objects)

On the note of joined objects, consider this query: UserQuery.new.where_posts(PostQuery.new.active).username.asc_order. The PostQuery object doesn't include the order call... should it? Would this cause an issue, or be ok since the entire query itself technically contains an order?

  1. Can we catch these issues at compile-time? If I add an order, but it's not the right one, did I save any additional time than not adding an order, and adding it later?
  2. Does the impact of performance, and additional compiler fighting outweigh the benefit we get? Personally, I'm ok with a loss of performance if it means catching more bugs in development... but just another thing to consider as a whole framework.

I'm curious to see what others think on this. Thanks for the suggestion!

jwoertink avatar Jan 03 '22 21:01 jwoertink

There must be an escape hatch for when you can't/don't want the order

Ideally I think this would be a configuration option that is false by default, so as to not suddenly break older applications, but I’m not sure where you wouldn’t want an order at all. LMK if you’ve got an example. I can see that being true for counts and full collections, but I’m sure I’m missing some other examples.

I could easily see something like .unordered for those times tho.

Can we catch these issues at compile-time? If I add an order, but it's not the right one, did I save any additional time than not adding an order, and adding it later?

The idea we were toying with was making sure @orders isn’t blank before executing the query. I’m not sure how worthwhile this is if we can’t catch it at compile time, but open to thoughts f’sho.

We need to consider when an order works, and when it doesn't (i.e. aggregate queries, or joined objects)

I don’t think we need to consider when it won’t work, as an order will always work, and this is separate from checking the correctness of .order itself. But I haven’t taken any sort of deep dive into this so I’m reserving the possibility that I’m 💯 incorrect about that.

robcole avatar Jan 03 '22 21:01 robcole

Related: https://github.com/luckyframework/website/issues/548

jwoertink avatar Jan 03 '22 21:01 jwoertink

This isn't a huge different, but I only have 3 records....

I’ll run some tests but as long as the order is on an indexed column, I think I they should be very very similar performance. Seems like something we could sample 500 times on a decent sized collection and get some sampling on and see from there.

robcole avatar Jan 03 '22 21:01 robcole

UserQuery.new.where_posts(PostQuery.new.active).username.asc_order. The PostQuery object doesn't include the order call... should it? Would this cause an issue, or be ok since the entire query itself technically contains an order?

No, post query is a filtering query, as part of UserQuery which is the return we care about

I think this can very likely be a compile time error, we would just need to figure out a pattern that is fitting

Lastly, if performance is a concern, we could have some sort of .unordered that sets a flag to ignore this.

I could see this working a little bit like the n+1/preloading stuff in terms of configuration

grepsedawk avatar Jan 04 '22 01:01 grepsedawk