activerecord_where_assoc icon indicating copy to clipboard operation
activerecord_where_assoc copied to clipboard

General feedback

Open MaxLap opened this issue 7 years ago • 15 comments

If you have any feedback (good or bad) about the gem that you don't feel is worthy of an issue, please post a comment here.

I'd be really interested in why you decided not to use this gem.

MaxLap avatar Apr 10 '18 11:04 MaxLap

Looks like a good approach! Did you try showing this to anyone from the Rails team? I don't think there's a huge chance they will adopt this but it's worth hearing their feedback.

yoelblum avatar Apr 27 '18 08:04 yoelblum

@yoelblum No i didn't show it to the Rails team.

In 2015, someone did a PR for similar feature (not as fleshed out in the details) and it got refused just based on somone being against the feature. I don't know if their mood changed related to this.

https://github.com/rails/rails/pull/21438

The guy ended up making a gem with it, and we didn't hear much about it after that sadly.

https://github.com/EugZol/where_exists

MaxLap avatar Apr 27 '18 12:04 MaxLap

I'm doing a lot of filtering logic in my apps, and I'd often run into bugs when combining several scopes which do joins. I've just found this gem yesterday, and it's really helpful.

Is this likely to survive rails upgrades? I'm a bit wary with gems that extend activerecord, because they tend to have a hard time going through major rails upgrades.

fschwahn avatar Sep 18 '18 10:09 fschwahn

Glad you find it helpful!

This is essentially the same codebase that deals with ActiveRecord 4.1 all the way to ActiveRecord 5.2. I'm not doing some deep hacks of changing how ActiveRecord behaves. All that the gem does is generate a single where. Everything else is about finding the data in the associations, so I would say very likely to go through major Rails upgrades.

The CI build on Travis even has a job to run against ActiveRecord's master branch, so I can prepare in advance if something starts breaking from the ongoing development.

Thanks for the feedback!

MaxLap avatar Sep 18 '18 11:09 MaxLap

That sounds good!

Do you have any data on how this performs compared to joins? Especially with > 1 million rows? Is it faster? Is it slower? Is it comparable? If you have any data on this, it might make a good addition to the README.

fschwahn avatar Sep 18 '18 12:09 fschwahn

SQL-wise, it should normally be either comparable or faster to use EXISTS. I don't personally have data on this, but you can find comparisons online of SQL JOIN vs EXISTS. Here is an example saying around 9% faster for EXISTS: https://danmartensen.svbtle.com/sql-performance-of-join-and-where-exists

However, when dealing with has_one, it is probable that this gem will be slower (can't really say by how much). This is because this gem will correctly filter based on only the "first" record of the association, but figuring out which record is that "first" is more work (Need to apply ordering). The JOIN way just doesn't care and will lead to unexpected results (Unless you expect it to treat as has_one like a has_many).

MaxLap avatar Sep 18 '18 15:09 MaxLap

Thanks for this gem! I've been writing things like this for years:

scope :with_foo, -> (foo) {
  where(<<~EOQ)
    EXISTS (SELECT 1
            FROM ...)
  EOQ
}

Having a gem to do it for me would be so much better!

Since you asked for general feedback, here are two things I notice:

Imagine you have 3 tables: users, user_roles, and roles. On User there is has_many :roles, through: :user_roles. (I think thousands of apps have this structure.) Now you want to get all the users with the foo role. This gem lets me say User.where_assoc_exists(:roles, name: 'foo'). That is awesome! It generates SQL like this (whitespace added for clarity):

SELECT "users".*
FROM   "users"
WHERE  (EXISTS (
  SELECT 1
  FROM   "user_roles"
  WHERE  "user_roles"."user_id" = "users"."id"
  AND    (EXISTS (
    SELECT 1
    FROM   "roles"
    WHERE  "roles"."id" = "user_roles"."role_id"
    AND    "roles"."name" = 'foo'))))

The 2 improvements are:

  • 'foo' should be a parameter like $1 instead, just as if I'd said User.where(first_name: 'bar').
  • user_roles and roles should be connected with a join (like when following the association), not with a nested EXISTS. These are equivalent in meaning (when used within the outer EXISTS), and a quick test on a (small, unfortunately) database shows that Postgres even produces the same query plan. But still it seems more compatible with Active Record to follow its approach. For example it might matter if you define custom methods on your association.

If these sound like improvements to you, I'd be willing to write up separate issues for them and look at sending you PRs. What do you think?

pjungwir avatar Jun 04 '20 16:06 pjungwir

@pjungwir Hey, thank you for the feedback! Glad you find the gem useful.

Having a gem to do it for me would be so much better!

"Would be" implies that you don't use the gem? Is the missing handling of ? a blocker to using it at all?

For your issue with the placeholders ?. I worry about the complexity of doing such a restructuring, but if you want to give it a try, by all means go for it.

Using JOIN creates more edge cases, since things remain in the same context. Off the top of my head, here are problems this bring:

  • If you have a recursive association, the joined table would then need a different name. And then this makes everything more complex when people pass in string conditions or use scopes, which name to use, etc.
  • You can nest where_assoc_* calls using blocks, those require their own exists. So going with joins would require still doing something similar to what is currently happening. It seems more mentally complex to have 2 ways of doing things, both for developers and users of the gem.

And I don't see any benefit to using JOIN. As you say, the query plan is the same.

MaxLap avatar Jun 04 '20 16:06 MaxLap

I'm not using it yet. I just discovered it. I don't think either of the issues are enough to stop me from using it.

Thanks for giving your thoughts about both those suggestions! To me, using a join feels like the more important of the two, although I don't have any good reasons to back that up.

I think at least for Postgres you don't need to add table aliases. For example you can do this (not that it means anything):

SELECT users.*
FROM   users
JOIN   user_roles
ON     user_roles.user_id = users.id
WHERE  user_roles.role_id = 5
AND   EXISTS (
  SELECT 1
  FROM   user_roles
  INNER JOIN roles
  ON     roles.id = user_roles.role_id
  WHERE  user_roles.user_id = users.id
  AND    roles.name = 'foo'))

Personally I think it is more mentally complex to have two ways a through association can behave: usually with a JOIN but here with an EXISTS. But I guess that's subjective.

Anyway, maybe I'll look into using parameters at least.

pjungwir avatar Jun 04 '20 17:06 pjungwir

I see what you mean about not needing alias. But that only works if the only "reused" name is the last one. Imagine a weird relation that goes users -> foo -> users -> bar. Your chain of JOIN will include users which was already there, and then you need an alias, or to nest in an EXISTS. And if you now must sometimes do it one way or another way, it doesn't feel like much would be gained.

Also, doing a JOIN could mean returning duplicated results if it is not inside of the EXISTS. Your query won't, because there is a WHERE user_roles.role_id = 5, but as you said, that query doesn't really mean anything.

What do you refer to when you say "usually with a join". When does a :through get used with a JOIN? The only times I can think of is when when you call joins, when you eager_load, and when you do includes.references. Those are all about getting data from the database. (Or, when you want to do conditions on associations in what I consider to be a less desirable way)

When data from another table must be returned, then JOIN is the tool for the job.
When filtering based on another table, the cleanest and safest way is using EXISTS. It's true, however, that in many cases, JOIN will be able to do the job, but its more error prone.

These are different operations, it makes sense that they use different operators.

Here are 3 relations which do the same thing. I like that they generate the same SQL query.

User.where_assoc_exists(:roles, name: 'foo')
User.where_assoc_exists(:user_roles) {|ur| ur.where_assoc_exists(:role, name: 'foo') }
User.where_assoc_exists([:user_roles, :role], name: 'foo')

I like this discussion, if you'd like to continue it, I would prefer if you could make a separate Gitub issue for it, it is out of the scope of general feedback. You can quote this post and reply to it there.

Thank you for your interest!

MaxLap avatar Jun 04 '20 18:06 MaxLap

Is there a way to check if the association was already added? To avoid double-ups. For instance:

user = User.where_assoc_exists(:roles)
# passed to another method that also adds the assoc
user = user.where_assoc_exists(:roles)

Would produce a query that joins on roles twice. I would like to check if it's already added, so I can avoid adding it again.

ridiculous avatar Aug 04 '22 19:08 ridiculous

Good question.

While not as clean as normal usage. I guess you could use where_assoc_exists_sql, which returns the SQL for the condition. (where_assoc_exists is just a wrapper for that to add a where(sql))

Having that SQL string, you could then check in the relation's where_values (sorry I don't remember the exact name of that ActiveRecord-internal method) if that SQL is already in there. If not, then you do a where(sql).

Good luck

MaxLap avatar Aug 04 '22 20:08 MaxLap

Thanks for the gem, I am curious if there is a way to use a block with a nested where_assoc_count and add conditions to both levels of nesting.

Using the example here Post.where_assoc_count(10, :<=, [:comments, :replies])(from: https://github.com/MaxLap/activerecord_where_assoc/blob/04d533c46c185563bdaf802f11e4ae7b2e733167/lib/active_record_where_assoc/relation_returning_methods.rb#L343 )

I seem to be able to get it to work with something like

Post.where_assoc_count(10, :<=, [:comments, :replies], ['comments.created_at > ? AND replies.created_at < ?'])

Can't figure out if there is a way to access both levels of nesting within a block. If I do

Post.where_assoc_count(10, :<=, [:comments, :replies]) do |x|

x always seems to be the deepest scope (so replies). Is there a way to access the comments relation from within that block?

jonny5 avatar Nov 22 '22 03:11 jonny5

Hello @jonny5, sadly, there is no way to do this at the moment. I didn't think of that use-case when developping this gem. For the where_assoc_exists, you can call it again from inside the first block, but for counting, that wouldn't work.

As you found out, SQL's nesting can allow you to it when using strings, so that's a pretty good work-around.

Since where_assoc_count is a very niche use-case and since there is a workaround, I don't plan on adding such a capability for now.

MaxLap avatar Nov 22 '22 13:11 MaxLap

btw, some followup on my question above: I wasn't able to use where_values or the like to check if the query already had the assoc, and I didn't want to parse the generated SQL. So instead I changed how the code was calling it to reduce double-ups.

ridiculous avatar Nov 22 '22 18:11 ridiculous