acts-as-taggable-on icon indicating copy to clipboard operation
acts-as-taggable-on copied to clipboard

Use exists clause over IN Tag.for_context, Tag.for_tenant to improve performance

Open aovertus opened this issue 3 years ago • 0 comments

Looking at documentation and articles it seem that IN is only beneficial on really small dataset.

IN:

Returns true if a specified value matches any value in a subquery or a list.

Exists:

Returns true if a subquery contains any rows.

If anyone as deeper understanding of IN over EXISTS. It looks like more recent engine handle those query the same way.

The same strategy exists for https://github.com/mbleigh/acts-as-taggable-on/blob/master/lib/acts_as_taggable_on/taggable/tagged_with_query/any_tags_query.rb#L20-L22

Here is the example we ran into production which generated performance issues

Engine:

MySQL 5.7

Dataset

~ 7_000_000 taggins records ~ 1300 tags

Current implementation

3.0.2 :003 > ActsAsTaggableOn::Tag.for_context('context').to_sql
=> "SELECT DISTINCT tags.* FROM `tags` INNER JOIN `taggings` ON `taggings`.`tag_id` = `tags`.`id` WHERE (taggings.context = 'context')"

Query run in ~ 58.74s

Patch

3.0.2 :004 > ActsAsTaggableOn::Tag.where(ActsAsTaggableOn::Tagging.where(context: 'context').arel.exists).to_sql
# => "SELECT `tags`.* FROM `tags` WHERE EXISTS (SELECT `taggings`.* FROM `taggings` WHERE `taggings`.`context` = 'context')"

Query run in ~ 149ms

aovertus avatar Jun 16 '22 22:06 aovertus