acts-as-taggable-on
acts-as-taggable-on copied to clipboard
Use exists clause over IN Tag.for_context, Tag.for_tenant to improve performance
Looking at documentation and articles it seem that IN is only beneficial on really small dataset.
IN:
Returns true if a specified value matches any value in a subquery or a list.
Exists:
Returns true if a subquery contains any rows.
If anyone as deeper understanding of IN over EXISTS. It looks like more recent engine handle those query the same way.
The same strategy exists for https://github.com/mbleigh/acts-as-taggable-on/blob/master/lib/acts_as_taggable_on/taggable/tagged_with_query/any_tags_query.rb#L20-L22
Here is the example we ran into production which generated performance issues
Engine:
MySQL 5.7
Dataset
~ 7_000_000 taggins records ~ 1300 tags
Current implementation
3.0.2 :003 > ActsAsTaggableOn::Tag.for_context('context').to_sql
=> "SELECT DISTINCT tags.* FROM `tags` INNER JOIN `taggings` ON `taggings`.`tag_id` = `tags`.`id` WHERE (taggings.context = 'context')"
Query run in ~ 58.74s
Patch
3.0.2 :004 > ActsAsTaggableOn::Tag.where(ActsAsTaggableOn::Tagging.where(context: 'context').arel.exists).to_sql
# => "SELECT `tags`.* FROM `tags` WHERE EXISTS (SELECT `taggings`.* FROM `taggings` WHERE `taggings`.`context` = 'context')"
Query run in ~ 149ms