foreman icon indicating copy to clipboard operation
foreman copied to clipboard

Fixes #35689 - Use better fact search query

Open adamruzicka opened this issue 2 years ago • 5 comments

adamruzicka avatar Jun 06 '23 14:06 adamruzicka

Issues: #35689

theforeman-bot avatar Jun 06 '23 14:06 theforeman-bot

Can you use Benchmark.measure to determine which implementation is faster? It would also be interesting to run "EXPLAIN ANALYZE SELECT ....` on the DB to see which one is better.

sbernhard avatar Dec 08 '23 17:12 sbernhard

I don't have a box with lots of facts right now and it will take me a bit to get one so I can't really benchmark it, however here are the explains:

Before the changes

Foreman

> Host.search_for('facts.architecture=x86_64').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" WHERE "hosts"."type" = $1 AND ((hosts.id in (SELECT "fact_values"."host_id" FROM "fact_values" INNER JOIN "fact_names" ON "fact_names"."id" = "fact_values"."fact_name_id" WHERE (fact_names.name = 'architecture') AND (fact_values.value = 'x86_64')))) ORDER BY "hosts"."name" ASC [["type", "Host::Managed"]]
                                                               QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=27266.33..27266.50 rows=67 width=3199)
   Sort Key: hosts.name
   ->  Nested Loop  (cost=27236.52..27264.30 rows=67 width=3199)
         ->  HashAggregate  (cost=27236.23..27236.90 rows=67 width=4)
               Group Key: fact_values.host_id
               ->  Nested Loop  (cost=231.73..27236.06 rows=67 width=4)
                     ->  Index Scan using index_fact_names_on_name_and_type on fact_names  (cost=0.28..8.29 rows=1 width=4)
                           Index Cond: ((name)::text = 'architecture'::text)
                     ->  Bitmap Heap Scan on fact_values  (cost=231.45..27226.78 rows=99 width=8)
                           Recheck Cond: (fact_name_id = fact_names.id)
                           Filter: (value = 'x86_64'::text)
                           ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                                 Index Cond: (fact_name_id = fact_names.id)
         ->  Index Scan using hosts_pkey on hosts  (cost=0.29..0.41 rows=1 width=3199)
               Index Cond: (id = fact_values.host_id)
               Filter: ((type)::text = 'Host::Managed'::text)
(16 rows)

Discovery

> Host::Discovered.search_for('facts.architecture=x86_64').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" INNER JOIN "fact_values" fact_values_634325 ON (hosts.id = fact_values_634325.host_id)
 INNER JOIN "fact_names" fact_names_634325 ON (fact_names_634325.id =  fact_values_634325.fact_name_id) WHERE "hosts"."type" = $1 AND ((fact_names_634325."name" = 'architecture') AND ("hosts"."id" IN (SELECT "host_id" FROM "fact_values" WHERE "fact_values_634325"."value" = 'x86_64' ))) ORDER BY "hosts"."created_at" DESC NULLS LAST  [["type", "Host::Discovered"]]
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=27336.40..27336.40 rows=1 width=3199)
   Sort Key: hosts.created_at DESC NULLS LAST
   ->  Hash Join  (cost=240.27..27336.39 rows=1 width=3199)
         Hash Cond: (fact_values_634325.host_id = hosts.id)
         Join Filter: (SubPlan 1)
         ->  Nested Loop  (cost=234.20..27312.54 rows=6774 width=24)
               ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_634325  (cost=0.28..8.29 rows=1 width=4)
                     Index Cond: ((name)::text = 'architecture'::text)
               ->  Bitmap Heap Scan on fact_values fact_values_634325  (cost=233.92..27204.26 rows=9999 width=28)
                     Recheck Cond: (fact_name_id = fact_names_634325.id)
                     ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                           Index Cond: (fact_name_id = fact_names_634325.id)
         ->  Hash  (cost=6.05..6.05 rows=1 width=3199)
               ->  Index Scan using index_hosts_on_type_and_location_id on hosts  (cost=0.29..6.05 rows=1 width=3199)
                     Index Cond: ((type)::text = 'Host::Discovered'::text)
         SubPlan 1
           ->  Result  (cost=0.00..150863.99 rows=8358799 width=4)
                 One-Time Filter: (fact_values_634325.value = 'x86_64'::text)
                 ->  Seq Scan on fact_values  (cost=0.00..150863.99 rows=8358799 width=4)
(19 rows)

After the changes

Foreman

> Host.search_for('facts.architecture=x86_64').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" LEFT JOIN fact_values AS fact_values_128255 ON fact_values_128255.host_id = hosts.id LEFT JOIN fact_names AS fact_names_128255 ON fact_names_128255.id = fact_values_128255.fact_name_id WHERE "hosts"."type" = $1 AND ((fact_names_128255.name = 'architecture' AND (fact_values_128255.value = 'x86_64'))) ORDER BY "hosts"."name" ASC [["type", "Host::Managed"]]
                                                               QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=27259.29..27259.46 rows=67 width=3199)
   Sort Key: hosts.name
   ->  Nested Loop  (cost=232.01..27257.26 rows=67 width=3199)
         ->  Nested Loop  (cost=231.73..27236.06 rows=67 width=4)
               ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_128255  (cost=0.28..8.29 rows=1 width=4)
                     Index Cond: ((name)::text = 'architecture'::text)
               ->  Bitmap Heap Scan on fact_values fact_values_128255  (cost=231.45..27226.78 rows=99 width=8)
                     Recheck Cond: (fact_name_id = fact_names_128255.id)
                     Filter: (value = 'x86_64'::text)
                     ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                           Index Cond: (fact_name_id = fact_names_128255.id)
         ->  Index Scan using hosts_pkey on hosts  (cost=0.29..0.32 rows=1 width=3199)
               Index Cond: (id = fact_values_128255.host_id)
               Filter: ((type)::text = 'Host::Managed'::text)
(14 rows)

Discovery

> Host::Discovered.search_for('facts.architecture=x86_64').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" LEFT JOIN fact_values AS fact_values_a84d6e ON fact_values_a84d6e.host_id = hosts.id LEFT JOIN fact_names AS fact_names_a84d6e ON fact_names_a84d6e.id = fact_values_a84d6e.fact_name_id WHERE "hosts"."type" = $1 AND ((fact_names_a84d6e.name = 'architecture' AND (fact_values_a84d6e.value = 'x86_64'))) ORDER BY "hosts"."created_at" DESC NULLS LAST  [["type", "Host::Discovered"]]
                                                                      QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=22.83..22.84 rows=1 width=3199)
   Sort Key: hosts.created_at DESC NULLS LAST
   ->  Nested Loop  (cost=1.00..22.82 rows=1 width=3199)
         ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_a84d6e  (cost=0.28..8.29 rows=1 width=4)
               Index Cond: ((name)::text = 'architecture'::text)
         ->  Nested Loop  (cost=0.72..14.52 rows=1 width=3203)
               ->  Index Scan using index_hosts_on_type_and_location_id on hosts  (cost=0.29..6.05 rows=1 width=3199)
                     Index Cond: ((type)::text = 'Host::Discovered'::text)
               ->  Index Scan using index_fact_values_on_fact_name_id_and_host_id on fact_values fact_values_a84d6e  (cost=0.43..8.45 rows=1 width=8)
                     Index Cond: ((fact_name_id = fact_names_a84d6e.id) AND (host_id = hosts.id))
                     Filter: (value = 'x86_64'::text)
(11 rows)

adamruzicka avatar Mar 19 '24 11:03 adamruzicka

With selective joins and searching on multiple facts

Before

Foreman

> ::Host.search_for('facts.architecture=x86_64 and facts.chassistype=Other').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" WHERE "hosts"."type" = $1 AND (((hosts.id in (SELECT "fact_values"."host_id" FROM "fact_values" INNER JOIN "fact_names" ON "fact_names"."id" = "fact_values"."fact_name_id" WHERE (fact_names.name = 'architecture') AND (fact_values.value = 'x86_64'))) AND (hosts.id in (SELECT "fact_values"."host_id" FROM "fact_values" INNER JOIN "fact_names" ON "fact_names"."id" = "fact_values"."fact_name_id" WHERE (fact_names.name = 'chassistype') AND (fact_values.value = 'Other'))))) ORDER BY "hosts"."name" ASC [["type", "Host::Managed"]]
                                                                  QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=27682.51..27682.52 rows=1 width=3199)
   Sort Key: hosts.name
   ->  Nested Loop Semi Join  (cost=27236.51..27682.50 rows=1 width=3199)
         ->  Nested Loop  (cost=27235.80..27246.48 rows=26 width=3203)
               ->  HashAggregate  (cost=27235.51..27235.77 rows=26 width=4)
                     Group Key: fact_values_1.host_id
                     ->  Nested Loop  (cost=231.71..27235.45 rows=26 width=4)
                           ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_1  (cost=0.28..8.29 rows=1 width=4)
                                 Index Cond: ((name)::text = 'chassistype'::text)
                           ->  Bitmap Heap Scan on fact_values fact_values_1  (cost=231.43..27226.76 rows=39 width=8)
                                 Recheck Cond: (fact_name_id = fact_names_1.id)
                                 Filter: (value = 'Other'::text)
                                 ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                                       Index Cond: (fact_name_id = fact_names_1.id)
               ->  Index Scan using hosts_pkey on hosts  (cost=0.29..0.41 rows=1 width=3199)
                     Index Cond: (id = fact_values_1.host_id)
                     Filter: ((type)::text = 'Host::Managed'::text)
         ->  Nested Loop  (cost=0.71..16.76 rows=1 width=4)
               ->  Index Scan using index_fact_names_on_name_and_type on fact_names  (cost=0.28..8.29 rows=1 width=4)
                     Index Cond: ((name)::text = 'architecture'::text)
               ->  Index Scan using index_fact_values_on_fact_name_id_and_host_id on fact_values  (cost=0.43..8.45 rows=1 width=8)
                     Index Cond: ((fact_name_id = fact_names.id) AND (host_id = hosts.id))
                     Filter: (value = 'x86_64'::text)
(23 rows)

Discovery

> ::Host::Discovered.search_for('facts.architecture=x86_64 and facts.chassistype=Other').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" INNER JOIN "fact_values" fact_values_253030 ON (hosts.id = fact_values_253030.host_id)
 INNER JOIN "fact_names" fact_names_253030 ON (fact_names_253030.id =  fact_values_253030.fact_name_id) INNER JOIN "fact_values" fact_values_22845 ON (hosts.id = fact_values_22845.host_id)
 INNER JOIN "fact_names" fact_names_22845 ON (fact_names_22845.id =  fact_values_22845.fact_name_id) WHERE "hosts"."type" = $1 AND ((fact_names_253030."name" = 'architecture') AND (fact_names_22845."name" = 'chassistype') AND (("hosts"."id" IN (SELECT "host_id" FROM "fact_values" WHERE "fact_values_253030"."value"
= 'x86_64' )) AND ("hosts"."id" IN (SELECT "host_id" FROM "fact_values" WHERE "fact_values_22845"."value" = 'Other' )))) ORDER BY "hosts"."created_at" DESC NULLS LAST  [["type", "Host::Discovered"]]
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=54832.55..54832.56 rows=1 width=3199)
   Sort Key: hosts.created_at DESC NULLS LAST
   ->  Hash Join  (cost=27637.48..54832.54 rows=1 width=3199)
         Hash Cond: (fact_values_253030.host_id = hosts.id)
         Join Filter: ((SubPlan 1) AND (SubPlan 2))
         ->  Hash Join  (cost=27631.42..54812.77 rows=5221 width=48)
               Hash Cond: (fact_values_253030.host_id = fact_values_22845.host_id)
               ->  Nested Loop  (cost=234.20..27312.54 rows=6774 width=24)
                     ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_253030  (cost=0.28..8.29 rows=1 width=4)
                           Index Cond: ((name)::text = 'architecture'::text)
                     ->  Bitmap Heap Scan on fact_values fact_values_253030  (cost=233.92..27204.26 rows=9999 width=28)
                           Recheck Cond: (fact_name_id = fact_names_253030.id)
                           ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                                 Index Cond: (fact_name_id = fact_names_253030.id)
               ->  Hash  (cost=27312.54..27312.54 rows=6774 width=24)
                     ->  Nested Loop  (cost=234.20..27312.54 rows=6774 width=24)
                           ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_22845  (cost=0.28..8.29 rows=1 width=4)
                                 Index Cond: ((name)::text = 'chassistype'::text)
                           ->  Bitmap Heap Scan on fact_values fact_values_22845  (cost=233.92..27204.26 rows=9999 width=28)
                                 Recheck Cond: (fact_name_id = fact_names_22845.id)
                                 ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                                       Index Cond: (fact_name_id = fact_names_22845.id)
         ->  Hash  (cost=6.05..6.05 rows=1 width=3199)
               ->  Index Scan using index_hosts_on_type_and_location_id on hosts  (cost=0.29..6.05 rows=1 width=3199)
                     Index Cond: ((type)::text = 'Host::Discovered'::text)
         SubPlan 1
           ->  Result  (cost=0.00..150863.99 rows=8358799 width=4)
                 One-Time Filter: (fact_values_253030.value = 'x86_64'::text)
                 ->  Seq Scan on fact_values  (cost=0.00..150863.99 rows=8358799 width=4)
         SubPlan 2
           ->  Result  (cost=0.00..150863.99 rows=8358799 width=4)
                 One-Time Filter: (fact_values_22845.value = 'Other'::text)
                 ->  Seq Scan on fact_values fact_values_1  (cost=0.00..150863.99 rows=8358799 width=4)
(33 rows)

After

Foreman

> ::Host.search_for('facts.architecture=x86_64 and facts.chassistype=Other').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" LEFT JOIN fact_values AS fact_values_55e851 ON fact_values_55e851.host_id = hosts.id LEFT JOIN fact_names AS fact_names_55e851 ON fact_names_55e851.id = fact_values_55e851.fact_name_id AND fact_names_55e851.name = 'architecture' LEFT JOIN fact_values AS fact_values_76baee ON fact_values_76baee.host_id = hosts.id LEFT JOIN fact_names AS fact_names_76baee ON fact_names_76baee.id = fact_values_76baee.fact_name_id AND fact_names_76baee.name = 'chassistype' WHERE "hosts"."type" = $1 AND (((fact_names_55e851.name = 'architecture' AND (fact_values_55e851.value = 'x86_64')) AND (fact_names_76baee.name = 'chassistype' AND (fact_values_76baee.value = 'Other')))) ORDER BY "hosts"."name" ASC [["type", "Host::Managed"]]
                                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=27465.15..27465.16 rows=1 width=3199)
   Sort Key: hosts.name
   ->  Nested Loop  (cost=232.71..27465.14 rows=1 width=3199)
         ->  Nested Loop  (cost=232.42..27464.82 rows=1 width=8)
               ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_55e851  (cost=0.28..8.29 rows=1 width=4)
                     Index Cond: ((name)::text = 'architecture'::text)
               ->  Nested Loop  (cost=232.14..27455.54 rows=99 width=12)
                     ->  Nested Loop  (cost=231.71..27235.45 rows=26 width=4)
                           ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_76baee  (cost=0.28..8.29 rows=1 width=4)
                                 Index Cond: ((name)::text = 'chassistype'::text)
                           ->  Bitmap Heap Scan on fact_values fact_values_76baee  (cost=231.43..27226.76 rows=39 width=8)
                                 Recheck Cond: (fact_name_id = fact_names_76baee.id)
                                 Filter: (value = 'Other'::text)
                                 ->  Bitmap Index Scan on index_fact_values_on_fact_name_id_and_host_id  (cost=0.00..231.43 rows=9999 width=0)
                                       Index Cond: (fact_name_id = fact_names_76baee.id)
                     ->  Index Scan using index_fact_values_on_fact_name_id_and_host_id on fact_values fact_values_55e851  (cost=0.43..8.45 rows=1 width=8)
                           Index Cond: ((fact_name_id = fact_names_55e851.id) AND (host_id = fact_values_76baee.host_id))
                           Filter: (value = 'x86_64'::text)
         ->  Index Scan using hosts_pkey on hosts  (cost=0.29..0.32 rows=1 width=3199)
               Index Cond: (id = fact_values_55e851.host_id)
               Filter: ((type)::text = 'Host::Managed'::text)
(21 rows)

Discovery

> ::Host::Discovered.search_for('facts.architecture=x86_64 and facts.chassistype=Other').explain
=>
EXPLAIN for: SELECT "hosts".* FROM "hosts" LEFT JOIN fact_values AS fact_values_9dcff5 ON fact_values_9dcff5.host_id = hosts.id LEFT JOIN fact_names AS fact_names_9dcff5 ON fact_names_9dcff5.id = fact_values_9dcff5.fact_name_id AND fact_names_9dcff5.name = 'architecture' LEFT JOIN fact_values AS fact_values_47b143 ON fact_values_47b143.host_id = hosts.id LEFT JOIN fact_names AS fact_names_47b143 ON fact_names_47b143.id = fact_values_47b143.fact_name_id AND fact_names_47b143.name = 'chassistype' WHERE "hosts"."type" = $1 AND (((fact_names_9dcff5.name = 'architecture' AND (fact_values_9dcff5.value = 'x86_64')) AND (fact_names_47b143.name = 'chassistype' AND (fact_values_47b143.value = 'Other')))) ORDER BY "hosts"."created_at" DESC NULLS LAST  [["type", "Host::Discovered"]]
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=39.60..39.61 rows=1 width=3199)
   Sort Key: hosts.created_at DESC NULLS LAST
   ->  Nested Loop  (cost=1.71..39.59 rows=1 width=3199)
         ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_47b143  (cost=0.28..8.29 rows=1 width=4)
               Index Cond: ((name)::text = 'chassistype'::text)
         ->  Nested Loop  (cost=1.43..31.29 rows=1 width=3203)
               ->  Nested Loop  (cost=1.00..22.82 rows=1 width=3203)
                     ->  Index Scan using index_fact_names_on_name_and_type on fact_names fact_names_9dcff5  (cost=0.28..8.29 rows=1 width=4)
                           Index Cond: ((name)::text = 'architecture'::text)
                     ->  Nested Loop  (cost=0.72..14.52 rows=1 width=3207)
                           ->  Index Scan using index_hosts_on_type_and_location_id on hosts  (cost=0.29..6.05 rows=1 width=3199)
                                 Index Cond: ((type)::text = 'Host::Discovered'::text)
                           ->  Index Scan using index_fact_values_on_fact_name_id_and_host_id on fact_values fact_values_9dcff5  (cost=0.43..8.45 rows=1 width=8)
                                 Index Cond: ((fact_name_id = fact_names_9dcff5.id) AND (host_id = hosts.id))
                                 Filter: (value = 'x86_64'::text)
               ->  Index Scan using index_fact_values_on_fact_name_id_and_host_id on fact_values fact_values_47b143  (cost=0.43..8.45 rows=1 width=8)
                     Index Cond: ((fact_name_id = fact_names_47b143.id) AND (host_id = fact_values_9dcff5.host_id))
                     Filter: (value = 'Other'::text)
(18 rows)

adamruzicka avatar Mar 19 '24 11:03 adamruzicka