database_consistency icon indicating copy to clipboard operation
database_consistency copied to clipboard

citext false positive

Open pirj opened this issue 3 years ago • 1 comments

Hey @djezzzl ! :wave:

I have the following in db/schema.rb:

  enable_extension "citext"

  create_table "groups", force: :cascade do |t|
    t.citext "name", null: false
    t.index ["name"], name: "index_groups_on_name", unique: true

Model:

validates :name, uniqueness: { case_sensitive: false }

database_consistency complains:

MissingUniqueIndexChecker fail Group lower(name) model should have proper unique index in the database

psql mydb:

# \d groups
                                              Table "public.groups"
       Column       |              Type              | Collation | Nullable |                     Default
--------------------+--------------------------------+-----------+----------+-------------------------------------------------
 name               | citext                         |           | not null |
...
Indexes:
    "index_groups_on_name" UNIQUE, btree (name)

PostgreSQL citext extension docs: https://www.postgresql.org/docs/current/citext.html

I've debugged it, and:

sorted_uniqueness_validator_columns # => ["lower(name)"]
Helper.extract_index_columns(index.columns).sort # => ["name"]

:boom:

Does it make sense to remove/ignore lower( from the index column if this column's type is citext?

database_consistency version 1.1.2

pirj avatar Oct 08 '21 11:10 pirj

Hi @pirj ,

It makes total sense for me 👍 Would you mind to contribute?

djezzzl avatar Oct 13 '21 15:10 djezzzl

Hi @pirj ,

I have finally a PR for this (https://github.com/djezzzl/database_consistency/pull/147) to be fixed but I tested the planner and I think we still would better have a proper (lower) index instead.

The reason is that ActiveRecord doesn't recognize citext and apply lower anyway:

Entity.new(email: 'TMPTMP').valid?
 Entity Exists? (0.3ms)  SELECT 1 AS one FROM "entities" WHERE LOWER("entities"."email") = LOWER($1) LIMIT $2  [["email", "TMPTMP"], ["LIMIT", 1]]
database_consistency_test=# \d tmp;
                Table "public.tmp"
 Column |  Type  | Collation | Nullable | Default 
--------+--------+-----------+----------+---------
 email  | citext |           |          | 
Indexes:
    "tmp_idx" btree (email)

database_consistency_test=# explain analyze select * from tmp where lower(email) = lower('asdfasdf');
                                             QUERY PLAN                                             
----------------------------------------------------------------------------------------------------
 Seq Scan on tmp  (cost=0.00..2040.00 rows=500 width=9) (actual time=26.415..26.415 rows=0 loops=1)
   Filter: (lower((email)::text) = 'asdfasdf'::text)
   Rows Removed by Filter: 100000
 Planning Time: 0.296 ms
 Execution Time: 26.429 ms
(5 rows)

database_consistency_test=# explain analyze select * from tmp where email = 'asdfasdf';
                                                    QUERY PLAN                                                    
------------------------------------------------------------------------------------------------------------------
 Index Only Scan using tmp_idx on tmp  (cost=0.42..4.44 rows=1 width=9) (actual time=0.044..0.045 rows=0 loops=1)
   Index Cond: (email = 'asdfasdf'::citext)
   Heap Fetches: 0
 Planning Time: 0.047 ms
 Execution Time: 0.057 ms
(5 rows)

And after adding a lower index:

database_consistency_test=# \d tmp;
                Table "public.tmp"
 Column |  Type  | Collation | Nullable | Default 
--------+--------+-----------+----------+---------
 email  | citext |           |          | 
Indexes:
    "tmp_idx" btree (lower(email::text))

database_consistency_test=# explain analyze select * from tmp where email = 'asdfasdf';
                                            QUERY PLAN                                            
--------------------------------------------------------------------------------------------------
 Seq Scan on tmp  (cost=0.00..1790.00 rows=1 width=9) (actual time=55.766..55.767 rows=0 loops=1)
   Filter: (email = 'asdfasdf'::citext)
   Rows Removed by Filter: 100000
 Planning Time: 0.134 ms
 Execution Time: 55.777 ms
(5 rows)

database_consistency_test=# explain analyze select * from tmp where lower(email) = lower('asdfasdf');
                                                 QUERY PLAN                                                  
-------------------------------------------------------------------------------------------------------------
 Index Scan using tmp_idx on tmp  (cost=0.42..8.44 rows=1 width=9) (actual time=0.040..0.041 rows=0 loops=1)
   Index Cond: (lower((email)::text) = 'asdfasdf'::text)
 Planning Time: 0.086 ms
 Execution Time: 0.054 ms
(4 rows)

With that said, I will abandon the PR and close the issue.

Please feel free to reopen it.

djezzzl avatar Nov 16 '22 06:11 djezzzl

Hey @djezzzl 👋

Thanks for giving it a shot. There's this deficiency with a redundant lower that needs to be fixed on the Rails side. At this stage I don't see how database_consistency could handle it.

pirj avatar Nov 16 '22 07:11 pirj

https://github.com/rails/rails/pull/46568 one done, one to go

pirj avatar Nov 25 '22 14:11 pirj

Wow! Well done! Unfortunately, my past experience contributing to ROR wasn't great so I didn't expect it's actually possible 😀

djezzzl avatar Nov 26 '22 08:11 djezzzl

https://github.com/rails/rails/pull/46592

pirj avatar Nov 30 '22 07:11 pirj

Wow, you do the magic! Thank you for improving Rails experience for all of us!

djezzzl avatar Nov 30 '22 11:11 djezzzl