datapusher-plus
datapusher-plus copied to clipboard
Low cardinality auto indexing
Describe the bug
Auto index generation on cardinality is backwards.
https://github.com/dathere/datapusher-plus/blob/master/datapusher/jobs.py#L1728
# if a column's cardinality <= AUTO_INDEX_THRESHOLD, create an index for that column
Low cardinality indexes on postgresql aren't useful, because the planner will almost never choose them.
The planner will choose a table scan instead of an index scan if it is likely that many/most pages will be read. When you have a low cardinality index, eg a bool, the planner is going to assume that 1/2 of rows will be hit, and therefore essentially all pages. Even if you've got a distribution that's really skewed, the planner will generally not choose that index. (if you do have a known skewed distribution, you can do a partial/multicolumn index where the condition is the rare case, but that's not really applicable for a general purpose tool. )
This should be reversed, and set the default threshold to ~10 at least.