Support value distribution histograms [CORE1686]
Submitted by: Timo Partanen (partim)
Is duplicated by CORE2381 Relate to CORE4666
Votes: 12
To provide better performance for queries, Firebird should support value distribution histograms. A histogram is a statistical report that shows the frequency of values within steps or ranges of values that fall between a certain minimum and maximum. Generally, histograms improve the performance of queries because the optimizer can use them to estimate the selectivity of conditions better than from the selectivity of indexes.
Modified by: @dyemanov
assignee: Dmitry Yemanov [ dimitr ]
Modified by: @pcisar
Workflow: jira [ 13813 ] => Firebird [ 14121 ]
Commented by: @dyemanov
Suggested by Bill Oliver:
When this is added I would like to see the histogram information available through the monitoring tables, too. Information that might be surfaced might include:
- Column's Most Common Value (MCV) - The frequency of the most common value, call this Most Common Frequency (MCF) - Histogram information
Modified by: @dyemanov
Fix Version: 3.0 Alpha 1 [ 10331 ]
Commented by: Philip Williams (unordained)
Is NULL handled as a special case in the histogram?
Can other values be requested to be treated as special, for situations where you know a field is being abused with special values (e.g. the field has a very regular distribution, except for the values -1, 0, and 1, which are each individually special and very different from > 1.)
Commented by: @krilbe
In ref to Philip Williams' comment: I vote strongly for treating null as a special case in the histogram, i.e. always keep a specific frequency/selectivity figure for null, if the indexed column allows null. For a compound index, I'm not quite sure how to handle cases where some columns are null and some are not, but i assume that the all-null case should be treated separately, i.e. like null for a single-column index.
Commented by: @dyemanov
Yes, nulls are to be treated separately.
Modified by: @dyemanov
Fix Version: 3.0 Beta 1 [ 10332 ]
Fix Version: 3.0 Alpha 1 [ 10331 ] =>
Modified by: @dyemanov
Fix Version: 3.0 Beta 2 [ 10586 ]
Fix Version: 3.0 Beta 1 [ 10332 ] =>
Modified by: @dyemanov
Fix Version: 3.0 Beta 2 [ 10586 ] =>
Commented by: Atri Sharma (atris)
Hi,
I am a relational database developer, focused on many areas.
I would like to start hacking Firebird,with this issue, if possible.
Please advice.
Commented by: Ann Harrison (awharrison)
Histograms are less useful in database like Firebird that optimize queries at compile time rather than run time. Knowing the value distribution in an index is of little use to the optimizer if it doesn't know what value will be searched. Not that this feature shouldn't be implemented, just don't expect it to improved performance dramatically in cases like this:
Select c.zipcode from customers c inner join orders o on o.customerId = c.customerId inner join orderItems oi on oi.orderId = o.orderId where oi.item = ? and c.state = ?
Modified by: @dyemanov
Fix Version: 4.0 Alpha 1 [ 10731 ]
Modified by: @dyemanov
Fix Version: 4.0 Beta 1 [ 10750 ]
Fix Version: 4.0 Alpha 1 [ 10731 ] =>
Modified by: @dyemanov
Fix Version: 4.0 Beta 1 [ 10750 ] =>
Databases I deal with includes both NULLs and EMPTY strings. It is not feasible to convert EMTPY strings to NULLs because the risk that applications start giving errors. I'm read in a recent discussion that histograms might be of help for EMPTY string values. There are more than several reports about "worse" performance of v3 or v4 versions compared to v2.5 or earlier. Since histograms are expected to solve some of such problems even not all, I'd like to see them added.