firebird icon indicating copy to clipboard operation
firebird copied to clipboard

Support value distribution histograms [CORE1686]

Open firebird-automations opened this issue 17 years ago • 18 comments

Submitted by: Timo Partanen (partim)

Is duplicated by CORE2381 Relate to CORE4666

Votes: 12

To provide better performance for queries, Firebird should support value distribution histograms. A histogram is a statistical report that shows the frequency of values within steps or ranges of values that fall between a certain minimum and maximum. Generally, histograms improve the performance of queries because the optimizer can use them to estimate the selectivity of conditions better than from the selectivity of indexes.

firebird-automations avatar Jan 12 '08 06:01 firebird-automations

Modified by: @dyemanov

assignee: Dmitry Yemanov [ dimitr ]

firebird-automations avatar Jan 12 '08 07:01 firebird-automations

Modified by: @pcisar

Workflow: jira [ 13813 ] => Firebird [ 14121 ]

firebird-automations avatar Jan 28 '08 15:01 firebird-automations

Commented by: @dyemanov

Suggested by Bill Oliver:

When this is added I would like to see the histogram information available through the monitoring tables, too. Information that might be surfaced might include:

- Column's Most Common Value (MCV) - The frequency of the most common value, call this Most Common Frequency (MCF) - Histogram information

firebird-automations avatar Jan 14 '10 12:01 firebird-automations

Modified by: @dyemanov

Link: This issue is duplicated by CORE2381 [ CORE2381 ]

firebird-automations avatar Jan 14 '10 12:01 firebird-automations

Modified by: @dyemanov

Fix Version: 3.0 Alpha 1 [ 10331 ]

firebird-automations avatar Mar 16 '11 08:03 firebird-automations

Commented by: Philip Williams (unordained)

Is NULL handled as a special case in the histogram?

Can other values be requested to be treated as special, for situations where you know a field is being abused with special values (e.g. the field has a very regular distribution, except for the values -1, 0, and 1, which are each individually special and very different from > 1.)

firebird-automations avatar May 06 '11 21:05 firebird-automations

Commented by: @krilbe

In ref to Philip Williams' comment: I vote strongly for treating null as a special case in the histogram, i.e. always keep a specific frequency/selectivity figure for null, if the indexed column allows null. For a compound index, I'm not quite sure how to handle cases where some columns are null and some are not, but i assume that the all-null case should be treated separately, i.e. like null for a single-column index.

firebird-automations avatar Aug 14 '12 11:08 firebird-automations

Commented by: @dyemanov

Yes, nulls are to be treated separately.

firebird-automations avatar Aug 14 '12 11:08 firebird-automations

Modified by: @dyemanov

Fix Version: 3.0 Beta 1 [ 10332 ]

Fix Version: 3.0 Alpha 1 [ 10331 ] =>

firebird-automations avatar Oct 23 '12 10:10 firebird-automations

Modified by: @dyemanov

Fix Version: 3.0 Beta 2 [ 10586 ]

Fix Version: 3.0 Beta 1 [ 10332 ] =>

firebird-automations avatar Apr 24 '14 15:04 firebird-automations

Modified by: @dyemanov

Fix Version: 3.0 Beta 2 [ 10586 ] =>

firebird-automations avatar Aug 17 '14 12:08 firebird-automations

Modified by: Sean Leyne (seanleyne)

Link: This issue relate to CORE4666 [ CORE4666 ]

firebird-automations avatar Jan 16 '15 18:01 firebird-automations

Commented by: Atri Sharma (atris)

Hi,

I am a relational database developer, focused on many areas.

I would like to start hacking Firebird,with this issue, if possible.

Please advice.

firebird-automations avatar Mar 01 '16 21:03 firebird-automations

Commented by: Ann Harrison (awharrison)

Histograms are less useful in database like Firebird that optimize queries at compile time rather than run time. Knowing the value distribution in an index is of little use to the optimizer if it doesn't know what value will be searched. Not that this feature shouldn't be implemented, just don't expect it to improved performance dramatically in cases like this:

Select c.zipcode from customers c inner join orders o on o.customerId = c.customerId inner join orderItems oi on oi.orderId = o.orderId where oi.item = ? and c.state = ?

firebird-automations avatar Mar 02 '16 00:03 firebird-automations

Modified by: @dyemanov

Fix Version: 4.0 Alpha 1 [ 10731 ]

firebird-automations avatar Mar 09 '16 14:03 firebird-automations

Modified by: @dyemanov

Fix Version: 4.0 Beta 1 [ 10750 ]

Fix Version: 4.0 Alpha 1 [ 10731 ] =>

firebird-automations avatar Sep 01 '16 11:09 firebird-automations

Modified by: @dyemanov

Fix Version: 4.0 Beta 1 [ 10750 ] =>

firebird-automations avatar Jan 22 '19 19:01 firebird-automations

Databases I deal with includes both NULLs and EMPTY strings. It is not feasible to convert EMTPY strings to NULLs because the risk that applications start giving errors. I'm read in a recent discussion that histograms might be of help for EMPTY string values. There are more than several reports about "worse" performance of v3 or v4 versions compared to v2.5 or earlier. Since histograms are expected to solve some of such problems even not all, I'd like to see them added.

ertankucukoglu avatar Aug 25 '22 09:08 ertankucukoglu