firebird Changed plan (possible regression) in FB 5.x / 6.x comparing to FB 3.x / 4.x

Problem was reported by one of our customer privately. Consider script from attached .zip ddl-and-run_-_for-empty-tables.sql.zip

Query that caused problems in customer environment has following plans:

for FB 3.x and 4.x:

Select Expression
    -> Aggregate
        -> Filter
            -> Nested Loop Join (outer)
                -> Nested Loop Join (outer)
                    -> Filter
                        -> Table "PAYMENTSDETAILS" as "PD" Access By ID
                            -> Bitmap
                                -> Index "AAA_PD_PERIODFROMDATE" Range Scan (lower bound: 1/1)
                    -> Filter
                        -> Table "PAYMENTSH" as "PH" Access By ID
                            -> Bitmap
                                -> Index "PK_PAYMENTSH_MOMENT_GENEMPID" Unique Scan
                -> Filter
                    -> Table "INVOICEH" as "IH" Access By ID
                        -> Bitmap Or
                            -> Bitmap
                                -> Index "PK_INVOICEH_INVID" Unique Scan
                            -> Bitmap
                                -> Index "INVOICEHBYPARENTINVOICEIDDESC" Range Scan (full match)

for FB 5.x and 6.x:

Select Expression
    -> Aggregate
        -> Nested Loop Join (inner)
            -> Filter
                -> Table "INVOICEH" as "IH" Full Scan
            -> Filter
                -> Table "PAYMENTSH" as "PH" Access By ID
                    -> Bitmap Or
                        -> Bitmap
                            -> Index "IDX_PAYMENTSH_MASTERDOC_DESC" Range Scan (full match)
                        -> Bitmap
                            -> Index "IDX_PAYMENTSH_MASTERDOC_DESC" Range Scan (full match)
            -> Filter
                -> Table "PAYMENTSDETAILS" as "PD" Access By ID
                    -> Bitmap
                        -> Index "PK_PD_MOMENT_GENEMPID" Unique Scan

Customer states that second plan leads to performance problem. Additional data from customer:

number of records per tables:

table_name           rec_count
==============================
InvoiceH                953779
PaymentsDetails         783127
PaymentsH             30408945

index statistics (only for those indices that are involved in above shown query):

paymentsdetails
    pk_pd_moment_genempid 0.00000127693203921808163
    aaa_pd_periodfromdate 0.00104058277793228626

paymentsh:
    idx_paymentsh_masterdoc_desc    0.00000011831024692651353
    pk_paymentsh_moment_genempid    0.0000000329129825615837035

invoiceh:
    pk_invoiceh_invid               0.00000104854007076937705
    invoicehbyparentinvoiceiddesc   0.0000217893411900149658

Dec 06 '23 10:12 pavel-zotov

Both LEFT JOINs are converted into INNER here, so the optimizer attempts to evaluate possible table permutations. Condition 0+ih.createempid = 999999 significantly reduces the expected cardinality, thus IH is placed to the first position. How many rows in IH actually have createempid = 999999? What's selectivity of the invoicehbycreateempid index?

Dec 06 '23 11:12 dyemanov

How many rows in IH actually have createempid = 999999?

I've forwarded this question to the customer, waiting for reply.

What's selectivity of the invoicehbycreateempid index?

0.000399680255213752389

Dec 06 '23 11:12 pavel-zotov

It would also be useful to see stats (execution time & fetches) for v4, v5 and v5 without +0 in the last condition.

Dec 06 '23 11:12 dyemanov

Both LEFT JOINs are converted into INNER here, so the optimizer attempts to evaluate possible table permutations. Condition 0+ih.createempid = 999999 significantly reduces the expected cardinality, thus IH is placed to the first position. How many rows in IH actually have createempid = 999999? What's selectivity of the invoicehbycreateempid index?

So the number of rows with CreateEmpID = 999999 at the moment is 771402 and the selectivity is 0.000399680......

with FromDateTime = '01.10.2023' FB5 doesn't return anything in hours (we killed the statement after waiting for more than 6 hours) regardless if there is +0 or not in the last statement. If I remove the whole statement about CreateEmpID in InvoiceH the query runs in milliseconds with the following stats

` Query Time

Prepare : 93.00 ms Execute : 375.00 ms Avg fetch time: 375.00 ms

Memory

Current: 121 341 284 576 Max : 123 660 694 464 Buffers: 7 200 000

Operations

Read : 22 Writes : 231 Fetches: 1 525 479 Marks : 2 917

Enchanced Info: +-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+ | Table Name | Records | Indexed | Non-Indexed | Updates | Deletes | Inserts | Backouts | Purges | Expunges | | | Total | reads | reads | | | | | | | +-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+ |INVOICEH | 0 | 47411 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |PAYMENTSDETAILS | 0 | 45681 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |PAYMENTSH | 0 | 45681 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+ `

And here are the stats for the Query ran in FB4 WITH the statement included ` Query Time

Prepare : 31.00 ms Execute : 515.00 ms Avg fetch time: 515.00 ms

Memory

Current: 4 258 615 968 Max : 4 464 718 112 Buffers: 250 000

Operations

Read : 7 023 Writes : 21 Fetches: 623 164 Marks : 1 864

Enchanced Info: +-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+ | Table Name | Records | Indexed | Non-Indexed | Updates | Deletes | Inserts | Backouts | Purges | Expunges | | | Total | reads | reads | | | | | | | +-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+ |INVOICEH | 0 | 47411 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |PAYMENTSDETAILS | 0 | 45681 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | |PAYMENTSH | 0 | 45681 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | +-------------------------------+-----------+-----------+-------------+---------+---------+---------+----------+----------+----------+ `

Dec 06 '23 14:12 dpankov

Тhe question is why is FB5 trying to change the JOINs from LEFT OUTER to INNER ones and is there any way we can force it NOT to do it? We are using these precisely because we know how these queries should go ... or at least we believe so :)

Dec 06 '23 14:12 dpankov

It does it because it allows more optimization abilities (different join orders). As for forcing it to avoid that, see #7910 - it's exactly about that. If you're OK with such a solution (configuration option), it will be added to RC2.

Dec 06 '23 15:12 dyemanov

Yes, this configuration option will be very useful for us and we are really looking forward for you releasing a version that supports it. In the mean time we hope that you'll be able to also optimize the planner to deal with cases like the above one even without it active.

Dec 06 '23 15:12 dpankov

So the number of rows with CreateEmpID = 999999 at the moment is 771402 and the selectivity is 0.000399680......

So you ask for 80% of rows while the optimizer expects less than a thousand to be returned. Such a skewed value distribution is impossible to optimize properly without histograms. Their usage is also limited (only for constant parameters), but at least they provide an option.

Dec 06 '23 16:12 dyemanov

yes, that's absolutely right and that's exactly the reason why we did our best to "hint" the optimizer not to try to use this field/index (by using the 0+ih.CreateEmpID syntax) ... but unluckily it tried to be smarter than what we expected ... and failed badly ;)

Dec 06 '23 20:12 dpankov

In addition to what dimitr said. There are suspicions that in the above example, parentinvoiceid contains many NULLs. The index invoicehbyparentinvoiceiddesc does not contain NULL selectivity. This means that its real selectivity for equality is distorted. Thus, the OR bitmask predicts poor selectivity, which may affect the join order.

Dec 07 '23 06:12 sim1984

You can try to create a partial index to replace the existing one.

drop index invoicehbyparentinvoiceiddesc;

create descending index invoicehbyparentinvoiceiddesc
on invoiceh (parentinvoiceid) where (parentinvoiceid is not null);

Dec 07 '23 06:12 sim1984

You can try to create a partial index to replace the existing one.

drop index invoicehbyparentinvoiceiddesc;

create descending index invoicehbyparentinvoiceiddesc
on invoiceh (parentinvoiceid) where (parentinvoiceid is not null);

I did try it but it doesn't change anything ... and to be honest I don't see how it will since there is no meaningful filtering on this table by ParentInvoiceID and expectedly the plan remains "JOIN (IH NATURAL, PH INDEX ....).

But there is a good news - if we do set the option OuterJoinConversion to false everything returns back to a working state and the query runs in 2.3sec.

I just hope that this comment for the option in the .config file will not materialize

CAUTION!

There is no guarantee that this setting will be available in future Firebird

Dec 20 '23 16:12 dpankov

On the contrary, I hope that it will be. Because this parameter is not a solution to the problem, but a crutch. I would prefer real hints for hints to the optimizer. And it would be even better if he could do without hints at all, but this is from the realm of fiction, so even in DBMS with good optimizers there are hints

Dec 20 '23 16:12 sim1984

Actually, PostgreSQL doesn't allow optimizer hints. If you ask for hints to make the query planner do X, the developers tell you to send them the query instead and they'll make sure the query planner does X from now on! :-)

Jan 08 '24 03:01 duncreg

So far we were doing the same ;-) But it cannot solve all the issues, because the optimizer intelligence is limited, regardless of how good it is.

Jan 08 '24 06:01 dyemanov