firebird icon indicating copy to clipboard operation
firebird copied to clipboard

Transform LEFT joins into INNER ones if the WHERE condition violates the outer join rules

Open dyemanov opened this issue 4 years ago • 13 comments

Example:

SELECT *
FROM T1 LEFT JOIN T2 ON T1.ID = T2.ID
WHERE T2.FIELD1 = 0

In this case the condition T2.FIELD1 = 0 effectively removes all the "fake NULL" rows of T2, so the result is the same as for the INNER JOIN. However, the optimizer is forced to use the T1->T2 join order while T2->T1 could also be considered. Also, the optimizer cannot use hash/merge algorithms for joining. It makes sense to detect this case during join processing and internally replace LEFT with INNER before optimization starts.

This is primarily intended to improve "ad hoc" and machine-generated (e.g. ORM) queries. However, it may also somewhat (from the performance POV) break user applications, as many people used to "hint" the optimizer by writing LEFT intentionally, exactly to "pin" the manually chosen join order instead of relying on the optimizer. Usually, the join is performed by the PK/FK reference and the fake WHERE condition explicitly checks that PK (which is known to never be NULL by its own) to produce the INNER result.

The proposed improvement, however, may work around this compatibility issue. In particular, checks for NULL, e.g. WHERE T2.ID IS NOT NULL for the join example above, would not transform LEFT into INNER, because theoretically they may also be used for checking fake NULLs inside T2 after LEFT JOIN. However, regular comparisons that ignore NULLs by their nature, will cause the LEFT->INNER transformation. So only hints using very artificial checks like T2.ID > 0 will be affected.

This patch has been tested in production for a while and nobody stepped on problems yet. However, in order to minimize possible risks, perhaps it should be added to the major release (v5) only, or we could provide a per-database compatibility switch in firebird.conf.

dyemanov avatar Oct 06 '21 06:10 dyemanov

Hi Dmitry!

I recommend that this could be a per-database compatibility switch in firebird.conf, and turn off as default setting. We intentionally use this method in places where Firebird uses a bad plan for very complex querys. It would be a serious job to explore all the places where we use this solution. And we don't know for sure whether every colleague always uses a primary key or unique key field in the where condition to check not null.

András

omachtandras avatar Oct 06 '21 07:10 omachtandras

In my opinion, this should not be off by default, new features or improvements that are off by default don't get used.

mrotteveel avatar Oct 06 '21 14:10 mrotteveel

Per-database options is a huge pain for support. In the major release it would be better to register cases of generated bad plans as bugs for separate optimizer improvements.

aafemt avatar Oct 06 '21 15:10 aafemt

@mrotteveel : We have clients who do not have physical access to the firebird config files (or firebird is used by other applications with different needs). Databases are created dynamically using our program without modifying and fierbird config files... If this behaviour can be modified in an on connect trigger, that is a sufficient solution too.

@aafemt : we have not reported these cases so far because

  • it takes too much time a test database
  • the plan seems logical (analyzing the index statistics), but
  • cause a significant record reading surplus, leading to a serious slowdown

Nevertheless, if such an anomaly occurs again, I promise we will try to bring it into a reportable format.

omachtandras avatar Oct 06 '21 15:10 omachtandras

@omachtandras, it's not really a question whether PK/UK field is used or some other fields. If that field is not nullable and you check it via FIELD IS NOT NULL -- everything is OK, nothing will be changed for you. If you use some other conditions (like FIELD > 0 mentioned in this ticket), then plans may change. Can you check that?

dyemanov avatar Oct 06 '21 15:10 dyemanov

@dyemanov : thanks for the clarification. For sure, we always use a not nullable field with field is not null in the where conditions.

omachtandras avatar Oct 06 '21 16:10 omachtandras

I think it's worth adding that verification (that IS NOT NULL does not break plans) to our QA tests if/when this improvement is committed.

dyemanov avatar Oct 06 '21 16:10 dyemanov

@omachtandras , @aafemt this is the problematic case for us : https://github.com/FirebirdSQL/firebird/issues/1476 In this case we use the LEFT OUTER workaround.

EPluribusUnum avatar Oct 07 '21 07:10 EPluribusUnum

If I understand correctly, it can be disabled like this

SET QUERY TRANSFORMATION OFF

in the BEFORE CONNECT trigger or just before the query is executed

sim1984 avatar Oct 07 '21 09:10 sim1984

Another MSSQL-style option is the last thing Firebird needs IMHO.

aafemt avatar Oct 07 '21 09:10 aafemt

@aafemt I agree, but the globally disabling of some optimizer algorithms would also not hurt, since it is not always possible to rewrite the query in such a way as to assign hints to it.

sim1984 avatar Oct 07 '21 09:10 sim1984

If you want hints - implement hints. Or use already existing PLAN clause.

aafemt avatar Oct 07 '21 10:10 aafemt

I vote for hints and for global switches. The PLAN clause is too limited, and in my opinion, even harmful.

sim1984 avatar Oct 07 '21 10:10 sim1984