firebird icon indicating copy to clipboard operation
firebird copied to clipboard

Slow inequal join, affects all FB3 and FB4

Open zedalaye opened this issue 2 years ago • 4 comments

Given:

create domain IDENTIFIANT as INTEGER NOT NULL;
create domain UNIQUEID as CHAR(16) CHARACTER SET OCTETS;
create domain DATEHEURE as TIMESTAMP DEFAULT 'NOW' NOT NULL;
create domain TEXTE as BLOB SUB_TYPE TEXT;
create domain PERIOD as CHAR(1) NOT NULL CHECK (value in ('J', 'H', 'M', 'T', 'S', 'A', 'E'));

create table NAMES (
 CREATED_AT DATEHEURE,
 UID        UNIQUEID NOT NULL,
 NAME       TEXTE,
 NUMBER     INTEGER,
 CONSTRAINT NAMES_PK PRIMARY KEY (UID, CREATED_AT)
);

CREATE SEQUENCE GEN_PERIOD_ID;

CREATE TABLE PERIODS
(
  ID          IDENTIFIANT,
  STORE       UNIQUEID,
  DEVICE      UNIQUEID,
  PERIOD_TYPE PERIOD,
  STARTING_AT DATEHEURE,
  ENDING_AT   DATEHEURE,
  CONSTRAINT PERIODS_PK PRIMARY KEY (ID)
);

CREATE OR ALTER TRIGGER BI_PERIODS_ID FOR PERIODS
ACTIVE BEFORE INSERT POSITION 0
AS
begin
  if (new.ID is NULL) then
    new.ID = gen_id(GEN_PERIOD_ID, 1);
end;

With table NAMES filled with ~15000 duplicates (due to a bug in my software, but that's another problem) and PERIODS with ~1000 records, this query is particularly slow:

select
 ID, PERIOD_TYPE, STARTING_AT, ENDING_AT,
 STORE, ns.NAME as STORE_NAME,
 DEVICE, nd.NAME as DEVICE_NAME, nd.NUMBER as DEVICE_NUMBER
from PERIODS
left outer join NAMES as ns on (ns.UID = STORE)
left outer join NAMES as nd on (nd.UID = DEVICE)
where (PERIOD_TYPE='J') and not exists (
 select * from NAMES ns2
 where ns2.UID = ns.UID and ns2.CREATED_AT > ns.CREATED_AT
) and not exists (
 select * from NAMES nd2
 where nd2.UID = nd.UID and nd2.CREATED_AT > nd.CREATED_AT
)

It returns around 1 line every 5 second. Despites using what seems to be a good enough plan:

PLAN (NS2 INDEX (NAMES_PK))
PLAN (ND2 INDEX (NAMES_PK))
PLAN JOIN (JOIN (PERIODS NATURAL, NS INDEX (NAMES_PK)), ND INDEX (NAMES_PK))

This query is way faster but sometimes hangs after returning the last record.

with NUMBERED_NAMES as (
 select
  UID, CREATED_AT, NAME, NUMBER,
  row_number() over (partition by UID order by CREATED_AT desc) as rn
 from
  NAMES
), LAST_NAMES as (
 select
  UID, CREATED_AT, NAME, NUMBER
 from
  NUMBERED_NAMES
 where
  rn=1
)
select
 ID, PERIOD_TYPE, STARTING_AT, ENDING_AT,
 STORE, ns.NAME as STORE_NAME,
 DEVICE, nd.NAME as DEVICE_NAME, nd.NUMBER as DEVICE_NUMBER
from PERIODS
join LAST_NAMES ns on (ns.UID = STORE)
join LAST_NAMES nd on (nd.UID = DEVICE)
where PERIOD_TYPE='J'

The plan for this request is very different :

PLAN HASH (HASH (PERIODS NATURAL, SORT (ND NUMBERED_NAMES NAMES NATURAL)), SORT (NS NUMBERED_NAMES NAMES NATURAL))

I attached the Delphi Code (using UIB) and the database used to reproduce this behavior. Firebird server is not started (instsvc stop and instsvc r) nor registered (instreg r) so I just have to change the path to fbclient.dll. I tested this code against all Firebird 3.0.x releases and Firebird 4.0:

D := TUIBDataBase.Create(nil);
try
  D.LibraryName := 'C:\Program Files (x86)\Firebird\3.0.8\fbclient.dll';
  D.DatabaseName := 'test.fdb';
  ....
finally
  D.Free;
end

I get the exact same results with all FB3 and FB4 versions I tested.

test_code.zip test_db.zip

zedalaye avatar Dec 20 '21 19:12 zedalaye

The CTE version can be simplified to:

with NUMBERED_NAMES as (
 select
  UID, CREATED_AT, NAME, NUMBER,
  row_number() over (partition by UID order by CREATED_AT desc) as rn
 from
  NAMES
)
select
 ID, PERIOD_TYPE, STARTING_AT, ENDING_AT,
 STORE, ns.NAME as STORE_NAME,
 DEVICE, nd.NAME as DEVICE_NAME, nd.NUMBER as DEVICE_NUMBER
from PERIODS
join NUMBERED_NAMES ns on (ns.UID = STORE and ns.rn = 1)
join NUMBERED_NAMES nd on (nd.UID = DEVICE and nd.rn = 1)
where PERIOD_TYPE='J'

zedalaye avatar Dec 20 '21 19:12 zedalaye

I don't understand why, but the original, non translated version of this query crashes the connection and nothing is logged in firebird.log...

with NUMBERED_NAMES as (
 select
  UID, NOM, NUMERO,
  row_number() over (partition by UID order by CREATION desc) as rn
 from
  NOMS
)
select
 ID, PRECEDENTE,
 TYPE_PERIODE, DEBUT, FIN, CLOTURE, ARCHIVE,
 MAGASIN, NM.NOM as NOM_MAGASIN,
 CAISSE, NC.NOM as NOM_CAISSE, NC.NUMERO as NUMERO_CAISSE
from PERIODES
left outer join NUMBERED_NAMES as NM on (NM.UID = MAGASIN and NM.RN=1)
left outer join NUMBERED_NAMES as NC on (NC.UID = CAISSE and NC.RN=1)
where (TYPE_PERIODE='J') 
  and (MAGASIN is not null) and (CAISSE is not null)
order by DEBUT desc

zedalaye avatar Dec 20 '21 20:12 zedalaye

I don't understand why, but the original, non translated version of this query crashes the connection and nothing is logged in firebird.log...

Does the crash happen on both FB v3.0 and v4.0? Obviously, I cannot reproduce it because tables are different from your test case.

dyemanov avatar Dec 22 '21 17:12 dyemanov

select
 ID, PERIOD_TYPE, STARTING_AT, ENDING_AT,
 STORE, ns.NAME as STORE_NAME,
 DEVICE, nd.NAME as DEVICE_NAME, nd.NUMBER as DEVICE_NUMBER
from PERIODS
left outer join NAMES as ns on (ns.UID = STORE)
left outer join NAMES as nd on (nd.UID = DEVICE)
where (PERIOD_TYPE='J') and not exists (
 select * from NAMES ns2
 where ns2.UID = ns.UID and ns2.CREATED_AT > ns.CREATED_AT
) and not exists (
 select * from NAMES nd2
 where nd2.UID = nd.UID and nd2.CREATED_AT > nd.CREATED_AT
)

It returns around 1 line every 5 second. Despites using what seems to be a good enough plan:

PLAN (NS2 INDEX (NAMES_PK))
PLAN (ND2 INDEX (NAMES_PK))
PLAN JOIN (JOIN (PERIODS NATURAL, NS INDEX (NAMES_PK)), ND INDEX (NAMES_PK))

The plan is not so good if you'd look at the explained format:

Select Expression
    -> Filter
        -> Table "NAMES" as "NS2" Access By ID
            -> Bitmap
                -> Index "NAMES_PK" Range Scan (lower bound: 2/2, upper bound: 1/2)
Select Expression
    -> Filter
        -> Table "NAMES" as "ND2" Access By ID
            -> Bitmap
                -> Index "NAMES_PK" Range Scan (lower bound: 2/2, upper bound: 1/2)
Select Expression
    -> Filter
        -> Nested Loop Join (outer)
            -> Nested Loop Join (outer)
                -> Filter
                    -> Table "PERIODS" Full Scan
                -> Filter
                    -> Table "NAMES" as "NS" Access By ID
                        -> Bitmap
                            -> Index "NAMES_PK" Range Scan (partial match: 1/2)
            -> Filter
                -> Table "NAMES" as "ND" Access By ID
                    -> Bitmap
                        -> Index "NAMES_PK" Range Scan (partial match: 1/2)

Note that all index lookups are range scans. You initially select much more records than really needed (match for UID = D6668D401BD74630AEBB7A9649848E52 returns 18145 records and join is done twice) and for every record two additional NOT EXISTS lookups are performed. Very expensive.

Your CTE version is good but not really equivalent, as you replaced LEFT JOIN with INNER JOIN. If rewritten with LEFT JOIN it becomes much slower.

Another option (in FB4) could be to use lateral joins:

select
 ID, PERIOD_TYPE, STARTING_AT, ENDING_AT,
 STORE, ns.NAME as STORE_NAME,
 DEVICE, nd.NAME as DEVICE_NAME, nd.NUMBER as DEVICE_NUMBER
from PERIODS
left join lateral (select first 1 NAME from NAMES where UID = STORE order by CREATED_AT DESC ) as ns on true
left join lateral (select first 1 NAME, NUMBER from NAMES where UID = DEVICE order by CREATED_AT DESC ) as nd on true
where PERIOD_TYPE='J';

It's slower than CTE for INNER JOIN but faster for LEFT JOIN. However, if the index on (UID, CREATED_AT) in NAMES is created as descending, then the lateral join option becomes simply the best.

dyemanov avatar Dec 22 '21 18:12 dyemanov