firebird icon indicating copy to clipboard operation
firebird copied to clipboard

Bitmap Join Indexes

Open arkinform opened this issue 7 months ago • 12 comments

Sometimes it is necessary to use not exists conditions on big tables. At present, besides common b-tree indexes, Firebird does not provide any efficient way for optimizing such queries:

select l.changelog_id from changelog l
where not exists (select * from changelogdet ld on ld.changelog_id = l.changelog_id and ld.target_id = :target_id)

Most relational databases also do not provide any special optimizations for such cases, but Oracle does. There is a special type of index called "Bitmap Join Index":

CREATE BITMAP INDEX idx_changelog_not_exists
ON changelog(ld.target_id)
FROM changelog l, changelogdet ld
WHERE ld.changelog_id = l.changelog_id;

In case of low cardinality in "target_id" values this index can be efficiently used to optimize "not exists" conditions without scanning b-tree index on the "changelogdet" table which can be much bigger than "changelog" table itself.

arkinform avatar Jun 01 '25 09:06 arkinform

In firebird, indexes have been able to build bitmaps since birth, effectively combine and connect them through AND and OR. So this ticket is off topic. The reason for not using effective not exists is completely different. Namely, the inability to transform not exists into anti join, but work is already underway on this.

sim1984 avatar Jun 01 '25 10:06 sim1984

I would also like to add that Only Index Scan could help. In any case, our indexes can do bitmap scanning initially.

sim1984 avatar Jun 01 '25 10:06 sim1984

In firebird, indexes have been able to build bitmaps since birth

It is not such kind of bitmap. But I doubt that such indexes are more effective than ordinary B+ tree and definitely they have high maintenance cost.

aafemt avatar Jun 01 '25 10:06 aafemt

I mean, the author doesn't understand the reasons why this query isn't working effectively. I understand that our better indexes are not a complete analogue of the oracle bitmap index, but they completely solve the same tasks with no less efficiency. We can't scan just the index without the table, which makes it less efficient.

sim1984 avatar Jun 01 '25 10:06 sim1984

I mean, the author doesn't understand the reasons why this query isn't working effectively. I understand that our better indexes are not a complete analogue of the oracle bitmap index, but they completely solve the same tasks with no less efficiency. We can't scan just the index without the table, which makes it less efficient.

@sim1984 I am talking about "join index", not about just "bitmap" index. It contains already joined state and you don't need to join 2 tables again by "changelog_id" executing select statement. That is the main point.

arkinform avatar Jun 01 '25 11:06 arkinform

Do you have an idea how such index can be implemented?

aafemt avatar Jun 01 '25 11:06 aafemt

What do you mean be better index?

SELECT 
D.DYR_ID
FROM 
DYREKCJA D  
WHERE NOT EXISTS(SELECT * FROM ROZLICZENIE RL WHERE RL.DYR_ID=D.DYR_ID)

PLAN (RL INDEX (FK_ROZLICZENIE__DYREKCJA)) PLAN (D NATURAL)

stats - on remote machine:

Executing statement...
Statement executed (elapsed time: 0.000s).
2346 fetches, 0 marks, 0 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 8 index, 28 seq.
Delta memory: 1639904 bytes.
ROZLICZENIE: 8 reads index. 
DYREKCJA: 28 reads sequence. 
Total execution time: 0.300s
Script execution finished.

quite different approach:

SELECT 
D.DYR_ID
FROM 
DYREKCJA D 
LEFT JOIN LATERAL(
SELECT 
FIRST 1 RL.DYR_ID FROM ROZLICZENIE RL WHERE RL.DYR_ID=D.DYR_ID
) RL2 ON 1=1
where 
RL2.DYR_ID IS NULL

PLAN JOIN (D NATURAL, RL2 RL INDEX (FK_ROZLICZENIE__DYREKCJA))

Executing statement...
Statement executed (elapsed time: 0.000s).
2382 fetches, 0 marks, 0 reads, 0 writes.
0 inserts, 0 updates, 0 deletes, 36 index, 28 seq.
Delta memory: 1652112 bytes.
RDB$INDICES: 28 reads index. 
ROZLICZENIE: 8 reads index. 
DYREKCJA: 28 reads sequence. 
Total execution time: 0.383s
Script execution finished.

What do you need to optimize here @arkinform? It is quite optimal now.

livius2 avatar Jun 01 '25 20:06 livius2

Sorry, I didn't see right away that this is a special type of index for joining tables, and not just a Bitmap Index. In any case, I think that this type of index is quite specific. Its support is labor-intensive, at least it will be twice as heavy as an index for one table. And we still have reserves for accelerating this type of query with other methods that are currently being developed. But in general, in the distant future, this functionality can be made.

sim1984 avatar Jun 02 '25 06:06 sim1984

@livius2 in reply to your message:

What do you mean be better index?...

Add 1 million records to the "changelog", then add 10 millions records to the "changelogdet" (1 million for each of the 10 different "target_id"). After that add 1 new record to the "changelog" and try to find this new record using "exists". You will see the difference.

Of course in some case there are ways to optimize database structure (for example, try to use some extra "status" fields in the main "changelog" table, archive old data, etc.), but it is not always easy to implement, especially in the concurrent environment.

arkinform avatar Jun 02 '25 10:06 arkinform

Exactly because of this it is not easy to implement at Firebird engine level as well.

aafemt avatar Jun 02 '25 10:06 aafemt

@livius2 in reply to your message:

What do you mean be better index?...

Add 1 million records to the "changelog", then add 10 millions records to the "changelogdet" (1 million for each of the 10 different "target_id"). After that add 1 new record to the "changelog" and try to find this new record using "exists". You will see the difference.

Of course in some case there are ways to optimize database structure (for example, try to use some extra "status" fields in the main "changelog" table, archive old data, etc.), but it is not always easy to implement, especially in the concurrent environment.

Without aggregation on changelog table it is unsolvable, the same is when you do with 2 decks of cards, where you take one card from the second. It's not exactly the same case, but, it's easy to check what you have to do to find the missing one ;-) Iterate by all values and remove matched.

livius2 avatar Jun 02 '25 16:06 livius2

In the original message, I suppose reads from changelogdet can be reduced using hash anti-join for NOT EXISTS, although this is not going to optimize reads from changelog. It would be interesting to see the effect anyway before considering more complex solutions.

dyemanov avatar Sep 01 '25 12:09 dyemanov