[Bug]: vector index benchmark performance is lower than pgvector
Is there an existing issue for the same bug?
- [X] I have checked the existing issues.
Branch Name
main
Commit ID
b85d7579eec4ca041b63ecf8cf9ceef325f59071
Other Environment Information
- Hardware parameters:
- OS type:
- Others:
Actual Behavior
mo (MatrixOne) vs. pgvector vector index benchmark results:
See also the spreadsheet: https://doc.weixin.qq.com/sheet/e3_AYYAgwazACsB4L72hnoQ1eAEU31N6?scode=AJsA6gc3AA8QPAY9JuAYYAgwazACs
The test data shows the following performance issues:
- Query performance with vs. without the index: for SIFT 128-dim (1 million vectors), mo shows almost no QPS difference between queries with and without the index, whereas pgvector shows a 6x difference. For GIST 960-dim, mo's indexed performance is significantly better than its unindexed performance.
- QPS: on SIFT 128-dim and GIST 960-dim (1 million vectors each), mo's recall is essentially the same as pgvector's, but its QPS is 30-40x lower.
- Effect of the lists parameter: mo's recall with lists=500 is lower than with lists=1000, yet there is no significant improvement in QPS. By contrast, pgvector's QPS with lists=1000 is significantly higher than with lists=500.
- Index creation: mo takes 10x longer than pgvector to create the index for SIFT 128 (1 million) and 25x longer for GIST 960 (1 million).
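To make the lists/recall/QPS trade-off above concrete, here is a minimal pure-Python sketch of an IVFFlat-style index, assuming the design pgvector's ivfflat uses: k-means centroids, one inverted list per centroid, and a probes count of lists scanned per query. It is not MatrixOne's or pgvector's code; the function names and the toy dataset are hypothetical. "No index" corresponds to exact_knn (scan everything), the k-means step is what makes index creation expensive, and with probes fixed, more lists means each query scans fewer vectors.

```python
import random


def sq_l2(a, b):
    """Squared Euclidean distance; ranks neighbours the same way as L2."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_centroid(v, centroids):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda j: sq_l2(v, centroids[j]))


def kmeans(data, n_lists, iters=5, seed=0):
    """Plain Lloyd's k-means: the expensive part of building the index."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(data, n_lists)]
    for _ in range(iters):
        buckets = [[] for _ in range(n_lists)]
        for v in data:
            buckets[nearest_centroid(v, centroids)].append(v)
        for j, members in enumerate(buckets):
            if members:  # keep the previous centroid if a list ends up empty
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def build_ivf(data, n_lists):
    """Cluster the data, then store each vector id in its centroid's list."""
    centroids = kmeans(data, n_lists)
    lists = [[] for _ in range(n_lists)]
    for i, v in enumerate(data):
        lists[nearest_centroid(v, centroids)].append(i)
    return centroids, lists


def exact_knn(data, query, k):
    """No index: scan every vector."""
    return sorted(range(len(data)), key=lambda i: sq_l2(data[i], query))[:k]


def ivf_knn(data, index, query, k, probes):
    """Scan only the `probes` lists whose centroids are nearest to the query."""
    centroids, lists = index
    order = sorted(range(len(centroids)), key=lambda j: sq_l2(query, centroids[j]))
    candidates = [i for j in order[:probes] for i in lists[j]]
    top = sorted(candidates, key=lambda i: sq_l2(data[i], query))[:k]
    return top, len(candidates)


# Toy data (hypothetical): 1,000 random 8-dim vectors and 20 queries.
rng = random.Random(42)
data = [[rng.random() for _ in range(8)] for _ in range(1000)]
queries = [[rng.random() for _ in range(8)] for _ in range(20)]

k, probes = 5, 2
results = {}
for n_lists in (10, 40):
    index = build_ivf(data, n_lists)
    hits = scanned = 0
    for q in queries:
        top, n = ivf_knn(data, index, q, k, probes)
        hits += len(set(top) & set(exact_knn(data, q, k)))
        scanned += n
    results[n_lists] = (hits / (k * len(queries)), scanned / len(queries))
    print(f"lists={n_lists}: recall@{k}={results[n_lists][0]:.2f}, "
          f"avg vectors scanned={results[n_lists][1]:.0f} of {len(data)}")
```

Scanning fewer vectors per query is where the QPS gain of a larger lists value comes from; the price is that a true neighbour is more likely to sit in a list that was not probed.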
Expected Behavior
No response
Steps to Reproduce
1. Clone the benchmark tool: git clone -b sift128 https://github.com/arjunsk/mo-benchmark-test.git
2. Run the benchmark tool with the sift128dim and gist960dim datasets.
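The mo-benchmark-test scripts are the source of truth for how the numbers in this issue are produced. As a hedged illustration of what "KNN QPS" and "KNN recall (k=5)" usually mean, here is a minimal single-client harness. The search callable is a hypothetical stand-in for whatever issues the query to mo or pgvector, and truth holds the exact top-k ids per query (SIFT1M and GIST1M ship with ground-truth files). Sequential, single-client execution is an assumption; the real tool may run queries concurrently.

```python
import time
from typing import Callable, List, Sequence


def recall_at_k(results: Sequence[Sequence[int]],
                truth: Sequence[Sequence[int]], k: int) -> float:
    """Share of the true top-k ids that the search returned, averaged over all queries."""
    hits = sum(len(set(r[:k]) & set(t[:k])) for r, t in zip(results, truth))
    return hits / (k * len(truth))


def run_benchmark(search: Callable[[Sequence[float], int], List[int]],
                  queries: Sequence[Sequence[float]],
                  truth: Sequence[Sequence[int]], k: int = 5):
    """Issue the queries one after another (single client); return (QPS, recall@k)."""
    results = []
    start = time.perf_counter()
    for q in queries:
        results.append(search(q, k))
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero timer delta
    return len(queries) / elapsed, recall_at_k(results, truth, k)


# Usage with a hypothetical stand-in search: it misses one neighbour on the
# first query and is exact on the second.
truth = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]
queries = [[0.0], [1.0]]


def fake_search(q, k):
    return [1, 2, 3, 4, 99] if q[0] == 0.0 else [6, 7, 8, 9, 10]


qps, recall = run_benchmark(fake_search, queries, truth, k=5)
print(f"QPS={qps:.0f}, recall@5={recall:.2f}")  # recall@5=0.90
```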
Additional information
No response
Currently tracked in https://github.com/matrixorigin/matrixone/issues/14610. I will update this issue once that issue is resolved.
No progress.
- KNN QPS // OK: waiting for https://github.com/matrixorigin/matrixone/issues/15196 to be resolved. Once it is fixed, we should get around 50-60 QPS for SIFT128.
- Create Index duration // Needs work: made some modifications that bring the duration down to 5-6 mins for SIFT128. Doing more analysis.
- Insert before the index is created // OK: inserts the SIFT128 data in 330 secs.
- Insert after the index is created // Needs work: this benchmark is required. In my local benchmark, reinserting the same SIFT128 data after the index is created on the table takes 30 minutes. Needs more analysis.
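A back-of-envelope comparison of the two insert numbers above, plus the centroid-assignment work an IVFFlat-style index would add per row. This is a sketch only: the 1-million row count for the reinsert, lists = 500, and the premise that every inserted row is compared against all centroids are assumptions, not measurements of mo's code path.

```python
rows = 1_000_000                  # assumption: the reinsert writes the same 1 million rows
insert_before_index_s = 330       # "Inserts SIFT128 data in 330 secs"
insert_after_index_s = 30 * 60    # "30 minutes to reinsert ... after the index is created"

slowdown = insert_after_index_s / insert_before_index_s
print(f"slowdown after index creation: {slowdown:.1f}x")  # 5.5x
print(f"rows/s before index: {rows / insert_before_index_s:,.0f}")
print(f"rows/s after index:  {rows / insert_after_index_s:,.0f}")

# Assumption (IVFFlat-style index): each new row is compared against every
# centroid to pick its list, i.e. lists * dim multiply-adds per row.
lists = 500  # hypothetical here; the value used in the later SIFT128 runs
dim = 128    # SIFT vectors are 128-dimensional
madds_per_row = lists * dim
print(f"centroid scan: {madds_per_row:,} multiply-adds per row, "
      f"{rows * madds_per_row:.1e} for all rows")
```

If these assumptions hold, the centroid scan is on the order of 6.4e10 multiply-adds in total, which gives a rough yardstick when profiling where the rest of the 30 minutes goes.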
KNN performance can be improved after resolving this issue: https://github.com/matrixorigin/matrixone/issues/15572
Create Index performance is improved by merging this PR: https://github.com/matrixorigin/matrixone/pull/15573
Note: please pull the latest version of https://github.com/arjunsk/mo-benchmark-test/tree/master_index
SIFT128 dataset, with and without a primary key (PK):

| Operation | With PK | Without PK |
|---|---|---|
| Load (1 million) without index | 30 secs | 30 secs |
| Create index (lists = 500) | 3 min 41 secs | 3 min 59 secs |
| KNN QPS (k = 5) with index | 52 | 32 |
| KNN recall (k = 5) with index | 0.7013 | 0.7162 |
| Reinsert (2 million) with index | 2799 secs | 2504 secs |
| Insert (1 million) without index | 1292 secs | 338 secs |
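Normalising the SIFT128 timings above to rows per second makes the with-PK and without-PK runs easier to compare. This is plain arithmetic on the reported numbers, assuming the row counts in the labels (1 million / 2 million) are exact; the dictionary keys are illustrative names only.

```python
# Timings (seconds) from the SIFT128 results; row counts from the row labels.
timings = {
    "with PK":    {"load_1m": 30, "reinsert_2m": 2799, "insert_1m": 1292},
    "without PK": {"load_1m": 30, "reinsert_2m": 2504, "insert_1m": 338},
}
row_counts = {"load_1m": 1_000_000, "reinsert_2m": 2_000_000, "insert_1m": 1_000_000}


def rows_per_second(case: str) -> dict:
    """Convert each timing for one table variant into rows per second."""
    return {step: row_counts[step] / secs for step, secs in timings[case].items()}


for case in timings:
    rates = {step: round(rate) for step, rate in rows_per_second(case).items()}
    print(case, rates)

# How much longer the "Insert (1 million)" step takes with a PK than without.
pk_insert_ratio = timings["with PK"]["insert_1m"] / timings["without PK"]["insert_1m"]
print(f"insert with PK / without PK: {pk_insert_ratio:.1f}x")  # 3.8x
```

On these numbers, Load runs at about 33,333 rows/s in both variants, while the 1-million-row Insert takes about 3.8x longer with a PK.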
Hi @heni02, most of the optimizations have been merged into the main branch. Kindly verify the performance change.
Performance results after the vector index improvements (tested on main commit ad5d8c6c43a021760896df846fcf35dff93cfd8f):
- Good news: compared to previous versions, index creation and QPS have improved significantly, by roughly 1-29x.
- Minor bad news: compared to pgvector, index creation is 3-10x slower, and GIST960 QPS is 2-10x lower.
Performance results: https://doc.weixin.qq.com/sheet/e3_AYYAgwazACsB4L72hnoQ1eAEU31N6?scode=AJsA6gc3AA8QPAY9JuAYYAgwazACs
GIST960 performance is still poor. Arjun has created an issue to track it: https://github.com/matrixorigin/matrixone/issues/16001
Performance optimization is ongoing; moving this to the next version.
Updated pgvector to 0.7.2 (PostgreSQL server version 14.4) and reran the pgvector benchmark. The new version performs better than the previous 0.4.2 version; see the results:
https://doc.weixin.qq.com/sheet/e3_AYYAgwazACsB4L72hnoQ1eAEU31N6?scode=AJsA6gc3AA8QPAY9JuAYYAgwazACs&tab=jcbr8o
pgvector 0.7.2 download: https://pgxn.org/dist/vector/#query-options
postgres server version: