Support vector ANN search benchmarking
Description
Introduce scripts and a Dockerfile for running the ann-benchmarks
tool, dedicated to vector search performance testing.
- Offer developers support to run the benchmark in their development environment, either via existing MariaDB builds or by building the source code and executing the benchmark within Docker.
- Integrate these builds into GitLab CI for Ubuntu 22.04 and include ANN benchmarking tests.
For detailed usage instructions, refer to the commit message and the script's help command.
How can this PR be tested?
A manual test was done for the scripts. The script is also integrated into the GitLab CI pipeline.
Basing the PR against the correct MariaDB version
- [x] This is a new feature and the PR is based against the latest MariaDB development branch
Backward compatibility
The changes are fully backward compatible.
Copyright
All new code of the whole pull request, including one or several files that are either new files or modified ones, is contributed under the BSD-new license. I am contributing on behalf of my employer, Amazon Web Services, Inc.
Test results
Example for a local run with ./support-files/ann-benchmark/run-local.sh:
wenhug@ud83c070d9ea75a:~/workspace/server$ ./support-files/ann-benchmark/run-local.sh
Downloading ann-benchmark...
Cloning into '/home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/ann-benchmarks'...
remote: Enumerating objects: 237, done.
remote: Counting objects: 100% (237/237), done.
remote: Compressing objects: 100% (214/214), done.
remote: Total 237 (delta 23), reused 152 (delta 18), pack-reused 0
Receiving objects: 100% (237/237), 1.60 MiB | 9.34 MiB/s, done.
Resolving deltas: 100% (23/23), done.
Installing ann-benchmark dependencies...
Starting ann-benchmark...
downloading https://ann-benchmarks.com/random-xs-20-euclidean.hdf5 -> data/random-xs-20-euclidean.hdf5...
Cannot download https://ann-benchmarks.com/random-xs-20-euclidean.hdf5
Creating dataset locally
Splitting 10000*None into train/test
train size: 9000 * 20
test size: 1000 * 20
0/1000...
2024-03-18 11:17:30,522 - annb - INFO - running only mariadb
2024-03-18 11:17:30,526 - annb - INFO - Order: [Definition(algorithm='mariadb', constructor='MariaDB', module='ann_benchmarks.algorithms.mariadb', docker_tag='ann-benchmarks-mariadb', arguments=['euclidean', {'M': 24, 'efConstruction': 200}], query_argument_groups=[[10], [20], [40], [80], [120], [200], [400], [800]], disabled=False), Definition(algorithm='mariadb', constructor='MariaDB', module='ann_benchmarks.algorithms.mariadb', docker_tag='ann-benchmarks-mariadb', arguments=['euclidean', {'M': 16, 'efConstruction': 200}], query_argument_groups=[[10], [20], [40], [80], [120], [200], [400], [800]], disabled=False)]
Trying to instantiate ann_benchmarks.algorithms.mariadb.MariaDB(['euclidean', {'M': 24, 'efConstruction': 200}])
Setup paths:
MARIADB_ROOT_DIR: /home/ANT.AMAZON.COM/wenhug/workspace/server/builddir
DATA_DIR: /home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/mariadb-workspace/data
LOG_FILE: /home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/mariadb-workspace/mariadb.err
SOCKET_FILE: /tmp/mysql_4gl2e5ms.sock
Initialize MariaDB database...
/home/ANT.AMAZON.COM/wenhug/workspace/server/builddir/*/mariadb-install-db --no-defaults --verbose --skip-name-resolve --skip-test-db --datadir=/home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/mariadb-workspace/data --srcdir=/home/ANT.AMAZON.COM/wenhug/workspace/server/support-files/ann-benchmark/../..
mysql.user table already exists!
Run mariadb-upgrade, not mariadb-install-db
Starting MariaDB server...
/home/ANT.AMAZON.COM/wenhug/workspace/server/builddir/*/mariadbd --no-defaults --datadir=/home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/mariadb-workspace/data --log_error=/home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/mariadb-workspace/mariadb.err --socket=/tmp/mysql_4gl2e5ms.sock --skip_networking --skip_grant_tables &
MariaDB server started!
Got a train set of size (9000 * 20)
Got 1000 queries
Preparing database and table...
Inserting data...
Insert time for 180000 records: 0.4894428253173828
Creating index...
Index creation time: 9.5367431640625e-07
Built index in 0.5406086444854736
Index size: 128.0
Running query argument group 1 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 2 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 3 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 4 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 5 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 6 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 7 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 8 of 8...
Run 1/1...
Processed 1000/1000 queries...
Trying to instantiate ann_benchmarks.algorithms.mariadb.MariaDB(['euclidean', {'M': 16, 'efConstruction': 200}])
Setup paths:
MARIADB_ROOT_DIR: /home/ANT.AMAZON.COM/wenhug/workspace/server/builddir
DATA_DIR: /home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/mariadb-workspace/data
LOG_FILE: /home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/mariadb-workspace/mariadb.err
SOCKET_FILE: /tmp/mysql_q1gbgaf3.sock
Initialize MariaDB database...
/home/ANT.AMAZON.COM/wenhug/workspace/server/builddir/*/mariadb-install-db --no-defaults --verbose --skip-name-resolve --skip-test-db --datadir=/home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/mariadb-workspace/data --srcdir=/home/ANT.AMAZON.COM/wenhug/workspace/server/support-files/ann-benchmark/../..
mysql.user table already exists!
Run mariadb-upgrade, not mariadb-install-db
Starting MariaDB server...
/home/ANT.AMAZON.COM/wenhug/workspace/server/builddir/*/mariadbd --no-defaults --datadir=/home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/mariadb-workspace/data --log_error=/home/ANT.AMAZON.COM/wenhug/workspace/server/ann-workspace/mariadb-workspace/mariadb.err --socket=/tmp/mysql_q1gbgaf3.sock --skip_networking --skip_grant_tables &
MariaDB server started!
Got a train set of size (9000 * 20)
Got 1000 queries
Preparing database and table...
Inserting data...
Insert time for 180000 records: 0.4275703430175781
Creating index...
Index creation time: 1.1920928955078125e-06
Built index in 0.48961424827575684
Index size: 0.0
Running query argument group 1 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 2 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 3 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 4 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 5 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 6 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 7 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 8 of 8...
Run 1/1...
Processed 1000/1000 queries...
2024-03-18 11:17:57,147 - annb - INFO - Terminating 1 workers
Ann-benchmark exporting data...
Looking at dataset deep-image-96-angular
Looking at dataset fashion-mnist-784-euclidean
Looking at dataset gist-960-euclidean
Looking at dataset glove-25-angular
Looking at dataset glove-50-angular
Looking at dataset glove-100-angular
Looking at dataset glove-200-angular
Looking at dataset mnist-784-euclidean
Looking at dataset random-xs-20-euclidean
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Looking at dataset random-s-100-euclidean
Looking at dataset random-xs-20-angular
Looking at dataset random-s-100-angular
Looking at dataset random-xs-16-hamming
Looking at dataset random-s-128-hamming
Looking at dataset random-l-256-hamming
Looking at dataset random-s-jaccard
Looking at dataset random-l-jaccard
Looking at dataset sift-128-euclidean
Looking at dataset nytimes-256-angular
Looking at dataset nytimes-16-angular
Looking at dataset word2bits-800-hamming
Looking at dataset lastfm-64-dot
Looking at dataset sift-256-hamming
Looking at dataset kosarak-jaccard
Looking at dataset movielens1m-jaccard
Looking at dataset movielens10m-jaccard
Looking at dataset movielens20m-jaccard
Looking at dataset dbpedia-openai-100k-angular
Looking at dataset dbpedia-openai-200k-angular
Looking at dataset dbpedia-openai-300k-angular
Looking at dataset dbpedia-openai-400k-angular
Looking at dataset dbpedia-openai-500k-angular
Looking at dataset dbpedia-openai-600k-angular
Looking at dataset dbpedia-openai-700k-angular
Looking at dataset dbpedia-openai-800k-angular
Looking at dataset dbpedia-openai-900k-angular
Looking at dataset dbpedia-openai-1000k-angular
Ann-benchmark plotting...
writing output to results/random-xs-20-euclidean.png
Found cached result
0: MariaDB(m=16, ef_construction=200, ef_search=40) 1.000 1007.832
Found cached result
1: MariaDB(m=24, ef_construction=200, ef_search=400) 1.000 941.649
Found cached result
2: MariaDB(m=24, ef_construction=200, ef_search=10) 1.000 1140.663
Found cached result
3: MariaDB(m=24, ef_construction=200, ef_search=20) 1.000 988.373
Found cached result
4: MariaDB(m=24, ef_construction=200, ef_search=120) 1.000 1091.114
Found cached result
5: MariaDB(m=16, ef_construction=200, ef_search=120) 1.000 998.908
Found cached result
6: MariaDB(m=16, ef_construction=200, ef_search=20) 1.000 1021.691
Found cached result
7: MariaDB(m=24, ef_construction=200, ef_search=40) 1.000 823.179
Found cached result
8: MariaDB(m=16, ef_construction=200, ef_search=400) 1.000 1079.078
Found cached result
9: MariaDB(m=24, ef_construction=200, ef_search=800) 1.000 1218.009
Found cached result
10: MariaDB(m=24, ef_construction=200, ef_search=200) 1.000 870.886
Found cached result
11: MariaDB(m=16, ef_construction=200, ef_search=800) 1.000 1058.689
Found cached result
12: MariaDB(m=24, ef_construction=200, ef_search=80) 1.000 851.237
Found cached result
13: MariaDB(m=16, ef_construction=200, ef_search=200) 1.000 930.801
Found cached result
14: MariaDB(m=16, ef_construction=200, ef_search=80) 1.000 1208.318
Found cached result
15: MariaDB(m=16, ef_construction=200, ef_search=10) 1.000 913.258
Ann-benchmark plot done; the last two columns in the output above are 'recall rate' and 'QPS'. ^^^
[COMPLETED]
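For readers unfamiliar with the two trailing columns, they can be reproduced with a simple calculation. A minimal sketch (these helper names are illustrative, not the actual ann-benchmarks code):

```python
# Hedged sketch: how 'recall rate' and 'QPS' columns are typically computed.

def recall(approx_ids, true_ids):
    """Fraction of the true nearest neighbors found by the approximate search."""
    return len(set(approx_ids) & set(true_ids)) / len(true_ids)

def qps(num_queries, total_seconds):
    """Queries per second over the whole query run."""
    return num_queries / total_seconds

# Example: 9 of 10 true neighbors found, 1000 queries in 2 seconds
print(recall([1, 2, 3, 4, 5, 6, 7, 8, 9, 11], list(range(10))))  # 0.9
print(qps(1000, 2.0))  # 500.0
```

A recall of 1.000 in the runs above is expected for this tiny random dataset; larger datasets show the usual recall/QPS trade-off.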
Example for a local run with ./support-files/ann-benchmark/run-docker.sh (when doing an incremental build):
wenhug@ud83c070d9ea75a:~/workspace/server$ ./support-files/ann-benchmark/run-docker.sh
Docker image found.
-- Running cmake version 3.22.1
-- MariaDB 11.4.0
-- Updating submodules
-- Could NOT find PkgConfig (missing: PKG_CONFIG_EXECUTABLE)
== Configuring MariaDB Connector/C
-- SYSTEM_LIBS: /usr/lib/x86_64-linux-gnu/libz.so;dl;m;dl;m;/usr/lib/x86_64-linux-gnu/libssl.so;/usr/lib/x86_64-linux-gnu/libcrypto.so;/usr/lib/x86_64-linux-gnu/libz.so
-- Configuring OQGraph
-- Configuring done
-- Generating done
-- Build files have been written to: /build/ann-workspace/builddir
[13/13] Linking CXX executable extra/mariabackup/mariadb-backup
Downloading ann-benchmark...
[WARN] ann-benchmarks repository already exists. Skipping cloning. Remove /build/server/ann-workspace/ann-benchmarks if you want it to be re-initialized.
Installing ann-benchmark dependencies...
WARNING: The directory '/.cache/pip' or its parent directory is not owned or is not writable by the current user. The cache has been disabled. Check the permissions and owner of that directory. If executing pip with sudo, you should use sudo's -H flag.
Starting ann-benchmark...
2024-03-18 18:18:54,384 - annb - INFO - running only mariadb
2024-03-18 18:18:54,393 - annb - INFO - Order: [Definition(algorithm='mariadb', constructor='MariaDB', module='ann_benchmarks.algorithms.mariadb', docker_tag='ann-benchmarks-mariadb', arguments=['euclidean', {'M': 16, 'efConstruction': 200}], query_argument_groups=[[10], [20], [40], [80], [120], [200], [400], [800]], disabled=False), Definition(algorithm='mariadb', constructor='MariaDB', module='ann_benchmarks.algorithms.mariadb', docker_tag='ann-benchmarks-mariadb', arguments=['euclidean', {'M': 24, 'efConstruction': 200}], query_argument_groups=[[10], [20], [40], [80], [120], [200], [400], [800]], disabled=False)]
Trying to instantiate ann_benchmarks.algorithms.mariadb.MariaDB(['euclidean', {'M': 16, 'efConstruction': 200}])
Setup paths:
MARIADB_ROOT_DIR: /build/ann-workspace/builddir
DATA_DIR: /build/server/ann-workspace/mariadb-workspace/data
LOG_FILE: /build/server/ann-workspace/mariadb-workspace/mariadb.err
SOCKET_FILE: /tmp/mysql_4yk6c666.sock
Could not get current user, could be docker user mapping. Ignore.
Initialize MariaDB database...
/build/ann-workspace/builddir/*/mariadb-install-db --no-defaults --verbose --skip-name-resolve --skip-test-db --datadir=/build/server/ann-workspace/mariadb-workspace/data --srcdir=/build/server/support-files/ann-benchmark/../..
mysql.user table already exists!
Run mariadb-upgrade, not mariadb-install-db
Starting MariaDB server...
/build/ann-workspace/builddir/*/mariadbd --no-defaults --datadir=/build/server/ann-workspace/mariadb-workspace/data --log_error=/build/server/ann-workspace/mariadb-workspace/mariadb.err --socket=/tmp/mysql_4yk6c666.sock --skip_networking --skip_grant_tables &
MariaDB server started!
Got a train set of size (9000 * 20)
Got 1000 queries
Preparing database and table...
Inserting data...
Insert time for 180000 records: 0.43891072273254395
Creating index...
Index creation time: 1.1920928955078125e-06
Built index in 0.4922800064086914
Index size: 128.0
Running query argument group 1 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 2 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 3 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 4 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 5 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 6 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 7 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 8 of 8...
Run 1/1...
Processed 1000/1000 queries...
Trying to instantiate ann_benchmarks.algorithms.mariadb.MariaDB(['euclidean', {'M': 24, 'efConstruction': 200}])
Setup paths:
MARIADB_ROOT_DIR: /build/ann-workspace/builddir
DATA_DIR: /build/server/ann-workspace/mariadb-workspace/data
LOG_FILE: /build/server/ann-workspace/mariadb-workspace/mariadb.err
SOCKET_FILE: /tmp/mysql_renlus59.sock
Could not get current user, could be docker user mapping. Ignore.
Initialize MariaDB database...
/build/ann-workspace/builddir/*/mariadb-install-db --no-defaults --verbose --skip-name-resolve --skip-test-db --datadir=/build/server/ann-workspace/mariadb-workspace/data --srcdir=/build/server/support-files/ann-benchmark/../..
mysql.user table already exists!
Run mariadb-upgrade, not mariadb-install-db
Starting MariaDB server...
/build/ann-workspace/builddir/*/mariadbd --no-defaults --datadir=/build/server/ann-workspace/mariadb-workspace/data --log_error=/build/server/ann-workspace/mariadb-workspace/mariadb.err --socket=/tmp/mysql_renlus59.sock --skip_networking --skip_grant_tables &
MariaDB server started!
Got a train set of size (9000 * 20)
Got 1000 queries
Preparing database and table...
Inserting data...
Insert time for 180000 records: 0.3983802795410156
Creating index...
Index creation time: 1.1920928955078125e-06
Built index in 0.4507639408111572
Index size: 0.0
Running query argument group 1 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 2 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 3 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 4 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 5 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 6 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 7 of 8...
Run 1/1...
Processed 1000/1000 queries...
Running query argument group 8 of 8...
Run 1/1...
Processed 1000/1000 queries...
2024-03-18 18:19:22,024 - annb - INFO - Terminating 1 workers
Ann-benchmark exporting data...
Looking at dataset deep-image-96-angular
Looking at dataset fashion-mnist-784-euclidean
Looking at dataset gist-960-euclidean
Looking at dataset glove-25-angular
Looking at dataset glove-50-angular
Looking at dataset glove-100-angular
Looking at dataset glove-200-angular
Looking at dataset mnist-784-euclidean
Looking at dataset random-xs-20-euclidean
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Computing knn metrics
Computing epsilon metrics
Computing epsilon metrics
Computing rel metrics
Looking at dataset random-s-100-euclidean
Looking at dataset random-xs-20-angular
Looking at dataset random-s-100-angular
Looking at dataset random-xs-16-hamming
Looking at dataset random-s-128-hamming
Looking at dataset random-l-256-hamming
Looking at dataset random-s-jaccard
Looking at dataset random-l-jaccard
Looking at dataset sift-128-euclidean
Looking at dataset nytimes-256-angular
Looking at dataset nytimes-16-angular
Looking at dataset word2bits-800-hamming
Looking at dataset lastfm-64-dot
Looking at dataset sift-256-hamming
Looking at dataset kosarak-jaccard
Looking at dataset movielens1m-jaccard
Looking at dataset movielens10m-jaccard
Looking at dataset movielens20m-jaccard
Looking at dataset dbpedia-openai-100k-angular
Looking at dataset dbpedia-openai-200k-angular
Looking at dataset dbpedia-openai-300k-angular
Looking at dataset dbpedia-openai-400k-angular
Looking at dataset dbpedia-openai-500k-angular
Looking at dataset dbpedia-openai-600k-angular
Looking at dataset dbpedia-openai-700k-angular
Looking at dataset dbpedia-openai-800k-angular
Looking at dataset dbpedia-openai-900k-angular
Looking at dataset dbpedia-openai-1000k-angular
Ann-benchmark plotting...
Matplotlib created a temporary config/cache directory at /tmp/matplotlib-tuav14cy because the default path (/.config/matplotlib) is not a writable directory; it is highly recommended to set the MPLCONFIGDIR environment variable to a writable directory, in particular to speed up the import of Matplotlib and to better support multiprocessing.
writing output to results/random-xs-20-euclidean.png
Found cached result
0: MariaDB(m=16, ef_construction=200, ef_search=40) 1.000 841.485
Found cached result
1: MariaDB(m=24, ef_construction=200, ef_search=400) 1.000 896.407
Found cached result
2: MariaDB(m=24, ef_construction=200, ef_search=10) 1.000 827.326
Found cached result
3: MariaDB(m=24, ef_construction=200, ef_search=20) 1.000 875.636
Found cached result
4: MariaDB(m=24, ef_construction=200, ef_search=120) 1.000 877.246
Found cached result
5: MariaDB(m=16, ef_construction=200, ef_search=120) 1.000 843.912
Found cached result
6: MariaDB(m=16, ef_construction=200, ef_search=20) 1.000 844.746
Found cached result
7: MariaDB(m=24, ef_construction=200, ef_search=40) 1.000 1006.725
Found cached result
8: MariaDB(m=16, ef_construction=200, ef_search=400) 1.000 1143.344
Found cached result
9: MariaDB(m=24, ef_construction=200, ef_search=800) 1.000 769.048
Found cached result
10: MariaDB(m=24, ef_construction=200, ef_search=200) 1.000 1011.292
Found cached result
11: MariaDB(m=16, ef_construction=200, ef_search=800) 1.000 938.419
Found cached result
12: MariaDB(m=24, ef_construction=200, ef_search=80) 1.000 972.378
Found cached result
13: MariaDB(m=16, ef_construction=200, ef_search=200) 1.000 839.023
Found cached result
14: MariaDB(m=16, ef_construction=200, ef_search=80) 1.000 798.808
Found cached result
15: MariaDB(m=16, ef_construction=200, ef_search=10) 1.000 912.495
Ann-benchmark plot done; the last two columns in the output above are 'recall rate' and 'QPS'. ^^^
[COMPLETED]
New GitLab CI job passed. Ignore the other failed jobs, as the development branch does not build for some plugins.
@HugoWenTD, could you add support for --batch? I tried something like:
diff --git a/ann_benchmarks/algorithms/mariadb/module.py b/ann_benchmarks/algorithms/mariadb/module.py
index 382ea70..89efce1 100644
--- a/ann_benchmarks/algorithms/mariadb/module.py
+++ b/ann_benchmarks/algorithms/mariadb/module.py
@@ -8,6 +8,7 @@ import subprocess
import sys
import tempfile
import time
+import threading
import mariadb
@@ -25,7 +26,7 @@ class MariaDB(BaseANN):
self._test_time = time.strftime("%Y-%m-%d-%H-%M-%S", time.localtime())
self._metric = metric
self._m = method_param['M']
- self._cur = None
+ self._ = threading.local()
self._perf_proc = None
self._perf_records = []
self._perf_stats = []
@@ -45,7 +46,7 @@ class MariaDB(BaseANN):
# Connect to MariaDB using Unix socket
conn = mariadb.connect(unix_socket=self._socket_file)
- self._cur = conn.cursor()
+ self._.cur = conn.cursor()
def prepare_options(self):
self._perf_stat = os.environ.get('PERF', 'no') == 'yes' and MariaDB.can_run_perf()
@@ -247,15 +248,15 @@ class MariaDB(BaseANN):
def fit(self, X):
# Prepare database and table
print("\nPreparing database and table...")
- self._cur.execute("DROP DATABASE IF EXISTS ann")
- self._cur.execute("CREATE DATABASE ann")
- self._cur.execute("USE ann")
- self._cur.execute("SET mhnsw_max_edges_per_node = %d" % self._m)
- self._cur.execute("SET rand_seed1=1, rand_seed2=2")
+ self._.cur.execute("DROP DATABASE IF EXISTS ann")
+ self._.cur.execute("CREATE DATABASE ann")
+ self._.cur.execute("USE ann")
+ self._.cur.execute("SET mhnsw_max_edges_per_node = %d" % self._m)
+ self._.cur.execute("SET rand_seed1=1, rand_seed2=2")
# Innodb create table with index is not supported with the latest commit of the develop branch.
# Once all supported we could use:
- #self._cur.execute("CREATE TABLE t1 (id INT PRIMARY KEY, v BLOB NOT NULL, vector INDEX (v)) ENGINE=InnoDB;")
- self._cur.execute("CREATE TABLE t1 (id INT PRIMARY KEY, v BLOB NOT NULL, vector INDEX (v)) ENGINE=MyISAM;")
+ #self._.cur.execute("CREATE TABLE t1 (id INT PRIMARY KEY, v BLOB NOT NULL, vector INDEX (v)) ENGINE=InnoDB;")
+ self._.cur.execute("CREATE TABLE t1 (id INT PRIMARY KEY, v BLOB NOT NULL, vector INDEX (v)) ENGINE=MyISAM;")
# Insert data
print("\nInserting data...")
@@ -263,11 +264,11 @@ class MariaDB(BaseANN):
start_time = time.time()
rps = 10000
for i, embedding in enumerate(X):
- self._cur.execute("INSERT INTO t1 (id, v) VALUES (%d, %s)", (i, bytes(vector_to_hex(embedding))))
+ self._.cur.execute("INSERT INTO t1 (id, v) VALUES (%d, %s)", (i, bytes(vector_to_hex(embedding))))
if i % int(rps + 1) == 1:
rps=i/(time.time()-start_time)
print(f"{i:6d} of {len(X)}, {rps:4.2f} stmt/sec, ETA {(len(X)-i)/rps:.0f} sec")
- self._cur.execute("commit")
+ self._.cur.execute("commit")
self.perf_stop()
print(f"\nInsert time for {X.size} records: {time.time() - start_time:7.2f}")
@@ -280,7 +281,7 @@ class MariaDB(BaseANN):
elif self._metric == "euclidean":
# The feature is being developed
# Currently stack will be empty for indexing in perf data as nothing is executed
- #self._cur.execute("ALTER TABLE `t1` ADD VECTOR INDEX (v);")
+ #self._.cur.execute("ALTER TABLE `t1` ADD VECTOR INDEX (v);")
pass
else:
pass
@@ -292,25 +293,32 @@ class MariaDB(BaseANN):
def set_query_arguments(self, ef_search):
# Set ef_search
self._ef_search = ef_search
- self._cur.execute("SET mhnsw_limit_multiplier = %d/10" % ef_search)
+ self._.cur.execute("SET mhnsw_limit_multiplier = %d/10" % ef_search)
def query(self, v, n):
- self._cur.execute("SELECT id FROM t1 ORDER by vec_distance(v, %s) LIMIT %d", (bytes(vector_to_hex(v)), n))
- return [id for id, in self._cur.fetchall()]
+ if not hasattr(self._, 'cur'):
+ conn = mariadb.connect(unix_socket=self._socket_file)
+ self._.cur = conn.cursor()
+ self._.cur.execute("USE ann")
+ self._.cur.execute("SET mhnsw_limit_multiplier = %d/10" % self._ef_search)
+ self._.cur.execute("SET rand_seed1=13, rand_seed2=29")
+
+ self._.cur.execute("SELECT id FROM t1 ORDER by vec_distance(v, %s) LIMIT %d", (bytes(vector_to_hex(v)), n))
+ return [id for id, in self._.cur.fetchall()]
# TODO for MariaDB, get the memory usage when index is supported:
# def get_memory_usage(self):
- # if self._cur is None:
+ # if self._.cur is None:
# return 0
- # self._cur.execute("")
- # return self._cur.fetchone()[0] / 1024
+ # self._.cur.execute("")
+ # return self._.cur.fetchone()[0] / 1024
def __str__(self):
return f"MariaDB(m={self._m:2d}, ef_search={self._ef_search})"
def done(self):
# Shutdown MariaDB server when benchmarking done
- self._cur.execute("shutdown")
+ self._.cur.execute("shutdown")
# Stop perf for searching and do final analysis
self.perf_stop()
self.perf_analysis()
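The core of the patch above is a lazily created, thread-local cursor: each worker thread opens its own connection on first use instead of sharing one cursor. A standalone sketch of the same pattern (the "cursor" here is a placeholder string, standing in for mariadb.connect(...).cursor()):

```python
import threading

class PerThreadCursor:
    """Each thread that touches .cur gets its own lazily created cursor.
    _connect() is a stand-in for mariadb.connect(unix_socket=...).cursor()."""

    def __init__(self):
        self._local = threading.local()  # per-thread storage
        self._count = 0
        self._lock = threading.Lock()

    def _connect(self):
        # Placeholder: a real implementation would open a DB connection here.
        with self._lock:
            self._count += 1
            return f"cursor-{self._count}"

    @property
    def cur(self):
        if not hasattr(self._local, "cur"):  # first use in this thread
            self._local.cur = self._connect()
        return self._local.cur

if __name__ == "__main__":
    res = PerThreadCursor()
    seen = []
    threads = [threading.Thread(target=lambda: seen.append(res.cur))
               for _ in range(3)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    print(sorted(seen))  # three distinct cursors, one per thread
```

Repeated access from the same thread returns the same cursor, so existing single-threaded code paths keep working unchanged.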
That works, but if you run ../build/client/mariadb-admin --socket /tmp/mysql_*.sock processlist -i1 while the benchmark is running, you'll see many connections in the server, but at most one of them is running a query. Python does indeed create os.cpu_count() threads, but they don't run in parallel.
I've managed to make it work with Pool (the default BaseANN.batch_query uses ThreadPool), but it looks quite awful.
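For context, the effect described above comes from CPython's GIL: a ThreadPool shares one process (and could share connections), but CPU-bound workers are serialized, while a process Pool runs in parallel at the cost of per-worker state such as connections. A self-contained illustration (timings omitted to keep it deterministic):

```python
from multiprocessing import Pool
from multiprocessing.pool import ThreadPool

def cpu_bound_query(n):
    # Stand-in for a client-side CPU-heavy per-query loop.
    total = 0
    for i in range(n):
        total += i * i
    return total

if __name__ == "__main__":
    work = [200_000] * 4
    # ThreadPool: one process, state is easy to share, but the GIL means
    # CPU-bound workers effectively run one at a time.
    with ThreadPool(4) as tp:
        threaded = tp.map(cpu_bound_query, work)
    # Pool: separate processes, true parallelism, but connections/cursors
    # cannot be shared and must be created per worker.
    with Pool(4) as pp:
        forked = pp.map(cpu_bound_query, work)
    assert threaded == forked  # same results, different execution model
```

This matches the processlist observation: many idle thread connections, at most one query running at a time.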
@vuvova I think it might be related to the ann-benchmarks framework; I'll investigate it further once I have some time free from other tasks.
See https://github.com/vuvova/ann-benchmarks/commits/dev/