HNSW Vector Index Performance Issue with Batched Transactions
Problem
Populating an HNSW vector index using batched transactions causes severe performance degradation (5-10x slower) compared to a single large transaction.
Performance Comparison:
- Single transaction: ~278 vectors/sec (stable)
- Batched transactions: 781 → 149 → 97 → 64 vectors/sec (progressive slowdown)
Root Cause
Each transaction commit forces:
- Disk I/O for index metadata
- Page cache invalidation
- HNSW graph state persistence/reload
Single transactions keep all HNSW updates in memory until final commit, avoiding repeated disk flushes.
Reproducible Example
Dataset: 9,742 vectors (384 dimensions)
// FAST: Single transaction (35.0s)
db.begin();
for (Vertex v : vertices) { index.add(v); }
db.commit();
// SLOW: Batched transactions (5-10x slower)
for (List<Vertex> batch : batches) {
  db.begin();
  for (Vertex v : batch) { index.add(v); }
  db.commit();
}
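For completeness, a minimal timing harness along these lines reproduces the per-batch throughput numbers above. It assumes `vertices` is a `java.util.List<Vertex>` and that `db` and `index` are the same objects as in the snippets; the batch size of 1,000 is arbitrary.

// Sketch of a per-batch throughput measurement, assuming `db`, `index` and
// `vertices` are set up as in the snippets above; batchSize is arbitrary.
final int batchSize = 1_000;
for (int start = 0; start < vertices.size(); start += batchSize) {
  final List<Vertex> batch =
      vertices.subList(start, Math.min(start + batchSize, vertices.size()));

  final long t0 = System.nanoTime();
  db.begin();
  for (final Vertex v : batch)
    index.add(v);                        // same call as in the snippet above
  db.commit();                           // each commit persists HNSW state to disk
  final long elapsedMs = Math.max(1, (System.nanoTime() - t0) / 1_000_000);

  System.out.printf("batch of %d: %.1f vectors/sec%n",
      batch.size(), batch.size() * 1000.0 / elapsedMs);
}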
Impact
- Bulk indexing: Re-indexing takes 5-10x longer with batching
- Memory vs. speed trade-off: Users batch to avoid OOM, but performance becomes unusable
- Scalability: Throughput drops with each successive batch, so the slowdown grows worse as the dataset size increases
Suggested Solutions
Option 1: Buffer HNSW updates in memory
- Defer edge persistence until explicit flush or buffer threshold
- Add an index.flush() method for manual control (see the sketch below)
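A hypothetical sketch of how Option 1 could look from the caller's side; neither the buffering behavior nor index.flush() exists in the current API:

// Hypothetical API sketch for Option 1 -- index.flush() does not exist today.
db.begin();
for (Vertex v : vertices)
  index.add(v);        // updates buffered in memory, no per-call persistence
index.flush();         // single explicit flush of the buffered HNSW graph state
db.commit();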
Option 2: Transaction-aware optimization
- Optimize the commit path when multiple add() calls occur in the same transaction
- Skip intermediate persistence and flush once at transaction end (sketched below)
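A schematic sketch only, not ArcadeDB internals: the idea is to buffer HNSW mutations per transaction and persist them once from the commit path. Vertex is assumed to be the ArcadeDB vertex type, and persistPages() is a hypothetical helper.

// Schematic only, not ArcadeDB internals: buffer HNSW mutations per transaction
// and persist them once when the transaction commits.
import java.util.ArrayList;
import java.util.List;

class TxAwareHnswIndex {
  private final List<Vertex> pending = new ArrayList<>();

  void add(final Vertex v) {
    // Update the in-memory HNSW graph and remember the vertex; no disk I/O here.
    pending.add(v);
  }

  void onTxCommit() {
    // Called once from the commit path: persist all buffered updates together.
    persistPages(pending);
    pending.clear();
  }

  private void persistPages(final List<Vertex> batch) {
    // Hypothetical: write the affected index pages in a single pass.
  }
}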
Option 3: Document the requirement
- Warn users that bulk indexing requires single transaction
- Note OOM risk for very large datasets
Environment
- ArcadeDB 24.11.1
- HNSW: m=16, ef=128
- JVM: 8GB heap
@tae898 could you please try the same test with the new LSM Vector?
Yes, I will! Excited for the new vector implementation.