objectbox-dart

Very slow deletion performance of removeMany with HNSW vector index

Open · Ohrest88 opened this issue 10 months ago · 2 comments

First of all, thank you for this great project! I searched the existing issues but couldn't find one related to this.

Description

In my Flutter application, I have an ObjectBox entity defined as:

@Entity()
class DocumentSection {
  @Id()
  int id = 0;

  final document = ToOne<Document>();
  String content;
  
  @Property(type: PropertyType.int)
  int pageNumber;
  
  @HnswIndex(
    dimensions: 500,
    distanceType: VectorDistanceType.cosine
  )
  @Property(type: PropertyType.floatVector)
  List<double>? embedding;

  @Property(type: PropertyType.int)
  int originalId = 0;

  DocumentSection({
    this.content = '',
    this.embedding,
    this.pageNumber = 0,
  });
}

It's for a semantic search use case. The ObjectBox database holds 109,000 DocumentSection entries (and therefore 109,000 vectors).
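For scale, a rough estimate of the raw vector data, assuming each `floatVector` element is stored as a 4-byte float32 (as the `nearestNeighborsF32` name suggests) and ignoring the HNSW graph links and any other per-object overhead:

```latex
\begin{aligned}
\text{bytes per vector} &= 500 \times 4\ \text{B} = 2000\ \text{B} \\
\text{raw vector data} &= 109\,000 \times 2000\ \text{B} = 218\,000\,000\ \text{B} \approx 218\ \text{MB}
\end{aligned}
```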

While vector search is remarkably fast with this many vectors (for example, nearestNeighborsF32 returns the 20 nearest embeddings in less than 1 second), deleting entries is very slow:

It takes about 264 seconds (4.4 minutes) to delete 22,085 entries (out of 109,000 in total). Could this be caused by the management of the HNSW vector index during the removeMany operation?

This is the code I'm using to delete entries:

  void _deleteDocument(Document document) {
    try {
      debugPrint('Starting deletion of document: ${document.filename} (ID: ${document.id})');
      final startTime = DateTime.now();

      widget.store.runInTransaction(TxMode.write, () {
        debugPrint('Starting transaction...');
        
        // Query sections
        debugPrint('Querying sections...');
        final queryStart = DateTime.now();
        final query = widget.sectionBox
            .query(DocumentSection_.document.equals(document.id))
            .build();
            
        final sectionCount = query.count();
        final queryDuration = DateTime.now().difference(queryStart);
        debugPrint('Found $sectionCount sections to delete (query took ${queryDuration.inMilliseconds}ms)');

        // Get IDs
        debugPrint('Getting section IDs...');
        final getIdsStart = DateTime.now();
        final ids = query.findIds();
        final getIdsDuration = DateTime.now().difference(getIdsStart);
        debugPrint('Got ${ids.length} section IDs (took ${getIdsDuration.inMilliseconds}ms)');
        
        query.close();

        // Delete sections using removeMany
        debugPrint('Starting batch section deletion...');
        final deleteStart = DateTime.now();
        final removedCount = widget.sectionBox.removeMany(ids);
        final deleteDuration = DateTime.now().difference(deleteStart);
        debugPrint('Sections deleted: $removedCount (took ${deleteDuration.inMilliseconds}ms)');
        
        // Delete document
        debugPrint('Deleting document...');
        final docDeleteStart = DateTime.now();
        widget.documentBox.remove(document.id);
        final docDeleteDuration = DateTime.now().difference(docDeleteStart);
        debugPrint('Document deleted (took ${docDeleteDuration.inMilliseconds}ms)');
      });
      
      final totalDuration = DateTime.now().difference(startTime);
      debugPrint('Total deletion process took ${totalDuration.inMilliseconds}ms');

      ScaffoldMessenger.of(context).showSnackBar(
        SnackBar(content: Text('Deleted ${document.filename}')),
      );
      
      // Refresh the data
      _loadData();
    } catch (e) {
      debugPrint('Error deleting document: $e');
      ScaffoldMessenger.of(context).showSnackBar(
        SnackBar(content: Text('Error deleting document: $e')),
      );
    }
  }

The above code produces these logs:

flutter: Starting deletion of document: Test.pdf (ID: 23)
flutter: Starting transaction...
flutter: Querying sections...
flutter: Found 22085 sections to delete (query took 16ms)
flutter: Getting section IDs...
flutter: Got 22085 section IDs (took 2ms)
flutter: Starting batch section deletion...
flutter: Sections deleted: 22085 (took 264141ms)
flutter: Deleting document...
flutter: Document deleted (took 0ms)
flutter: Total deletion process took 264192ms
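The repeated `DateTime.now()` bookkeeping in `_deleteDocument` could be factored into a small helper built on `Stopwatch`, the `dart:core` type meant for measuring elapsed time. A minimal sketch; the name `timed` is hypothetical and not part of ObjectBox or Flutter, and `print` stands in for `debugPrint` so it runs in plain Dart:

```dart
/// Hypothetical helper (not part of ObjectBox or Flutter): runs [action],
/// prints its elapsed time under [label], and returns the action's result.
/// `print` stands in for Flutter's `debugPrint` so the sketch runs in plain Dart.
T timed<T>(String label, T Function() action) {
  final stopwatch = Stopwatch()..start();
  try {
    return action();
  } finally {
    // Runs whether [action] returns normally or throws.
    stopwatch.stop();
    print('$label took ${stopwatch.elapsedMilliseconds}ms');
  }
}
```

Inside the transaction, the deletion step would then shrink to `final removedCount = timed('removeMany', () => widget.sectionBox.removeMany(ids));`, and likewise for the query and ID lookup steps.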

Specifically, this line appears to be the bottleneck:

final removedCount = widget.sectionBox.removeMany(ids);
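Working from the logged timings, the average cost per removed object and the share of the total time spent in `removeMany` come out as follows (for comparison, the query and ID lookup together took only 16 ms + 2 ms = 18 ms):

```latex
\begin{aligned}
\frac{264\,141\ \text{ms}}{22\,085\ \text{objects}} &\approx 11.96\ \text{ms per removed object} \\
\frac{264\,141\ \text{ms}}{264\,192\ \text{ms}} &\approx 99.98\,\%\ \text{of the total deletion time}
\end{aligned}
```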

Could the slowdown be related to the HNSW index maintenance during deletion, given that all other operations (querying, getting IDs) are very fast? Is there a known solution for this issue?

Environment:

ObjectBox version: 4.1.0
Flutter: 3.29.0
Platform tested on: Linux (Ubuntu 24.04.1 LTS)

Ohrest88, Feb 20 '25 15:02

Could the slowdown be related to the HNSW index maintenance during deletion

Yes, I'm pretty sure the HNSW index update is the bottleneck. To my knowledge, no efficient algorithm exists for HNSW bulk updates yet.

I've seen delete markers used instead of actually removing entries and reorganizing the index. But this is not perfect either and may just delay the problem to a later stage.
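To make the delete-marker idea concrete: a soft delete records removed IDs, filters them out of search results (over-fetching to compensate), and defers the physical index removal to a later compaction step. A minimal, library-agnostic sketch in plain Dart; `TombstoneFilter`, `search`, and `compact` are hypothetical names, the `searchIds` callback stands in for whatever nearest-neighbor query the app runs, and the tombstones are kept only in memory here (a real app would have to persist them, which is not shown):

```dart
/// Hypothetical soft-delete ("tombstone") layer in front of a nearest-neighbor
/// search. Not an ObjectBox API: removed IDs are only remembered here, and the
/// expensive index removal is deferred to [compact].
class TombstoneFilter {
  final Set<int> _deleted = <int>{};

  /// Marks [ids] as deleted without touching the vector index.
  void markDeleted(Iterable<int> ids) => _deleted.addAll(ids);

  /// Whether [id] is currently tombstoned.
  bool isDeleted(int id) => _deleted.contains(id);

  /// Number of tombstoned IDs waiting for [compact].
  int get pendingCount => _deleted.length;

  /// Calls [searchIds] (expected to return IDs ordered nearest first) with a
  /// limit raised by the number of tombstones, then drops tombstoned IDs and
  /// keeps at most [k] results. The over-fetch grows with every soft delete,
  /// which is why tombstones only postpone the cost.
  List<int> search(int k, List<int> Function(int limit) searchIds) {
    final candidates = searchIds(k + _deleted.length);
    return candidates.where((id) => !_deleted.contains(id)).take(k).toList();
  }

  /// Passes all tombstoned IDs to [removeMany] in one call (for example during
  /// idle time), clears the tombstones, and returns what [removeMany] reported.
  int compact(int Function(List<int> ids) removeMany) {
    if (_deleted.isEmpty) return 0;
    final removed = removeMany(_deleted.toList());
    _deleted.clear();
    return removed;
  }
}
```

In this issue's setting, `removeMany` could be `widget.sectionBox.removeMany`, so the same index maintenance still happens, just at a time the app chooses.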

What will (likely) happen in your use case after a bulk delete? Will new documents be added? It may help to understand the dynamics and usage patterns to think about solutions....

greenrobot, Feb 21 '25 07:02

What will (likely) happen in your use case after a bulk delete? Will new documents be added? It may help to understand the dynamics and usage patterns to think about solutions....

Yes, after a bulk delete it's quite possible that new documents will be added, but not necessarily. The application lets users add and delete documents at will, which leads to extensive changes in the database. Adding a single document can create many DocumentSection entries: as many as 22,000 for a large document, as in this issue.

Also, as the database grows (i.e., as the total number of DocumentSection entries increases), deleting even smaller documents (with fewer than 5,000 DocumentSection entries) becomes slow.

Interestingly, while inserting 22,000 DocumentSection entries completes in seconds, deleting the same 22,000 entries can take as long as four minutes.
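Because deletions like these can run for minutes, one app-side mitigation (which does not reduce the index maintenance cost itself) is to split the ID list into chunks and report progress between them. A minimal sketch; `removeInChunks`, `chunkSize`, and `onProgress` are hypothetical names, the default of 1000 is arbitrary, and it is an untested assumption that yielding to the event loop between chunks is enough to keep a Flutter UI responsive (each chunk still runs synchronously):

```dart
import 'dart:math' show min;

/// Hypothetical helper (not an ObjectBox API): removes [ids] in chunks of at
/// most [chunkSize] by calling [removeChunk] once per chunk, reports progress
/// via [onProgress], and returns the sum of the counts [removeChunk] returned.
Future<int> removeInChunks(
  List<int> ids,
  int Function(List<int> chunk) removeChunk, {
  int chunkSize = 1000,
  void Function(int done, int total)? onProgress,
}) async {
  if (chunkSize <= 0) {
    throw ArgumentError.value(chunkSize, 'chunkSize', 'must be positive');
  }
  var removed = 0;
  for (var start = 0; start < ids.length; start += chunkSize) {
    final end = min(start + chunkSize, ids.length);
    removed += removeChunk(ids.sublist(start, end));
    onProgress?.call(end, ids.length);
    // Yield to the event loop so other pending work can run between chunks.
    await Future<void>.delayed(Duration.zero);
  }
  return removed;
}
```

With ObjectBox, each chunk could get its own write transaction, for example `(chunk) => widget.store.runInTransaction(TxMode.write, () => widget.sectionBox.removeMany(chunk))`, and `onProgress` could drive a progress indicator. The trade-off is that the deletion is no longer all-or-nothing as in `_deleteDocument`: if the app stops midway, some sections will already be gone while the `Document` remains.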

Ohrest88, Feb 22 '25 11:02