tantivy

document with facet can not be deleted

Open rustmailer opened this issue 3 months ago • 8 comments

I encountered a strange issue. The document has three fields: fields a and b are of type u64, and field c is a facet field. I found that records with a facet value added cannot be deleted, whereas records without the facet can be deleted normally using a term query. Why is this happening?

I am using the latest version.

rustmailer, Nov 17 '25 17:11

To clarify, it seems that after modifying a document, it cannot be deleted. The modification is done by first calling delete_query, then add_document, and finally commit.

rustmailer, Nov 17 '25 18:11

Can you provide some code to reproduce?

PSeitz, Nov 17 '25 19:11

#[tokio::test]
async fn test2() {
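    // Import paths below assume a recent tantivy release; adjust for your version.
    use tantivy::collector::TopDocs;
    use tantivy::indexer::UserOperation;
    use tantivy::query::TermQuery;
    use tantivy::schema::{IndexRecordOption, Schema, Value, FAST, INDEXED, STORED, STRING};
    use tantivy::{doc, Index, IndexWriter, TantivyDocument, Term};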
    let mut builder = Schema::builder();
    let a = builder.add_u64_field("a", INDEXED | FAST);
    let b = builder.add_text_field("b", STRING | STORED | FAST);

    let schema = builder.build();
    let index = Index::create_in_ram(schema);
    let mut index_writer: IndexWriter = index.writer(50_000_000).unwrap();

    let delete_term1 = Term::from_field_u64(a, 1u64);
    let delete_term2 = Term::from_field_u64(a, 2u64);
    let delete_term3 = Term::from_field_u64(a, 3u64);

    let operations = vec![
        // UserOperation::Delete(delete_term1),
        UserOperation::Add(doc!(
            a => 1u64,
            b => "test1"
        )),
        // UserOperation::Delete(delete_term2),
        UserOperation::Add(doc!(
            a => 2u64,
            b => "test1"
        )),
        // UserOperation::Delete(delete_term3),
        UserOperation::Add(doc!(
            a => 3u64,
            b => "test1"
        )),
    ];

    index_writer.run(operations).unwrap();
    index_writer.commit().unwrap();

    let reader = index.reader().unwrap();

    let searcher = reader.searcher();

    let tq = TermQuery::new(Term::from_field_u64(a, 3), IndexRecordOption::Basic);

    let docs = searcher.search(&tq, &TopDocs::with_limit(1)).unwrap();

    if let Some((_, doc_address)) = docs.first() {
        let old_doc: TantivyDocument = searcher.doc_async(*doc_address).await.unwrap();

        let mut new_doc = TantivyDocument::new();
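        for (field, value) in old_doc.field_values() {
            // Only STORED fields come back from the doc store. With `a` declared
            // as INDEXED | FAST, this branch is never taken and new_doc has no `a`.
            if field == a {
                new_doc.add_field_value(a, value);
            }
        }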
        new_doc.add_text(b, "test2");

        let delete_term = Term::from_field_u64(a, 3);
        index_writer.delete_term(delete_term);
        index_writer.commit().unwrap();
        index_writer.add_document(new_doc).unwrap();
        index_writer.commit().unwrap();
    }

    reader.reload().unwrap();
    let searcher = reader.searcher();
    let docs = searcher.search(&tq, &TopDocs::with_limit(1)).unwrap();

    if let Some((_, doc_address)) = docs.first() {
        let doc: TantivyDocument = searcher.doc_async(*doc_address).await.unwrap();
        for (field, value) in doc.field_values() {
            if field == b {
                let value = value.as_str();
                println!("{:#?}", value);
            }
        }
    } else {
        println!("not found")
    }

    let delete_term = Term::from_field_u64(a, 3);
    index_writer.delete_term(delete_term);
    index_writer.commit().unwrap();

    reader.reload().unwrap();
    let searcher = reader.searcher();
    let docs = searcher.search(&tq, &TopDocs::with_limit(1)).unwrap();

    if let Some((_, doc_address)) = docs.first() {
        let doc: TantivyDocument = searcher.doc_async(*doc_address).await.unwrap();
        for (field, value) in doc.field_values() {
            if field == b {
                let value = value.as_str();
                println!("{:#?}", value);
            }
        }
    } else {
        println!("not found")
    }
}

Code similar to this currently produces results that don’t match my expectations. However, I discovered that when I add STORED to field a, that is, change let a = builder.add_u64_field("a", INDEXED | FAST); to use INDEXED | FAST | STORED, everything works correctly. Why is that?

rustmailer, Nov 18 '25 03:11

I don't see any facets in your example. Can you provide a minimal example with an assertion?

PSeitz, Nov 19 '25 08:11

@PSeitz Sorry, I sent an example with a text field instead of a facet because I later realized they behave the same. This example is about these two fields:

let a = builder.add_u64_field("a", INDEXED | FAST);
let b = builder.add_text_field("b", STRING | STORED | FAST);

If the a field is not STORED, the following sequence fails: I delete a record using a as the delete term, re-add it with a modified b field, and then try to delete it again. The second delete does not take effect. If a is STORED, everything works fine. I want to understand why this happens.

rustmailer, Nov 19 '25 11:11

Can you provide a minimal example with an assertion?

PSeitz, Nov 19 '25 12:11

#[tokio::test]
async fn test2() {
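    // Import paths below assume a recent tantivy release; adjust for your version.
    use tantivy::collector::TopDocs;
    use tantivy::indexer::UserOperation;
    use tantivy::query::TermQuery;
    use tantivy::schema::{IndexRecordOption, Schema, Value, FAST, INDEXED, STORED, STRING};
    use tantivy::{doc, Index, IndexWriter, TantivyDocument, Term};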
    let mut builder = Schema::builder();
    let a = builder.add_u64_field("a", INDEXED | FAST);
    let b = builder.add_text_field("b", STRING | STORED | FAST);

    let schema = builder.build();
    let index = Index::create_in_ram(schema);
    let mut index_writer: IndexWriter = index.writer(50_000_000).unwrap();

    let delete_term1 = Term::from_field_u64(a, 1u64);
    let delete_term2 = Term::from_field_u64(a, 2u64);
    let delete_term3 = Term::from_field_u64(a, 3u64);

    let operations = vec![
        UserOperation::Delete(delete_term1),
        UserOperation::Add(doc!(
            a => 1u64,
            b => "v1"
        )),
        UserOperation::Delete(delete_term2),
        UserOperation::Add(doc!(
            a => 2u64,
            b => "v1"
        )),
        UserOperation::Delete(delete_term3),
        UserOperation::Add(doc!(
            a => 3u64,
            b => "v1"
        )),
    ];

    index_writer.run(operations).unwrap();
    index_writer.commit().unwrap();

    let reader = index.reader().unwrap();
    let searcher = reader.searcher();
    let tq = TermQuery::new(Term::from_field_u64(a, 3), IndexRecordOption::Basic);
    let docs = searcher.search(&tq, &TopDocs::with_limit(1)).unwrap();
    assert!(docs.first().is_some());
    if let Some((_, doc_address)) = docs.first() {
        let old_doc: TantivyDocument = searcher.doc_async(*doc_address).await.unwrap();

        let mut new_doc = TantivyDocument::new();
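        for (field, value) in old_doc.field_values() {
            // Only STORED fields come back from the doc store. With `a` declared
            // as INDEXED | FAST, this branch is never taken and new_doc has no `a`.
            if field == a {
                new_doc.add_field_value(a, value);
            }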
            if field == b {
                assert_eq!(Some("v1"), value.as_str())
            }
        }
        new_doc.add_text(b, "v2");

        let delete_term = Term::from_field_u64(a, 3);
        index_writer.delete_term(delete_term);
        index_writer.add_document(new_doc).unwrap();
        index_writer.commit().unwrap();
    }

    reader.reload().unwrap();
    let searcher = reader.searcher();
    let docs = searcher.search(&tq, &TopDocs::with_limit(1)).unwrap();
    assert!(docs.first().is_some());
    if let Some((_, doc_address)) = docs.first() {
        let doc: TantivyDocument = searcher.doc_async(*doc_address).await.unwrap();
        for (field, value) in doc.field_values() {
            if field == b {
                assert_eq!(Some("v2"), value.as_str())
            }
        }
    }

    let delete_term = Term::from_field_u64(a, 3);
    index_writer.delete_term(delete_term);
    index_writer.commit().unwrap();

    reader.reload().unwrap();
    let searcher = reader.searcher();
    let docs = searcher.search(&tq, &TopDocs::with_limit(1)).unwrap();
    assert!(docs.first().is_none());
    if let Some((_, doc_address)) = docs.first() {
        let doc: TantivyDocument = searcher.doc_async(*doc_address).await.unwrap();
        for (field, value) in doc.field_values() {
            if field == b {
                let value = value.as_str();
                println!("{:#?}", value);
            }
        }
    }
}

rustmailer, Nov 19 '25 14:11

It's unclear what the expectation is and which assert fails. Can you add a minimal example where a document that should be deleted is still there? You can replace doc_async with the simpler doc.

PSeitz, Nov 19 '25 15:11

@rustmailer this is most likely not a bug, and it has nothing to do with facets.

I suspect you are reusing a searcher or forgot to reload the reader. Think of a searcher as a handle on a snapshot of your index.

As long as you keep using the same searcher, you will not see any changes to the index.

To make sure you get an up-to-date searcher, you need to call reader.reload()?; and then acquire a new searcher.
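
The snapshot behaviour described here can be illustrated with a small toy model. This is only a sketch using the Rust standard library, not tantivy's implementation: ToyIndex, ToyReader, and demo_stale_searcher are hypothetical names chosen to mirror commit, reader.reload(), and reader.searcher().

```rust
use std::sync::{Arc, RwLock};

/// Toy stand-in for an index (hypothetical, not part of tantivy): the
/// committed state is an immutable snapshot behind an Arc, and each commit
/// swaps in a new snapshot instead of mutating the old one.
struct ToyIndex {
    committed: RwLock<Arc<Vec<u64>>>,
}

impl ToyIndex {
    fn new() -> Self {
        ToyIndex { committed: RwLock::new(Arc::new(Vec::new())) }
    }

    /// Publishes a new snapshot; snapshots handed out earlier stay untouched.
    fn commit(&self, docs: Vec<u64>) {
        *self.committed.write().unwrap() = Arc::new(docs);
    }
}

/// Toy stand-in for a reader: it remembers the snapshot seen at the last reload.
struct ToyReader<'a> {
    index: &'a ToyIndex,
    current: Arc<Vec<u64>>,
}

impl<'a> ToyReader<'a> {
    fn new(index: &'a ToyIndex) -> Self {
        let current = index.committed.read().unwrap().clone();
        ToyReader { index, current }
    }

    /// Picks up the most recently committed snapshot.
    fn reload(&mut self) {
        self.current = self.index.committed.read().unwrap().clone();
    }

    /// A "searcher" in this model is just a handle on the current snapshot.
    fn searcher(&self) -> Arc<Vec<u64>> {
        Arc::clone(&self.current)
    }
}

/// Returns (stale searcher still sees doc 3, fresh searcher still sees doc 3).
fn demo_stale_searcher() -> (bool, bool) {
    let index = ToyIndex::new();
    index.commit(vec![1, 2, 3]);

    let mut reader = ToyReader::new(&index);
    let stale = reader.searcher();

    // "Delete" document 3 and commit.
    index.commit(vec![1, 2]);
    let stale_sees_3 = stale.contains(&3);

    // Reload the reader and acquire a new searcher.
    reader.reload();
    let fresh = reader.searcher();
    (stale_sees_3, fresh.contains(&3))
}
```

In this model, demo_stale_searcher() returns (true, false): the searcher acquired before the commit still sees document 3, while the one acquired after the reload does not.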

fulmicoton, Dec 15 '25 09:12