tantivy
Panic from indexWriter.commit() call
Describe the bug
- What did you do? Called indexWriter.commit().
- What happened? The following panic occurred:
Panic from rust code! range end index 4 out of range for slice of length 0
Panic in thread from file .../src/schema/term.rs line 246
which ultimately resulted in An error occurred in a thread: 'Any { .. }' here
- What was expected? Expect no panic
Which version of tantivy are you using? v0.20.2 and this cherry-picked commit
To Reproduce I don't have a minimal reproduction.
It only happens for certain customers, not all customers.
I'm adding a stack trace to the panic and will share it once I have it.
Can you provide a stack trace or ideally something to reproduce?
Unfortunately we do not have a reliable repro, but we've captured this stack trace in production:
Panic from rust code! range end index 4 out of range for slice of length 3
Panic in thread from file cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/schema/term.rs line 246
Panic stack trace
0: search_tantivy::main::{{closure}}
at ./server_shared/rust/search_tantivy/src/lib.rs:40:25
1: <alloc::boxed::Box<F,A> as core::ops::function::Fn<Args>>::call
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/alloc/src/boxed.rs:1999:9
2: std::panicking::rust_panic_with_hook
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:709:13
3: std::panicking::begin_panic_handler::{{closure}}
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:597:13
4: std::sys_common::backtrace::__rust_end_short_backtrace
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/sys_common/backtrace.rs:151:18
5: rust_begin_unwind
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:593:5
6: core::panicking::panic_fmt
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panicking.rs:67:14
7: core::slice::index::slice_end_index_len_fail_rt
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/slice/index.rs:76:5
8: core::slice::index::slice_end_index_len_fail
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/slice/index.rs:68:9
9: <core::ops::range::Range<usize> as core::slice::index::SliceIndex<[T]>>::index
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/slice/index.rs:408:13
10: <core::ops::range::RangeTo<usize> as core::slice::index::SliceIndex<[T]>>::index
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/slice/index.rs:455:9
11: core::slice::index::<impl core::ops::index::Index<I> for [T]>::index
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/slice/index.rs:18:15
12: tantivy::schema::term::Term<B>::field
at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/schema/term.rs:246:41
13: tantivy::postings::postings_writer::make_field_partition::{{closure}}
at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/postings/postings_writer.rs:22:26
14: core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &mut F>::call_once
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs:305:13
15: core::option::Option<T>::map
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/option.rs:1075:29
16: <core::iter::adapters::map::Map<I,F> as core::iter::traits::iterator::Iterator>::next
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/iter/adapters/map.rs:103:26
17: <core::iter::adapters::enumerate::Enumerate<I> as core::iter::traits::iterator::Iterator>::next
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/iter/adapters/enumerate.rs:47:17
18: tantivy::postings::postings_writer::make_field_partition
at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/postings/postings_writer.rs:27:28
19: tantivy::postings::postings_writer::serialize_postings
at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/postings/postings_writer.rs:60:25
20: tantivy::indexer::segment_writer::remap_and_write
at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/indexer/segment_writer.rs:395:5
21: tantivy::indexer::segment_writer::SegmentWriter::finalize
at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/indexer/segment_writer.rs:141:9
22: tantivy::indexer::index_writer::index_documents
at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/indexer/index_writer.rs:198:38
23: tantivy::indexer::index_writer::IndexWriter::add_indexing_worker::{{closure}}
at /var/h/deploy/airtable/cargo/git/checkouts/tantivy-2d24c0f6ab500020/880f80f/src/indexer/index_writer.rs:427:21
24: std::sys_common::backtrace::__rust_begin_short_backtrace
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/sys_common/backtrace.rs:135:18
25: std::thread::Builder::spawn_unchecked_::{{closure}}::{{closure}}
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/thread/mod.rs:529:17
26: <core::panic::unwind_safe::AssertUnwindSafe<F> as core::ops::function::FnOnce<()>>::call_once
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/panic/unwind_safe.rs:271:9
27: std::panicking::try::do_call
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:500:40
28: std::panicking::try
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panicking.rs:464:19
29: std::panic::catch_unwind
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/panic.rs:142:14
30: std::thread::Builder::spawn_unchecked_::{{closure}}
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/thread/mod.rs:528:30
31: core::ops::function::FnOnce::call_once{{vtable.shim}}
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/core/src/ops/function.rs:250:5
32: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/alloc/src/boxed.rs:1985:9
33: <alloc::boxed::Box<F,A> as core::ops::function::FnOnce<Args>>::call_once
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/alloc/src/boxed.rs:1985:9
34: std::sys::unix::thread::Thread::new::thread_start
at /rustc/8ede3aae28fe6e4d52b38157d7bfe0d3bceef225/library/std/src/sys/unix/thread.rs:108:17
35: start_thread
36: clone
Can you share your schema, tokenizer and hardware info?
Our schema just consists of two string fields and one JSON field. The JSON field is constructed as follows:
let text_field_indexing = TextFieldIndexing::default()
    .set_tokenizer(DEFAULT_TOKENIZER_NAME)
    .set_index_option(IndexRecordOption::WithFreqsAndPositions);
let text_options = TextOptions::default().set_indexing_options(text_field_indexing);
let cell_values_json_field =
    schema_builder.add_json_field(CELL_VALUES_BY_COLUMN_ID_FIELD, text_options);
...
let analyzer = TextAnalyzer::builder(SimpleTokenizer::default())
    .filter(LowerCaser)
    .filter(AsciiFoldingFilter)
    .build();
index.tokenizers().register(DEFAULT_TOKENIZER_NAME, analyzer);
We're running on an x86 EC2 VM. Let me know if there is any specific hardware info that would be useful.
I'm looking for anything that would suggest you are running into some edge case, e.g. tokens longer than u16::MAX or a very large memory_budget_in_bytes.
Can you share an example document?
Term is not supposed to be smaller than 5 bytes, so either it's passed incorrectly or it's read incorrectly. Can you apply this patch and see if the error occurs in this line?
diff --git a/src/postings/postings_writer.rs b/src/postings/postings_writer.rs
index d3c26be13..9a5456edf 100644
--- a/src/postings/postings_writer.rs
+++ b/src/postings/postings_writer.rs
@@ -181,7 +181,7 @@ impl<Rec: Recorder> SpecializedPostingsWriter<Rec> {
impl<Rec: Recorder> PostingsWriter for SpecializedPostingsWriter<Rec> {
#[inline]
fn subscribe(&mut self, doc: DocId, position: u32, term: &Term, ctx: &mut IndexingContext) {
- debug_assert!(term.serialized_term().len() >= 4);
+ assert!(term.serialized_term().len() >= 4);
self.total_num_tokens += 1;
let (term_index, arena) = (&mut ctx.term_index, &mut ctx.arena);
term_index.mutate_or_create(term.serialized_term(), |opt_recorder: Option<Rec>| {
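For readers following along, here is a minimal sketch of why a too-short term buffer produces exactly this panic message. `RawTerm` and its layout (a 4-byte big-endian field id followed by a type byte) are assumptions made for illustration; this is a hypothetical stand-in, not tantivy's `Term` implementation.

```rust
/// Hypothetical stand-in for a serialized term. Assumed layout:
/// 4-byte big-endian field id, 1-byte type code, then the value bytes.
struct RawTerm<'a>(&'a [u8]);

impl<'a> RawTerm<'a> {
    /// Unchecked read: indexing with `[..4]` panics with
    /// "range end index 4 out of range for slice of length N"
    /// when the buffer holds fewer than 4 bytes.
    fn field_id_unchecked(&self) -> u32 {
        let mut id = [0u8; 4];
        id.copy_from_slice(&self.0[..4]);
        u32::from_be_bytes(id)
    }

    /// Checked read: returns `None` instead of panicking on a short buffer.
    fn field_id_checked(&self) -> Option<u32> {
        let prefix = self.0.get(..4)?;
        let mut id = [0u8; 4];
        id.copy_from_slice(prefix);
        Some(u32::from_be_bytes(id))
    }
}

fn main() {
    // A well-formed buffer: field id 4, a type byte, one value byte.
    let good = RawTerm(&[0, 0, 0, 4, b's', b'x']);
    assert_eq!(good.field_id_unchecked(), 4);
    assert_eq!(good.field_id_checked(), Some(4));

    // A 3-byte buffer, like the "slice of length 3" in the production trace.
    let short = RawTerm(&[0, 0, 4]);
    assert_eq!(short.field_id_checked(), None);
    // Calling short.field_id_unchecked() here would panic.
}
```

The checked variant only turns the panic into a value that can be logged. It does not explain how a short buffer came to exist, which is what the assert in the patch above is meant to narrow down.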
I'm getting this error as well. In my case, I'm passing a tantivy::Document in between processes, serialized with tantivy_common::BinarySerializable.
It seems to be working in trivial unit tests, but I'm seeing the error above with more complex data. I'm trying to narrow down the cause, but would love any pointers on where to look.
@neilyio Do you have the same stack trace? Can you apply the patch I posted? It would narrow down whether the Term is constructed incorrectly or read incorrectly. Neither should happen, and I don't have clear pointers currently.
Can you share your schema? Can you share anything to reproduce?
@PSeitz I have a minimal reproduction for you.
mod tests {
use std::io::Cursor;
use tantivy::{schema::Schema, Document, Index, IndexSettings, IndexSortByField, Order};
use tantivy_common::BinarySerializable;
#[test]
fn test_writer_commit() {
let serialized_schema = r#"
[{"name":"category","type":"text","options":{"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"stored":true,"fast":false}},{"name":"description","type":"text","options":{"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"stored":true,"fast":false}},{"name":"rating","type":"i64","options":{"indexed":true,"fieldnorms":false,"fast":true,"stored":true}},{"name":"in_stock","type":"bool","options":{"indexed":true,"fieldnorms":false,"fast":true,"stored":true}},{"name":"metadata","type":"json_object","options":{"stored":true,"indexing":{"record":"position","fieldnorms":true,"tokenizer":"default"},"fast":false,"expand_dots_enabled":true}},{"name":"id","type":"i64","options":{"indexed":true,"fieldnorms":true,"fast":true,"stored":true}},{"name":"ctid","type":"u64","options":{"indexed":true,"fieldnorms":true,"fast":true,"stored":true}}]
"#;
let schema: Schema = serde_json::from_str(&serialized_schema).unwrap();
let settings = IndexSettings {
sort_by_field: Some(IndexSortByField {
field: "id".into(),
order: Order::Asc,
}),
..Default::default()
};
let temp_dir = tempfile::Builder::new().tempdir().unwrap();
let index = Index::builder()
.schema(schema)
.settings(settings)
.create_in_dir(&temp_dir.path())
.unwrap();
let mut writer = index.writer(500_000_000).unwrap();
// This is a string representation of the document bytes that I am sending through IPC.
let document_bytes: Vec<u8> = serde_json::from_str("[135,5,0,0,0,2,1,0,0,0,0,0,0,0,1,0,0,0,0,152,69,114,103,111,110,111,109,105,99,32,109,101,116,97,108,32,107,101,121,98,111,97,114,100,2,0,0,0,2,4,0,0,0,0,0,0,0,0,0,0,0,0,139,69,108,101,99,116,114,111,110,105,99,115,3,0,0,0,9,1,4,0,0,0,8,123,34,99,111,108,111,114,34,58,34,83,105,108,118,101,114,34,44,34,108,111,99,97,116,105,111,110,34,58,34,85,110,105,116,101,100,32,83,116,97,116,101,115,34,125,5,0,0,0,1,1,0,0,0,0,0,0,0]").unwrap();
let document_from_bytes: Document =
BinarySerializable::deserialize(&mut Cursor::new(document_bytes)).unwrap();
// This is a json representation of the above that I'm including here for readability.
// This was generated with `println!("{}", serde_json::to_string(&document_from_bytes).unwrap())`.
let document_json = r#"
{"field_values":[{"field":5,"value":1},{"field":1,"value":"Ergonomic metal keyboard"},{"field":2,"value":4},{"field":0,"value":"Electronics"},{"field":3,"value":true},{"field":4,"value":{"color":"Silver","location":"United States"}},{"field":5,"value":1}]}
"#;
// To prove that the document_json and the document_from_bytes represent the same Document,
// we assert their equality here. This is expected to pass.
assert_eq!(
document_json.trim(),
serde_json::to_string(&document_from_bytes).unwrap().trim()
);
writer.add_document(document_from_bytes).unwrap();
// We expect an error here on commit: ErrorInThread("Any { .. }")
writer.commit().unwrap();
}
}
I'd like to note that my Document here contains a JsonObject value, which has given me some trouble with serialization. That's what pushed me to use BinarySerializable in the first place.
Here's a println!("{document_from_bytes:?}") if it's helpful:
Document {
field_values: [
FieldValue { field: Field(5), value: I64(1) },
FieldValue { field: Field(1), value: Str("Ergonomic metal keyboard") },
FieldValue { field: Field(2), value: I64(4) },
FieldValue { field: Field(0), value: Str("Electronics") },
FieldValue { field: Field(3), value: Bool(true) },
FieldValue { field: Field(4), value: JsonObject({"color": String("Silver"), "location": String("United States")}) },
FieldValue { field: Field(5), value: U64(1) }
]
}
Also, for versions, I have:
tantivy = "0.21.1"
tantivy-common = "0.6.0"
Do you mean this error?
thread 'thrd-tantivy-index2' panicked at columnar/src/columnar/writer/column_writers.rs:192:17:
assertion `left == right` failed: Input type forbidden. This column has been forced to type U64, received I64(1)
left: I64
right: U64
This error:
called `Result::unwrap()` on an `Err` value: ErrorInThread("Any { .. }")
Running the minimal example I posted above consistently produces that.
Can you provide a repo? I get
---- tests::test_writer_commit stdout ----
thread 'thrd-tantivy-index0' panicked at /home/pascal/.cargo/registry/src/index.crates.io-6f17d22bba15001f/tantivy-columnar-0.2.0/src/columnar/writer/column_writers.rs:192:17:
assertion `left == right` failed: Input type forbidden. This column has been forced to type I64, received U64(1)
left: U64
right: I64
note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
thread 'tests::test_writer_commit' panicked at src/main.rs:57:25:
called `Result::unwrap()` on an `Err` value: ErrorInThread("Any { .. }")
failures:
tests::test_writer_commit
@JaydenNavarro-at @PingXia-at Do you also use index sort?
@neilyio I am getting the same error as @PSeitz here.
Your document contains a u64 where it should have been an i64. Commit returns an explicit error that tells you the problem. I do not see any issue.
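A mismatch like this can be caught before the document reaches the writer. The following is a rough sketch of such a pre-flight check, assuming a simplified, hypothetical `Value` enum and a document modelled as `(field id, value)` pairs. It does not use tantivy's types or API.

```rust
use std::collections::HashMap;
use std::mem::discriminant;

/// Hypothetical, simplified value type (not tantivy's `Value`).
#[derive(Debug, Clone, PartialEq)]
enum Value {
    I64(i64),
    U64(u64),
    Str(String),
    Bool(bool),
}

/// Returns the sorted ids of fields that appear with more than one value variant.
fn fields_with_mixed_types(doc: &[(u32, Value)]) -> Vec<u32> {
    let mut first_seen = HashMap::new();
    let mut mixed = Vec::new();
    for (field, value) in doc {
        let kind = discriminant(value);
        // Remember the first variant seen for this field and compare later ones to it.
        let expected = *first_seen.entry(*field).or_insert(kind);
        if expected != kind && !mixed.contains(field) {
            mixed.push(*field);
        }
    }
    mixed.sort();
    mixed
}

fn main() {
    // Mirrors the Debug output posted earlier (JSON field omitted):
    // field 5 appears once as I64 and once as U64.
    let doc = vec![
        (5, Value::I64(1)),
        (1, Value::Str("Ergonomic metal keyboard".to_string())),
        (2, Value::I64(4)),
        (0, Value::Str("Electronics".to_string())),
        (3, Value::Bool(true)),
        (5, Value::U64(1)),
    ];
    assert_eq!(fields_with_mixed_types(&doc), vec![5]);
}
```

A stricter check would compare each value against the type declared in the schema rather than against the first value seen. This sketch only flags internal inconsistencies within one document.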
Yes, you're both right, I'm sorry for the distraction. I had some colleagues test the same code and they're seeing the same error as you. It's an issue with my serialization, and my specific test setup seems to be suppressing all but the An error occurred in a thread: 'Any { .. }' message. Thank you both for investigating.
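For anyone else who only sees 'Any { .. }': that text is what the standard library prints when a thread's panic payload (a `Box<dyn Any + Send>`) is formatted with `{:?}`. Below is a minimal, std-only sketch of recovering the underlying message from such a payload. The helper names `panic_message` and `join_with_message` are hypothetical, and this is not how tantivy reports errors internally.

```rust
use std::any::Any;
use std::thread;

/// Extracts a readable message from a panic payload, if it carries one.
/// Panics raised via `panic!` carry either a `&'static str` or a `String`.
fn panic_message(payload: &(dyn Any + Send)) -> String {
    if let Some(s) = payload.downcast_ref::<&'static str>() {
        s.to_string()
    } else if let Some(s) = payload.downcast_ref::<String>() {
        s.clone()
    } else {
        "non-string panic payload".to_string()
    }
}

/// Joins a thread, converting a panic into its message instead of an opaque payload.
fn join_with_message<T>(handle: thread::JoinHandle<T>) -> Result<T, String> {
    handle.join().map_err(|payload| panic_message(&*payload))
}

fn main() {
    let handle = thread::spawn(|| {
        panic!("Input type forbidden: expected I64, received U64");
    });
    match join_with_message(handle) {
        Ok(()) => println!("worker finished"),
        Err(msg) => println!("worker panicked: {}", msg),
    }
}
```

In a test setup, running with `RUST_BACKTRACE=1` and making sure the worker thread's stderr is not captured or discarded should also surface the original assertion message.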