How to efficiently retrieve value from text fast field
Hi!
I've been trying out tantivy, and I'm stumbling upon some issues with fast fields. I might be misunderstanding how to actually use the fast fields.
I have a set of documents I am searching using tantivy, and for each document I have an id field that is STORED | FAST.
I want to get a list of documents, and their IDs, as fast as possible - currently my bottleneck is looking up the documents. I'm trying to make use of the fastfields, but the documentation in the fastfield module is not super clear on how to actually use the fast field.
My index is currently about 300k documents, and I'm spending 50-60% of my time doing lookups.
This is my current implementation for looking up in the docstore:
for &(score, doc_address) in found_docs {
    let doc: TantivyDocument = searcher.doc(doc_address)?;
    // the EF ID is in the `id` field, but the tantivy API ain't pretty.
    let id_value = doc
        .get_first(id_field)
        .ok_or_else(|| SearchError::InternalError("Unable to get expected field".into()))?
        .as_str()
        .ok_or_else(|| SearchError::InternalError("Unable to get expected field".into()))?
        .to_string();
    entry_id_score_entry.insert(id_value, (score, None));
}
Things I've tried
- I've tried an alternate implementation using the fast field API directly, but I'm not sure if I'm using it right. It is also 10x slower
for &(score, doc_address) in found_docs {
    let segment = doc_address.segment_ord;
    let segment_reader = searcher.segment_reader(segment);
    let id_fast_field = segment_reader.fast_fields().str("id").unwrap().unwrap();
    // Look up the term ordinal for this doc, then resolve it to a string.
    let ord = id_fast_field
        .ords()
        .values_for_doc(doc_address.doc_id)
        .next()
        .unwrap();
    let mut id_value = String::with_capacity(36); // pre-allocate capacity to fit
    // The majority of time is taken inside `ord_to_str`
    id_fast_field.ord_to_str(ord, &mut id_value).unwrap();
    entry_id_score_entry.insert(id_value, (score, None));
}
- I've looked at TopDocs::order_by_fast_field, but I still want the ordering from the tantivy scores - I just want the values of the fast fields.
- I've been looking at FastFieldReader as described in #2604, but I don't seem to be able to figure out how to create one.
One first thing you could do is move
let id_fast_field = segment_reader.fast_fields().str("id").unwrap().unwrap();
outside of your loop.
Yeah sorry for not doing that in the example - it doesn't actually make much of a difference. I've tried doing that but:
- It requires that I precompute some stuff or that I only have one segment, as the segment reader is based on the returned doc
- When I make sure I only have one segment, and I extract both the segment reader and the fast field lookup outside of the inner loop, it does not make any noticeable performance gain.
Almost all of the work is done inside StrColumn::ord_to_str, which takes up 40% of the total search time when trying to use fast fields.
Fast field text fields are currently always zstd-compressed. Accessing them is fairly expensive, since we decompress a zstd block on every access (this will be configurable in the next release with the columnar-zstd-compression feature flag).
We also decompress blocks in the doc store, with the main difference that the blocks are cached.
The fastest way to access multiple terms in the fast field is sorted_ords_to_term_cb, which avoids decompressing the same block multiple times.
In your example I guess that the hits are in the same doc store block, which leads to the 10x difference.
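To make the block-caching effect concrete, here is a toy model (not tantivy code; the block size and single-block "hot" cache are invented for illustration) of why visiting term ordinals in sorted order decompresses each block at most once, while unsorted access pays for every block boundary it crosses:

```rust
// Toy model: terms live in fixed-size blocks that must be "decompressed"
// before a term can be read. Only the most recently used block stays hot,
// so every block-boundary crossing costs one decompression.
const BLOCK_SIZE: u64 = 16;

fn count_block_decompressions(ords: &[u64]) -> usize {
    let mut decompressions = 0;
    let mut current_block: Option<u64> = None; // the one block kept "hot"
    for &ord in ords {
        let block = ord / BLOCK_SIZE;
        if current_block != Some(block) {
            decompressions += 1; // crossed into a cold block
            current_block = Some(block);
        }
    }
    decompressions
}

fn main() {
    let unsorted = vec![3u64, 40, 5, 41, 7, 42];
    let mut sorted = unsorted.clone();
    sorted.sort_unstable();
    // Unsorted access bounces between blocks 0 and 2, paying on each bounce.
    println!("unsorted: {}", count_block_decompressions(&unsorted)); // 6
    println!("sorted:   {}", count_block_decompressions(&sorted)); // 2
}
```

This is the intuition behind sorted_ords_to_term_cb: sorting the ordinals first turns repeated decompressions of the same block into a single one.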
Ah, I see - the caching would explain it!
An implementation that uses sorted_ords_to_term_cb does seem to be about 40% faster than repeatedly doing lookups, which is good to know!
We actually tried building from the master branch with the unreleased columnar-zstd-compression feature disabled, but that did not seem to provide a significant speedup either. Perhaps I'll give it another shot when it's fully released, and get back to this issue :)
Glad to see I'm not using fast fields completely wrong though.
So, coming back to this after trying tantivy 0.25 without zstd columnar compression: using a fast field is still significantly (~2x) slower than just doing doc lookups directly.
I have timed 3 different approaches
Approach 1 - doc lookups
let mut entry_id_score_entry_1: HashMap<String, (Score, Option<&Entry>)> = HashMap::new();
for &(score, doc_address) in found_docs {
    let doc: TantivyDocument = searcher.doc(doc_address)?;
    let id_value = doc
        .get_first(id_field)
        .ok_or_else(|| SearchError::InternalError("Unable to get expected field".into()))?
        .as_str()
        .ok_or_else(|| SearchError::InternalError("Unable to get expected field".into()))?
        .to_string();
    entry_id_score_entry_1.insert(id_value, (score, None));
}
Approach 2 - one fast field variation
let id_field_name = schema.schema.get_field_name(id_field);
let mut entry_id_score_entry_2: HashMap<String, (Score, Option<&Entry>)> = HashMap::new();
let mut seg_id_readers: HashMap<SegmentOrdinal, StrColumn> = HashMap::new();

fn get_str_field(column: &StrColumn, doc_id: DocId) -> Result<String, SearchError> {
    let ords: Vec<_> = column.ords().values_for_doc(doc_id).collect();
    if ords.is_empty() {
        return Err(SearchError::InternalError(
            "Unable to get ord for id field".into(),
        ));
    }
    if ords.len() > 1 {
        return Err(SearchError::InternalError(
            "Too many ord results for id field".into(),
        ));
    }
    // I think our IDs are 36 characters long. We should check this.
    let mut id_value = String::with_capacity(36);
    column.ord_to_str(ords[0], &mut id_value).map_err(|_| {
        SearchError::InternalError("Unable to get string from StrColumn".into())
    })?;
    Ok(id_value)
}
for &(score, doc_address) in found_docs {
    // cache doc_id_reader by segment_ord
    let segment_id = doc_address.segment_ord;
    let id_column_option = seg_id_readers.get(&segment_id);
    let id_value = match id_column_option {
        Some(id_column) => get_str_field(id_column, doc_address.doc_id)?,
        None => {
            let segment_reader = searcher.segment_reader(segment_id);
            let id_column = segment_reader
                .fast_fields()
                .str(id_field_name)
                .map_err(|_| SearchError::InternalError("Unable to get expected field".into()))?
                .ok_or_else(|| {
                    SearchError::InternalError("Unable to get expected fast field".into())
                })?;
            let id_value = get_str_field(&id_column, doc_address.doc_id)?;
            seg_id_readers.insert(segment_id, id_column);
            id_value
        }
    };
    entry_id_score_entry_2.insert(id_value, (score, None));
}
Approach 3 - another fast field variation
let mut entry_id_score_entry_3: HashMap<String, (Score, Option<&Entry>)> = HashMap::new();
for &(score, doc_address) in found_docs {
    let segment = doc_address.segment_ord;
    let segment_reader = searcher.segment_reader(segment);
    let id_fast_field = segment_reader.fast_fields().str("id").unwrap().unwrap();
    let mut id_value = String::with_capacity(36); // pre-allocate capacity to fit
    let ord = id_fast_field
        .ords()
        .values_for_doc(doc_address.doc_id)
        .next()
        .unwrap();
    id_fast_field.ord_to_str(ord, &mut id_value).unwrap();
    entry_id_score_entry_3.insert(id_value, (score, None));
}
The timings look like this
// With default compression (i.e. column compression turned on)
Round: 'Approach without fastfields': Time elapsed = 11.486ms
Round: 'Fast field approach 1': Time elapsed = 125.246666ms
Round: 'Fast field approach 2': Time elapsed = 160.802916ms
// Without feature "columnar-zstd-compression"
Round: 'Approach without fastfields': Time elapsed = 13.442209ms
Round: 'Fast field approach 1': Time elapsed = 25.757417ms
Round: 'Fast field approach 2': Time elapsed = 35.337208ms
We can see that the fast fields are significantly faster without the compression, but they're still 2x slower than just doing lookups.
Fast field approach 3
Now, I can get it to work faster if I use sorted_ords_to_term_cb, but the actual code is... not pretty:
let mut entry_id_score_entry_3: HashMap<String, (Score, Option<&Entry>)> = HashMap::new();
// We might only have one segment, since we statically build stuff. Still, let's do it the
// "right" way first and see if we can optimize it away later.
let mut seg_id_readers: HashMap<SegmentOrdinal, StrColumn> = HashMap::new();

fn get_ord_for_doc_id(column: &StrColumn, doc_id: DocId) -> Result<u64, SearchError> {
    let ords: Vec<_> = column.ords().values_for_doc(doc_id).collect();
    if ords.is_empty() {
        return Err(SearchError::InternalError(
            "Unable to get ord for id field".into(),
        ));
    }
    if ords.len() > 1 {
        return Err(SearchError::InternalError(
            "Too many ord results for id field".into(),
        ));
    }
    Ok(ords[0])
}

let mut intermediate_results: HashMap<SegmentOrdinal, Vec<(u64, Score)>> = HashMap::new();
for &(score, doc_address) in found_docs {
    // cache doc_id_reader by segment_ord
    let segment_id = doc_address.segment_ord;
    let id_column_option = seg_id_readers.get(&segment_id);
    let ord = match id_column_option {
        Some(id_column) => get_ord_for_doc_id(id_column, doc_address.doc_id)?,
        None => {
            let segment_reader = searcher.segment_reader(segment_id);
            let id_column = segment_reader
                .fast_fields()
                .str(id_field_name)
                .map_err(|_| SearchError::InternalError("Unable to get expected field".into()))?
                .ok_or_else(|| {
                    SearchError::InternalError("Unable to get expected fast field".into())
                })?;
            let ord = get_ord_for_doc_id(&id_column, doc_address.doc_id)?;
            seg_id_readers.insert(segment_id, id_column);
            ord
        }
    };
    intermediate_results
        .entry(segment_id)
        .or_insert_with(Vec::new)
        .push((ord, score));
}

for (segment_id, mut results) in intermediate_results {
    results.sort_by_key(|x| x.0);
    let id_column = seg_id_readers
        .get(&segment_id)
        .expect("Couldn't get id column");
    let dict = id_column.dictionary();
    let mut ids: Vec<String> = Vec::with_capacity(results.len());
    dict.sorted_ords_to_term_cb(results.iter().map(|x| x.0), |term| {
        let possible_id = String::from_utf8(term.to_vec());
        match possible_id {
            Ok(id) => ids.push(id),
            Err(_) => {
                return Err(std::io::Error::other(
                    "Unable to convert string to utf-8".to_string(),
                ));
            }
        }
        Ok(())
    })
    .map_err(|_| SearchError::InternalError("Error running sorted_ords_to_term_cb".into()))?;
    for (id, (_, score)) in zip(ids, results) {
        entry_id_score_entry_3.insert(id, (score, None));
    }
}
This is faster than the plain doc lookups when compression is disabled, but not by a huge margin.
// With default features
Round: 'Fast field approach 3': Time elapsed = 26.318167ms
// Without zstd-compression
Round: 'Fast field approach 3': Time elapsed = 9.038167ms
Am I still using fast fields wrong, or are they just worse than doing lookups for string-based fields, even without compression?
Can you add timings for Approach 1? A flamegraph for each would be helpful to spot anything unusual.
Sorry, the timings were probably easy to miss.
The timings for approach 1 are as follows: Round: 'Approach without fastfields': Time elapsed = 11.486ms
I have some old flamegraphs lying around for approach 2 with compression enabled, and for approach 2 without compression.
In the end we've bitten the bullet and implemented our own in-memory cache on top of tantivy, as we couldn't get the fast fields to perform well enough with strings.
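Conceptually, such a cache is just a memoized (segment, doc) -> id map that hits the expensive lookup at most once per document. A minimal sketch (all names hypothetical; the closure stands in for the expensive tantivy lookup, e.g. a docstore fetch or ord_to_str):

```rust
use std::collections::HashMap;

// Hypothetical sketch: memoize id lookups per (segment_ord, doc_id) so the
// underlying store is queried at most once per document.
struct IdCache<F: FnMut(u32, u32) -> String> {
    cached: HashMap<(u32, u32), String>, // (segment_ord, doc_id) -> id
    fetch: F,                            // the expensive lookup
}

impl<F: FnMut(u32, u32) -> String> IdCache<F> {
    fn new(fetch: F) -> Self {
        Self { cached: HashMap::new(), fetch }
    }

    // Returns the cached id, fetching (and storing) it on first access.
    fn id(&mut self, segment_ord: u32, doc_id: u32) -> &str {
        let fetch = &mut self.fetch;
        self.cached
            .entry((segment_ord, doc_id))
            .or_insert_with(|| fetch(segment_ord, doc_id))
    }
}

fn main() {
    let mut fetches = 0;
    let mut cache = IdCache::new(|seg, doc| {
        fetches += 1; // stands in for the expensive tantivy lookup
        format!("{seg:08x}-{doc:08x}")
    });
    assert_eq!(cache.id(0, 7), "00000000-00000007");
    assert_eq!(cache.id(0, 7), "00000000-00000007"); // served from memory
    drop(cache);
    assert_eq!(fetches, 1); // the store was only hit once
}
```

The trade-off is memory: for a statically built index of ~300k short ids this is cheap, but for larger or frequently rebuilt indexes the cache would need invalidation on reload.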