tantivy icon indicating copy to clipboard operation
tantivy copied to clipboard

Refactor Fastfield codecs similar to a columnar storage

Open PSeitz opened this issue 3 years ago • 3 comments

Currently the fast field codecs have their own interface which is normalized to u64 values. The conversion to u64 happens in the tantivy layer via FastValue. This separation causes a lot of duplication, so we have different traits and implementors, which are all similar, e.g.

struct DynamicFastFieldReader<Item: FastValue> (dispatch to different codecs)
FastFieldReader<Item: FastValue>: Clone (tantivy trait to access fast fields)
FastFieldReaderCodecWrapper<Item: FastValue, CodecReader> (used in tantivy crate to access codec values as FastValue)

FastFieldCodecReader (used in fastfield_codecs crate to access values as u64)
FastFieldDataAccess (used in serialization)

The idea would be to consolidate those (or as much as possible) and let the fastfield_codecs crate act like a columnar storage. The used trait would similar to:

pub trait Col<T: FastValue> {
   fn len(&self); 
   fn get(&self, idx);
   fn get_range(&self, range: Range<u64>) -> Box<dyn Iterator<Item=T>>; 
   fn get_iter(&self) -> Box<dyn Iterator<Item=T>>; 
}

PSeitz avatar Aug 23 '22 10:08 PSeitz

I would add min / max too, and remove the trait boundary to FastValue.

fulmicoton avatar Aug 23 '22 14:08 fulmicoton

len() will probably be an issue with regards to backwards compatibility.

If we don't use FastValue (or FastValueU128 for u128), we can't convert the type to u64, on which the codecs operate on.

PSeitz avatar Aug 24 '22 07:08 PSeitz

len() will probably be an issue with regards to backwards compatibility.

I think we will once again hurt backward compat.

If we don't use FastValue (or FastValueU128 for u128), we can't convert the type to u64, on which the codecs operate on.

We have different types of column. That trait is usable for anything and should not have a trait boundary to FastValue.

If we don't use FastValue (or FastValueU128 for u128), we can't convert the type to u64, on which the codecs operate on.

I don't think not putting the trait boundary here prevents us from doing anything.

fulmicoton avatar Aug 24 '22 09:08 fulmicoton