
Refactor the Collector API to be distribute-friendly.

Open fulmicoton opened this issue 3 years ago • 1 comments

Right now the collector API makes it difficult to work in a distributed environment.

We want to merge the segment fruits together on each node, ship the merged result to a central node, and merge those results together there.

The first merge is akin to a combiner in the Hadoop world. Its outcome needs to be mergeable too.

We can probably fix tantivy by applying the following change to the Collector trait.

     fn merge_fruits(
         &self,
         segment_fruits: Vec<<Self::Child as SegmentCollector>::Fruit>,
-    ) -> crate::Result<Self::Fruit>;
+    ) -> crate::Result<<Self::Child as SegmentCollector>::Fruit>;
+
+    fn transform(&self, child_fruit: <Self::Child as SegmentCollector>::Fruit) -> Self::Fruit;
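
To illustrate why the merge output has to be mergeable, here is a minimal sketch of the proposed shape. The traits below are simplified stand-ins written for this example, not tantivy's actual definitions (`String` stands in for the crate error type), and `AverageCollector`, `AverageSegmentCollector`, and `distributed_average` are hypothetical names. The intermediate fruit is a `(sum, count)` pair, which can be merged any number of times; only `transform` turns it into the final average.

```rust
// Simplified stand-ins for the traits discussed above (assumption: not the real tantivy API).
trait SegmentCollector {
    type Fruit;
}

trait Collector {
    type Fruit;
    type Child: SegmentCollector;

    // Proposed shape: the output type equals the input element type,
    // so the result of one merge can be fed into another merge.
    fn merge_fruits(
        &self,
        segment_fruits: Vec<<Self::Child as SegmentCollector>::Fruit>,
    ) -> Result<<Self::Child as SegmentCollector>::Fruit, String>;

    // Final, non-mergeable step, applied once on the central node.
    fn transform(&self, child_fruit: <Self::Child as SegmentCollector>::Fruit) -> Self::Fruit;
}

// Hypothetical collector computing an average over all documents.
struct AverageCollector;
struct AverageSegmentCollector;

impl SegmentCollector for AverageSegmentCollector {
    // Intermediate result: (sum of values, number of values).
    type Fruit = (f64, u64);
}

impl Collector for AverageCollector {
    type Fruit = Option<f64>;
    type Child = AverageSegmentCollector;

    fn merge_fruits(&self, segment_fruits: Vec<(f64, u64)>) -> Result<(f64, u64), String> {
        // Sums and counts add up component-wise, so merging is repeatable.
        Ok(segment_fruits
            .into_iter()
            .fold((0.0, 0), |(s, c), (s2, c2)| (s + s2, c + c2)))
    }

    fn transform(&self, (sum, count): (f64, u64)) -> Option<f64> {
        if count == 0 {
            None
        } else {
            Some(sum / count as f64)
        }
    }
}

// Each inner Vec holds the segment fruits of one node.
fn distributed_average(nodes: Vec<Vec<(f64, u64)>>) -> Result<Option<f64>, String> {
    let collector = AverageCollector;
    // First merge: runs on each node (the "combiner" step).
    let per_node = nodes
        .into_iter()
        .map(|fruits| collector.merge_fruits(fruits))
        .collect::<Result<Vec<_>, _>>()?;
    // Second merge: runs on the central node, over the per-node results.
    let merged = collector.merge_fruits(per_node)?;
    Ok(collector.transform(merged))
}
```

If `merge_fruits` returned the final average directly, the central node would have to average per-node averages, which gives the wrong answer when nodes hold different numbers of documents. Keeping the `(sum, count)` pair until `transform` avoids that.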

fulmicoton, Feb 18 '22 05:02

Hi, I looked through the implementation and it seems that crate::Result<Self::Fruit> is equal to <Self::Child as SegmentCollector>::Fruit, and thus the outcome is already mergeable too? Please let me know if I'm missing something!

Self::Fruit

https://github.com/quickwit-oss/tantivy/blob/fcc7bd7024e960a35a99359f475594197fb68cfc/src/aggregation/collector.rs#L61-L62

<Self::Child as SegmentCollector>::Fruit

https://github.com/quickwit-oss/tantivy/blob/fcc7bd7024e960a35a99359f475594197fb68cfc/src/aggregation/collector.rs#L161-L162

k-yomo, Jul 31 '22 06:07