
New Accumulo key-package optimised for `GetRDDOfAllElements`

Open gaffer01 opened this issue 6 years ago • 2 comments

The `GetRDDOfAllElements` operation applies a filter so that each Edge is not returned twice. This means that twice as many Edges as necessary are read. This could be avoided by storing the "forward" version of an Edge of group G under one column family (e.g. "G-F") and the "backwards" version under a different column family (e.g. "G-B"). As long as each of these column families has its own locality group, a `GetRDDOfAllElements` operation could read only the forward column families, halving the amount of Edge data it reads.
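To make the proposed layout concrete, here is a minimal in-memory sketch in Python. It is not Gaffer or Accumulo code: `Entry`, `store_edge`, and `full_scan` are hypothetical names, and the list stands in for the table. The "G-F" / "G-B" column family naming follows the example above.

```python
from collections import namedtuple

# Hypothetical stand-in for a stored key: row plus column family only.
Entry = namedtuple("Entry", ["row", "column_family"])

def store_edge(table, group, source, destination):
    """Write both copies of an edge, each under a direction-specific column family."""
    # Forward copy: keyed by source first, column family "<group>-F".
    table.append(Entry(row=(source, destination), column_family=f"{group}-F"))
    # Backward copy: keyed by destination first, column family "<group>-B".
    table.append(Entry(row=(destination, source), column_family=f"{group}-B"))

def full_scan(table, groups):
    """Return each edge exactly once by selecting only the forward column families."""
    wanted = {f"{group}-F" for group in groups}
    # This list comprehension still visits every entry. In Accumulo, giving each
    # column family its own locality group is what would let a scan restricted to
    # the forward families skip the backward data on disk.
    return [entry for entry in table if entry.column_family in wanted]

table = []
store_edge(table, "G", "a", "b")
store_edge(table, "G", "b", "c")
forward_only = full_scan(table, ["G"])
```

With this layout, no per-edge deduplication filter is needed on a full scan: selecting the forward column families already yields one copy of each edge.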

This could also improve the efficiency of some other queries and possibly slightly slow down others.

This needs a new key-package.

gaffer01, Oct 18 '17 13:10

This new key package should be significantly better than either of the current two key packages. It should be the default key package in version 2.0.

gaffer01, Jun 01 '18 12:06

Following a discussion between @gaffer01 and @d21211122, it has been decided that this will not be included in Gaffer 2.0. Although this will not be implemented against the Accumulo store, it may still be possible to support a full scan with a future, yet-to-be-determined cloud-native store for Gaffer v2 that implements `GetAllElements` without returning edges in both directions.

n3101, Jan 26 '22 16:01