tinkerpop icon indicating copy to clipboard operation
tinkerpop copied to clipboard

Avoiding expensive hash computation in filter ranking strategy

Open steigma opened this issue 1 year ago • 4 comments

Avoiding expensive hash computation for complex steps with many children in filter ranking strategy by using identity hash map instead of linked hash map. With the linked hash map, a hash computation is repeated for the traversal parent with each child when doing the grouping, which can be quite costly when there are several hundreds of children (especially since the hash computation also includes the hash of child steps).

steigma avatar Feb 23 '24 11:02 steigma

Codecov Report

All modified and coverable lines are covered by tests :white_check_mark:

Project coverage is 75.40%. Comparing base (d9e34fb) to head (a12f6c6). Report is 23 commits behind head on 3.6-dev.

Additional details and impacted files
@@              Coverage Diff              @@
##             3.6-dev    #2504      +/-   ##
=============================================
+ Coverage      75.14%   75.40%   +0.25%     
- Complexity     12346    12359      +13     
=============================================
  Files           1058     1033      -25     
  Lines          63610    59890    -3720     
  Branches        6962     6972      +10     
=============================================
- Hits           47801    45159    -2642     
+ Misses         13225    12333     -892     
+ Partials        2584     2398     -186     

:umbrella: View full report in Codecov by Sentry.
:loudspeaker: Have feedback on the report? Share it here.

codecov-commenter avatar Feb 23 '24 16:02 codecov-commenter

Good to have some test cases, especially for traversal where there are steps with nested global and local children

vkagamlyk avatar Feb 23 '24 17:02 vkagamlyk

I agree with @vkagamlyk that some additional tests would be beneficial here. It is quite hard to prove that collectStepsOfAssignableClassRecursivelyFromDepthGroupedByParent() does not effect the order of the resulting map in a meaningful way through reasoning alone. I would also be curious to know how substantial the performance difference can be in a "worst case" scenario with hundreds of child steps.

Cole-Greer avatar Feb 23 '24 19:02 Cole-Greer

Thanks for the feedback. I added a few tests and renamed some variables. I also found that the lambda handling was not really working since the step collection was restricted to TraversalParent subclasses. This should now also be fixed and I added some basic test coverage for this.

In our experiments with queries of the form

g.V('vertex')...
coalesce(
    and(
      values('property').is('val1'),
      values('property').is('val2'),
      ...
      values('property').is('val800')
   ),
   sideEffect(properties('property').drop()).
   property(set, 'propery', 'val1').
   property(set, 'propery', 'val2').
   ...
   property(set, 'propery', 'val800')
)...

we have seen improvement of around 25% including all strategies and executing the queries, i.e,. the FilterRankingStrategy improved significantly (as also other strategies taking some significant amount of time).

steigma avatar Feb 27 '24 14:02 steigma

I don't completely follow the changes being made here but they seem to be well tested so my VOTE is +1.

kenhuuu avatar Mar 05 '24 00:03 kenhuuu

Thank you @steigma! VOTE +1

vkagamlyk avatar Mar 05 '24 19:03 vkagamlyk

Thanks @steigma, the new tests LGTM. VOTE +1

Cole-Greer avatar Mar 05 '24 19:03 Cole-Greer