Avoiding expensive hash computation in filter ranking strategy
Avoid the expensive hash computation for complex steps with many children in the filter ranking strategy by using an identity hash map instead of a linked hash map. With the linked hash map, the hash computation for the traversal parent is repeated for each child during grouping, which can be quite costly when there are several hundred children (especially since the parent's hash computation also includes the hashes of its child steps).
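For illustration, here is a minimal sketch of the two grouping variants (not the actual implementation in this PR; the class and method names are hypothetical). With a LinkedHashMap every put invokes the parent key's hashCode(), which for a TraversalParent recursively includes the hashes of its child traversals; an IdentityHashMap keys on reference identity via System.identityHashCode() and never calls that expensive hashCode().

```java
import java.util.ArrayList;
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

import org.apache.tinkerpop.gremlin.process.traversal.Step;
import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent;

public class GroupingSketch {

    // LinkedHashMap variant: every put() calls parent.hashCode(), which for a
    // TraversalParent hashes all of its child traversals again and again.
    static Map<TraversalParent, List<Step<?, ?>>> groupWithLinkedHashMap(final List<Step<?, ?>> steps) {
        final Map<TraversalParent, List<Step<?, ?>>> grouped = new LinkedHashMap<>();
        for (final Step<?, ?> step : steps) {
            grouped.computeIfAbsent(step.getTraversal().getParent(), p -> new ArrayList<>()).add(step);
        }
        return grouped;
    }

    // IdentityHashMap variant: keys are compared by reference and hashed with
    // System.identityHashCode(), so the parent's (deep) hashCode() is never
    // computed, no matter how many children are added.
    static Map<TraversalParent, List<Step<?, ?>>> groupWithIdentityHashMap(final List<Step<?, ?>> steps) {
        final Map<TraversalParent, List<Step<?, ?>>> grouped = new IdentityHashMap<>();
        for (final Step<?, ?> step : steps) {
            grouped.computeIfAbsent(step.getTraversal().getParent(), p -> new ArrayList<>()).add(step);
        }
        return grouped;
    }
}
```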
Codecov Report
All modified and coverable lines are covered by tests :white_check_mark:
Project coverage is 75.40%. Comparing base (d9e34fb) to head (a12f6c6). Report is 23 commits behind head on 3.6-dev.
Additional details and impacted files
@@             Coverage Diff              @@
##            3.6-dev     #2504     +/-   ##
============================================
+ Coverage     75.14%    75.40%   +0.25%
- Complexity    12346     12359      +13
============================================
  Files          1058      1033      -25
  Lines         63610     59890    -3720
  Branches       6962      6972      +10
============================================
- Hits          47801     45159    -2642
+ Misses        13225     12333     -892
+ Partials       2584      2398     -186
:umbrella: View full report in Codecov by Sentry.
Good to have some test cases, especially for traversals where there are steps with nested global and local children.
I agree with @vkagamlyk that some additional tests would be beneficial here. It is quite hard to prove through reasoning alone that collectStepsOfAssignableClassRecursivelyFromDepthGroupedByParent() does not affect the order of the resulting map in a meaningful way. I would also be curious to know how substantial the performance difference can be in a "worst case" scenario with hundreds of child steps.
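To make the ordering concern concrete: a LinkedHashMap iterates its keys in insertion order, whereas an IdentityHashMap iterates in an order derived from System.identityHashCode(), which is unspecified and can differ between runs and JVMs. A minimal, self-contained JDK-only illustration (not TinkerPop code):

```java
import java.util.IdentityHashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class IterationOrderDemo {
    public static void main(String[] args) {
        final Map<String, Integer> linked = new LinkedHashMap<>();
        final Map<String, Integer> identity = new IdentityHashMap<>();
        for (int i = 0; i < 5; i++) {
            final String key = new String("parent-" + i); // distinct object identities
            linked.put(key, i);
            identity.put(key, i);
        }
        // Always prints keys in insertion order: parent-0 ... parent-4.
        System.out.println("LinkedHashMap:   " + linked.keySet());
        // Order depends on System.identityHashCode(), i.e. it is unspecified.
        System.out.println("IdentityHashMap: " + identity.keySet());
    }
}
```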
Thanks for the feedback. I added a few tests and renamed some variables. I also found that the lambda handling was not really working since the step collection was restricted to TraversalParent subclasses. This should now also be fixed and I added some basic test coverage for this.
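To illustrate what a recursive step collection needs to cover for lambda handling, here is a hypothetical sketch (not the code from this PR): the collector must match any step implementing the requested type, e.g. LambdaHolder, even when that step is not itself a TraversalParent, and it must descend into both local and global child traversals.

```java
import java.util.ArrayList;
import java.util.List;

import org.apache.tinkerpop.gremlin.process.traversal.Step;
import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
import org.apache.tinkerpop.gremlin.process.traversal.step.TraversalParent;

public class RecursiveStepCollectorSketch {

    // Hypothetical collector, for illustration only. It collects every step that
    // implements the requested type (e.g. LambdaHolder), not only TraversalParent
    // subclasses, and recurses into local and global children.
    static <T> List<T> collectRecursively(final Class<T> clazz, final Traversal.Admin<?, ?> traversal) {
        final List<T> collected = new ArrayList<>();
        collectInto(clazz, traversal, collected);
        return collected;
    }

    private static <T> void collectInto(final Class<T> clazz, final Traversal.Admin<?, ?> traversal,
                                        final List<T> collected) {
        for (final Step<?, ?> step : traversal.getSteps()) {
            if (clazz.isInstance(step)) {
                collected.add(clazz.cast(step));
            }
            if (step instanceof TraversalParent) {
                final TraversalParent parent = (TraversalParent) step;
                for (final Traversal.Admin<?, ?> child : parent.getLocalChildren()) {
                    collectInto(clazz, child, collected);
                }
                for (final Traversal.Admin<?, ?> child : parent.getGlobalChildren()) {
                    collectInto(clazz, child, collected);
                }
            }
        }
    }
}
```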
In our experiments with queries of the form
g.V('vertex')...
    coalesce(
        and(
            values('property').is('val1'),
            values('property').is('val2'),
            ...
            values('property').is('val800')
        ),
        sideEffect(properties('property').drop()).
            property(set, 'property', 'val1').
            property(set, 'property', 'val2').
            ...
            property(set, 'property', 'val800')
    )...
we have seen an improvement of around 25% measured over the complete run, i.e., applying all strategies and executing the queries. That means the FilterRankingStrategy itself improved significantly, since other strategies also take a significant amount of time.
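For reference, a rough sketch of how such a measurement could be set up against an empty graph, building a traversal of a similar shape (the second coalesce branch is abbreviated to identity() here) and timing only the FilterRankingStrategy pass. The class name and the crude timing are illustrative assumptions, not the actual benchmark that produced the numbers above:

```java
import org.apache.tinkerpop.gremlin.process.traversal.Traversal;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.GraphTraversalSource;
import org.apache.tinkerpop.gremlin.process.traversal.dsl.graph.__;
import org.apache.tinkerpop.gremlin.process.traversal.strategy.optimization.FilterRankingStrategy;
import org.apache.tinkerpop.gremlin.structure.util.empty.EmptyGraph;

public class FilterRankingBenchmarkSketch {
    public static void main(String[] args) {
        final GraphTraversalSource g = EmptyGraph.instance().traversal();

        // Build an and() with several hundred value filters, mirroring the query shape above.
        final Traversal<?, ?>[] filters = new Traversal<?, ?>[800];
        for (int i = 0; i < filters.length; i++) {
            filters[i] = __.values("property").is("val" + (i + 1));
        }

        final Traversal.Admin<?, ?> traversal =
                g.V("vertex").coalesce(__.and(filters), __.identity()).asAdmin();

        // Time only the FilterRankingStrategy pass (a crude measurement; a real
        // benchmark would use JMH with warm-up iterations).
        final long start = System.nanoTime();
        FilterRankingStrategy.instance().apply(traversal);
        final long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("FilterRankingStrategy took " + elapsedMs + " ms");
    }
}
```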
I don't completely follow the changes being made here, but they seem to be well tested, so my VOTE is +1.
Thank you @steigma! VOTE +1
Thanks @steigma, the new tests LGTM. VOTE +1