HPCC-Platform
HPCC-31353 Report the slowest 5 activities in the roxie complete line
Type of change:
- [ ] This change is a bug fix (non-breaking change which fixes an issue).
- [x] This change is a new feature (non-breaking change which adds functionality).
- [ ] This change improves the code (refactor or other change that does not change the functionality)
- [ ] This change fixes warnings (the fix does not alter the functionality or the generated code)
- [ ] This change is a breaking change (fix or feature that will cause existing behavior to change).
- [ ] This change alters the query API (existing queries will have to be recompiled)
Checklist:
- [x] My code follows the code style of this project.
- [x] My code does not create any new warnings from compiler, build system, or lint.
- [x] The commit message is properly formatted and free of typos.
- [x] The commit message title makes sense in a changelog, by itself.
- [x] The commit is signed.
- [ ] My change requires a change to the documentation.
- [ ] I have updated the documentation accordingly, or...
- [ ] I have created a JIRA ticket to update the documentation.
- [ ] Any new interfaces or exported functions are appropriately commented.
- [x] I have read the CONTRIBUTORS document.
- [x] The change has been fully tested:
- [ ] I have added tests to cover my changes.
- [ ] All new and existing tests passed.
- [ ] I have checked that this change does not introduce memory leaks.
- [ ] I have used Valgrind or similar tools to check for potential issues.
- [ ] I have given due consideration to all of the following potential concerns:
- [ ] Scalability
- [ ] Performance
- [ ] Security
- [ ] Thread-safety
- [ ] Cloud-compatibility
- [ ] Premature optimization
- [ ] Existing deployed queries will not be broken
- [ ] This change fixes the problem, not just the symptom
- [ ] The target branch of this pull request is appropriate for such a change.
- [ ] There are no similar instances of the same problem that should be addressed
- [ ] I have addressed them here
- [ ] I have raised JIRA issues to address them separately
- [ ] This is a user interface / front-end modification
- [ ] I have tested my changes in multiple modern browsers
- [ ] The component(s) render as expected
Smoketest:
- [ ] Send notifications about my Pull Request position in Smoketest queue.
- [ ] Test my draft Pull Request.
Testing:
https://track.hpccsystems.com/browse/HPCC-31353 Jira updated
Pushed for discussion (although I think it could be merged). I don't particularly like the extra parameter that is needed to record the ids, but a more general approach would be less efficient, and I suspect the stats merging should be re-examined. Questions: Should the number of activities be optional/configurable? (Probably relatively easy.) Should I keep the number of activities? It would save a few compares, but complicate the code.
See jira for sample output.
My main concern is the performance impact of gathering this information. mergeStats may be called a lot, especially in a child query scenario. Is the information going to be useful enough, often enough?
Not sure I would recommend allowing configuration of the number, as picking a higher number will have a larger negative impact on performance.
Conclusions from discussion:
- It would be better to only do this if the query was above a certain threshold/SLA
- Even better would be to generate a stats workunit for a query which exceeded the SLA - with a limit of only doing it once every minute/5 minutes.
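The second conclusion combines two checks: the query exceeded its SLA, and no stats workunit has been generated within the minimum interval. A minimal sketch of that decision follows. Every name in it (`StatsWorkunitLimiter`, `shouldGenerate`, the `sla` and `minInterval` parameters) is hypothetical and not taken from the HPCC-Platform code, and it is not thread-safe as written, so a real roxie version would need an atomic or a lock around the timestamp.

```cpp
#include <chrono>

// Hypothetical helper, not part of HPCC-Platform: decides whether a query that
// exceeded its SLA should trigger generation of a stats workunit.
class StatsWorkunitLimiter
{
public:
    using Clock = std::chrono::steady_clock;

    StatsWorkunitLimiter(std::chrono::milliseconds _sla, std::chrono::seconds _minInterval)
        : sla(_sla), minInterval(_minInterval)
    {
    }

    // elapsed: how long the query took; now: the current time (passed in so the
    // logic can be exercised without sleeping).
    bool shouldGenerate(std::chrono::milliseconds elapsed, Clock::time_point now)
    {
        if (elapsed <= sla)
            return false;   // query met its SLA, nothing to record
        if (hasGenerated && (now - lastGenerated) < minInterval)
            return false;   // a workunit was generated too recently
        lastGenerated = now;
        hasGenerated = true;
        return true;
    }

private:
    std::chrono::milliseconds sla;
    std::chrono::seconds minInterval;
    Clock::time_point lastGenerated{};
    bool hasGenerated = false;   // NOTE: not thread-safe; an assumption of this sketch
};
```

With `minInterval` set to 60 or 300 seconds this gives the "once every minute/5 minutes" limit mentioned above.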
@richardkchapman I have added some timing tests. It has an impact of ~1ns for every activity that isn't in the top 5 and about 10ns for each activity that is. So an impact of ~50us for a very complex query. That is almost certainly lower than the impact of aggregating the rest of the stats. It is only called when the final results are aggregated, so child queries etc. will not impact it.
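For readers wondering why the cost differs between activities inside and outside the top 5, here is a minimal sketch of one way to track the slowest N entries. It is an illustration only, not the code in this PR: `SlowActivity`, `SlowestActivities`, `note` and `maxTracked` are all hypothetical names. Once the table is full, an activity that is not slow enough is rejected with a single compare, while one that qualifies pays for a short insertion into a sorted array.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical types for illustration; not the HPCC-Platform stats classes.
struct SlowActivity
{
    unsigned activityId = 0;
    std::uint64_t elapsedNs = 0;
};

class SlowestActivities
{
public:
    static constexpr std::size_t maxTracked = 5;

    // Record one activity's elapsed time while the final stats are aggregated.
    void note(unsigned activityId, std::uint64_t elapsedNs)
    {
        // Fast path: once the table is full, one compare rejects most activities.
        if (count == maxTracked && elapsedNs <= entries[count - 1].elapsedNs)
            return;

        // Slow path: insert into a small array kept sorted slowest-first.
        // If the table is full, the previous last entry is overwritten.
        std::size_t pos = (count < maxTracked) ? count++ : maxTracked - 1;
        while (pos > 0 && entries[pos - 1].elapsedNs < elapsedNs)
        {
            entries[pos] = entries[pos - 1];
            --pos;
        }
        entries[pos].activityId = activityId;
        entries[pos].elapsedNs = elapsedNs;
    }

    std::size_t size() const { return count; }
    const SlowActivity & get(std::size_t idx) const { return entries[idx]; }

private:
    std::array<SlowActivity, maxTracked> entries{};
    std::size_t count = 0;
};
```

Raising `maxTracked` lengthens the insertion loop and lets more activities through the fast-path check, which is in line with the concern above about making the number configurable.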
After further reflection I think it is worthwhile because many roxie queries are not soapcall bound, and this provides some useful debugging information when there is a problem - full stats are much better when rerunning the query.
Thoughts/opinions? @mckellyln renamed slow to slowest, rebased and squashed.
@mckellyln would this be better to only report if the slowest activity was above a certain threshold (e.g., 10ms), or is it always useful?
@ghalliday yes - I think it is a good idea to skip this if the slowest activity was less than some configurable threshold (10 ms default).
Added an extra guard condition, ignoring all activities < 10ms. (Compare ignoring case.)
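The guard can be pictured as a filter applied when the summary text is built. The sketch below is an assumption-laden illustration rather than the PR's code: `ActivityTime`, `formatSlowest`, `defaultThresholdNs` and the `slowestActivities=id:Nms` output format are all invented here (see the Jira for the real sample output). It takes a list already sorted slowest-first and drops every activity under the threshold, so nothing is reported when even the slowest activity is below 10ms.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for illustration; not an HPCC-Platform type.
struct ActivityTime
{
    unsigned id;
    std::uint64_t elapsedNs;
};

// 10ms expressed in nanoseconds, matching the default discussed above.
constexpr std::uint64_t defaultThresholdNs = 10ULL * 1000 * 1000;

// Build the summary text from a list sorted slowest-first.
// Returns an empty string when no activity reaches the threshold.
// The output format is invented for this sketch.
std::string formatSlowest(const std::vector<ActivityTime> & slowestFirst, std::uint64_t thresholdNs)
{
    std::string text;
    for (const ActivityTime & cur : slowestFirst)
    {
        if (cur.elapsedNs < thresholdNs)
            break;   // list is sorted, so every later entry is also below the threshold
        text += text.empty() ? "slowestActivities=" : ",";
        text += std::to_string(cur.id) + ":" + std::to_string(cur.elapsedNs / 1000000) + "ms";
    }
    return text;
}
```

Because the list is sorted, the loop stops at the first entry below the threshold, so the guard adds at most one extra compare per reported activity.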