alluxio
Add extra monitoring indicators part 2
What changes are proposed in this pull request?
Add some monitoring indicators for the master status
Why are the changes needed?
Add some monitoring indicators for the master status, to collect information for future system throttling.
Does this PR introduce any user facing changes?
NA
Automated checks report:
- Commits associated with Github account: PASS
- PR title follows the conventions: FAIL
- The title of the PR does not pass all the checks. Please fix the following issues:
- First word of title ("SKIPCI/WIP:") is not an imperative verb. Please use one of the valid words
Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.
Automated checks report:
- Commits associated with Github account: PASS
- PR title follows the conventions: PASS
All checks passed!
@beinan Please let me know if you have any comments.
@tcrain Please also help review.
It looks like the values in OperationSystemGaugeSet.java have their update interval set to only once every 10 minutes. Maybe this should be changed to a smaller value to make them more useful?
Could you add a few more comments describing how some of these are used? For example, a brief description at the top of each class explaining the overall idea of the metrics that class tracks.
Some of the methods could also use comments on what they are used for. For example, it is not clear what the multiple in the ServerIndicator constructor is for, and it is not clear what some of the pit time methods mean or are used for.
It looks like the values in OperationSystemGaugeSet.java have their update interval set to only once every 10 minutes. Maybe this should be changed to a smaller value to make them more useful?
That is right. Do you have any idea how much extra cost a smaller granularity would add?
Could you add a few more comments describing how some of these are used? For example, a brief description at the top of each class explaining the overall idea of the metrics that class tracks.
Some of the methods could also use comments on what they are used for. For example, it is not clear what the multiple in the ServerIndicator constructor is for, and it is not clear what some of the pit time methods mean or are used for.
Added descriptions in the class headers; please take a look.
I don't expect these functions to be very costly. They are only updated when called, so I would set the interval to however often you want to check them. Even a few seconds should be fine, but that could be tested.
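The reply above says the gauge values are only recomputed when read, so shortening the interval mainly adds work on reads rather than a background task. A minimal Java sketch of that lazy-refresh pattern, assuming the gauges in OperationSystemGaugeSet.java cache their values roughly this way (not confirmed by this thread); the class `LazyCachedGauge` and its injectable clock are hypothetical and not part of Alluxio. Dropwizard Metrics ships a similar `CachedGauge`.

```java
import java.util.concurrent.TimeUnit;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

/**
 * Hypothetical sketch: a gauge that caches its value and recomputes it
 * only when it is read after the refresh interval has elapsed.
 */
public class LazyCachedGauge<T> {
  private final Supplier<T> mLoader;   // the (possibly costly) metric computation
  private final long mIntervalNanos;   // minimum time between recomputations
  private final LongSupplier mClock;   // time source, injectable for testing
  private T mValue;
  private long mLoadedAtNanos;
  private boolean mLoaded;

  public LazyCachedGauge(Supplier<T> loader, long interval, TimeUnit unit) {
    this(loader, interval, unit, System::nanoTime);
  }

  LazyCachedGauge(Supplier<T> loader, long interval, TimeUnit unit, LongSupplier clock) {
    mLoader = loader;
    mIntervalNanos = unit.toNanos(interval);
    mClock = clock;
  }

  /** Returns the cached value, recomputing it only if the interval has passed. */
  public synchronized T getValue() {
    long now = mClock.getAsLong();
    if (!mLoaded || now - mLoadedAtNanos >= mIntervalNanos) {
      mValue = mLoader.get();
      mLoadedAtNanos = now;
      mLoaded = true;
    }
    return mValue;
  }

  public static void main(String[] args) {
    long[] fakeNow = {0L};
    int[] loads = {0};
    LazyCachedGauge<Integer> gauge = new LazyCachedGauge<>(
        () -> ++loads[0], 5, TimeUnit.SECONDS, () -> fakeNow[0]);
    gauge.getValue();                          // first read computes the value
    fakeNow[0] = TimeUnit.SECONDS.toNanos(3);
    gauge.getValue();                          // within 5 s: served from cache
    fakeNow[0] = TimeUnit.SECONDS.toNanos(6);
    gauge.getValue();                          // interval elapsed: recomputed
    System.out.println("loads = " + loads[0]); // prints "loads = 2"
  }
}
```

With this pattern, the cost of a shorter interval is bounded by how often something actually reads the gauge: if the metrics system polls every few seconds, the loader runs at most that often, and never when nobody reads it.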
Will add that in a follow-up PR.
@yyongycy High-level comment: could you add some user documentation?
Sure, will add that once related PRs are done.
alluxio-bot, merge this please