alluxio icon indicating copy to clipboard operation
alluxio copied to clipboard

Add extra monitoring indicators part2

Open yyongycy opened this issue 2 years ago • 10 comments

What changes are proposed in this pull request?

Add some monitoring indicators for the master status

Why are the changes needed?

Add some monitoring indicators for the master status, collecting information for further system throttling.

Does this PR introduce any user facing changes?

NA

yyongycy avatar Sep 09 '22 07:09 yyongycy

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: FAIL
    • The title of the PR does not pass all the checks. Please fix the following issues:
      • First word of title ("SKIPCI/WIP:") is not an imperative verb. Please use one of the valid words

Some checks failed. Please fix the reported issues and reply 'alluxio-bot, check this please' to re-run checks.

alluxio-bot avatar Sep 09 '22 07:09 alluxio-bot

Automated checks report:

  • Commits associated with Github account: PASS
  • PR title follows the conventions: PASS

All checks passed!

alluxio-bot avatar Sep 14 '22 02:09 alluxio-bot

@beinan Please tell me if you have any comments.

yyongycy avatar Sep 15 '22 01:09 yyongycy

@tcrain Please also help review.

yyongycy avatar Sep 15 '22 06:09 yyongycy

It looks like the values in OperationSystemGaugeSet.java have the update times set to only once every 10 minutes, maybe this should be changed to a smaller value to make this more useful?

tcrain avatar Sep 20 '22 01:09 tcrain

I wonder if you could add a bit more comments to describe how some of these are used? Maybe a brief description at the top of each class describing the overall idea of the metrics that each class tracks.

Also maybe some more comments for what some of the methods are used for, for example it is not clear what this multiple is useful for in the SeverInidicator constructor. Also some of the methods for pit time are not clear what they mean or are used for.

tcrain avatar Sep 20 '22 01:09 tcrain

It looks like the values in OperationSystemGaugeSet.java have the update times set to only once every 10 minutes, maybe this should be changed to a smaller value to make this more useful?

That is right, do you have any idea how much extra cost is if using smaller granularity.

yyongycy avatar Sep 20 '22 08:09 yyongycy

I wonder if you could add a bit more comments to describe how some of these are used? Maybe a brief description at the top of each class describing the overall idea of the metrics that each class tracks.

Also maybe some more comments for what some of the methods are used for, for example it is not clear what this multiple is useful for in the SeverInidicator constructor. Also some of the methods for pit time are not clear what they mean or are used for.

Added in the head, please take a look

yyongycy avatar Sep 20 '22 09:09 yyongycy

It looks like the values in OperationSystemGaugeSet.java have the update times set to only once every 10 minutes, maybe this should be changed to a smaller value to make this more useful?

That is right, do you have any idea how much extra cost is if using smaller granularity.

I don't expect these functions to be very costly. And they will only be updated when called, so I would set it to however often you want to check them, even some seconds should be fine, but I guess it could be tested.

tcrain avatar Sep 22 '22 02:09 tcrain

It looks like the values in OperationSystemGaugeSet.java have the update times set to only once every 10 minutes, maybe this should be changed to a smaller value to make this more useful?

That is right, do you have any idea how much extra cost is if using smaller granularity.

I don't expect these functions to be very costly. And they will only be updated when called, so I would set it to however often you want to check them, even some seconds should be fine, but I guess it could be tested.

Will add that in following pr

yyongycy avatar Sep 22 '22 02:09 yyongycy

@yyongycy high level comment, could you add some user documentation?

yuzhu avatar Sep 22 '22 23:09 yuzhu

@yyongycy high level comment, could you add some user documentation?

Sure, will add that once related PRs are done.

yyongycy avatar Sep 23 '22 01:09 yyongycy

alluxio-bot, merge this please

yyongycy avatar Sep 23 '22 05:09 yyongycy