Support monitor helm chart

Open bzheng888 opened this issue 2 years ago • 10 comments

What changes are proposed in this pull request?

Add a monitor Helm chart for monitoring an Alluxio cluster in Kubernetes. See the README.md for details.

Why are the changes needed?

It makes it easy to deploy a monitoring system (Grafana + Prometheus) in Kubernetes.

Does this PR introduce any user facing changes?

Users can view cluster information on Grafana pages like these: [3 images]

bzheng888 avatar Jun 30 '22 15:06 bzheng888

[2 images] @ssz1997 @ZhuTopher The error is not caused by this PR. PTAL, thx!

bzheng888 avatar Jul 04 '22 04:07 bzheng888

@bzheng888 It's fixed now. Please pull the latest master. Let me know when it's ready for review

ssz1997 avatar Jul 05 '22 20:07 ssz1997

@ssz1997 There seem to be no errors now, PTAL!

bzheng888 avatar Jul 06 '22 07:07 bzheng888

@ssz1997 PTAL!

bzheng888 avatar Jul 08 '22 07:07 bzheng888

Failing unit test is unrelated to PR changes:

Error: 4.672 [ERROR] Errors: 
Error: 4.672 [ERROR] alluxio.server.ft.journal.raft.EmbeddedJournalIntegrationTestTransferLeadership.resetPriorities
Error: 4.673 [ERROR]   Run 1: EmbeddedJournalIntegrationTestTransferLeadership.resetPriorities:207->transferAndWait:250->EmbeddedJournalIntegrationTestBase.waitForQuorumPropertySize:58 » Timeout
Error: 4.674 [ERROR]   Run 2: EmbeddedJournalIntegrationTestTransferLeadership.resetPriorities:207->transferAndWait:250->EmbeddedJournalIntegrationTestBase.waitForQuorumPropertySize:58 » Timeout

ZhuTopher avatar Jul 15 '22 00:07 ZhuTopher

@ZhuTopher @jiacheliu3 Could you take a look if you have time?

ssz1997 avatar Jul 15 '22 17:07 ssz1997

Based on the community sync discussion, @bzheng888 could you please move the big file to a dashboard site, or to some other place that is available to end users?

LuQQiu avatar Aug 16 '22 00:08 LuQQiu

@jiacheliu3 @ssz1997 @LuQQiu Where should I upload the dashboard JSON file after I remove it from the PR?

bzheng888 avatar Sep 02 '22 06:09 bzheng888

@beinan this is the PR for monitoring the Alluxio system in Kubernetes

LuQQiu avatar Sep 02 '22 19:09 LuQQiu

@bzheng888 As @maobaolong mentioned, https://grafana.com/grafana/dashboards/ may be a good place for it.

LuQQiu avatar Sep 02 '22 19:09 LuQQiu

@ZhuTopher @ssz1997 @jiacheliu3 @maobaolong @LuQQiu I've updated this PR, PTAL. The dashboard can be downloaded from https://grafana.com/grafana/dashboards/17785-alluxio-prometheus-grafana-monitor-v1/

bzheng888 avatar Jan 04 '23 01:01 bzheng888

@LuQQiu assigned myself to do a final pass and merge the PR

LuQQiu avatar Jan 04 '23 18:01 LuQQiu

@jiacheliu3 Is this PR good to be merged? I saw you haven't approved it yet.

LuQQiu avatar Jan 05 '23 02:01 LuQQiu

@jiacheliu3 PTAL

bzheng888 avatar Jan 06 '23 02:01 bzheng888

alluxio-bot, merge this please

jiacheliu3 avatar Jan 07 '23 08:01 jiacheliu3