
[venus] Metrics monitoring for all components

Open hunjixin opened this issue 3 years ago • 4 comments

Discussed in https://github.com/filecoin-project/venus/discussions/4950

Originally posted by hunjixin June 21, 2022

Checklist

  • [X] This is not a new feature or an enhancement to the Filecoin protocol. If it is, please open an FIP issue.
  • [X] This is not a new feature request. If it is, please file a feature request instead.
  • [X] This is not brainstorming ideas. If you have an idea you'd like to discuss, please open a new discussion on the venus forum and select the category as Ideas.
  • [X] I have a specific, actionable, and well motivated improvement to propose.

Venus component

  • [X] venus daemon - [chain service] chain sync
  • [X] venus auth - [chain service] authentication
  • [X] venus messager - [chain service] message management (mpool)
  • [X] venus gateway - [chain service] gateway
  • [X] venus miner - [chain service] mining and block production
  • [X] venus sealer/worker - sealing
  • [X] venus sealer - proving (WindowPoSt)
  • [X] venus market - storage deal
  • [X] venus market - retrieval deal
  • [X] venus market - data transfer
  • [ ] venus light-weight client
  • [ ] venus JSON-RPC API
  • [ ] Other

Improvement Suggestion

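Support monitoring of the key metrics of every venus component, so that operators and users can watch these metrics directly and intervene promptly to fix problems when anomalies occur.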

hunjixin • Jun 21 '22

venus

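  1. Processing time for new tipsets
  2. Number of messages in the mpool
  3. Height, number of blocks, number of messages, weight
  4. Time difference between receiving a new block and the expected message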

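venus-messager

  1. For each address over a recent period: balance, nonce, message status, how many messages are waiting to be packed, how many have failed, and how many failed messages still need handling
  2. Per round: number of messages selected, number pushed, number waiting to be packed, number of failed messages
  3. How many messages have been stuck for more than 3 minutes, and for more than 5 minutes
  4. Time from receiving a block event from venus until the head stabilizes, so that sync problems are detected early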

venus-gateway

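  1. Number of wallets connected to the gateway, their number of addresses, and their IP locations
  2. Number of miners connected to the gateway, their number of addresses, and their IP locations
  3. Number of signatures made through the gateway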

venus-market

  1. Number of storage/retrieval deals received over a period of time, and their success rate

venus-miner

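  1. Number of block-production rights (wins) over a period of time
  2. Proof computation time
  3. Signing time
  4. Time to get the mining base
  5. Pre-computation time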

venus-auth/venus-wallet

To be decided, for security reasons.

hunjixin • Jun 21 '22

How about these?

Venus

  • Block validation time
  • Memory / CPU usage
  • Number of goroutines
  • IPLD block read latency
  • Bandwidth usage

Cluster

  • WindowPoSt computation time
  • WinningPoSt computation time

venus-market

  • Deal transfer speed, count, duration, status
  • Retrieval transfer speed, count, duration, status

Fatman13 • Jul 08 '22

https://github.com/filecoin-project/venus/issues/4960 https://github.com/filecoin-project/venus/issues/5054

diwufeiwen • Jul 14 '22

Lotus already has "Validated X messages (X per second)", i.e. how many messages were validated within a few seconds.

  • [ ] https://github.com/filecoin-project/lotus/pull/9052

Fatman13 • Jul 22 '22

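The per-component work is done. Next we need to push the work with farcast forward: on one hand, they should inherit our metrics; on the other, we need to add some venus-specific metrics.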

hunjixin • Sep 04 '22

venus-messager metrics

Addresses

// Address balance
WalletBalance    = stats.Int64("wallet_balance", "Wallet balance", stats.UnitDimensionless)
// Address nonce recorded in the database
WalletDBNonce    = stats.Int64("wallet_db_nonce", "Wallet nonce in db", stats.UnitDimensionless)
// Address nonce on chain
WalletChainNonce = stats.Int64("wallet_chain_nonce", "Wallet nonce on the chain", stats.UnitDimensionless)
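Exporting both nonces makes it possible to alert on the gap between them. A minimal sketch with a hypothetical `nonceGap` helper, assuming (not confirmed by the issue) that the DB nonce is the next nonce the messager will assign and the chain nonce is the next nonce the chain expects, so a positive gap approximates messages that have a nonce but are not yet on chain:

```go
package main

import "fmt"

// nonceGap compares the DB nonce with the chain nonce.
// Assumption: both values are "next nonce" values, so a positive gap counts
// messages that were assigned a nonce but have not landed on chain yet.
func nonceGap(dbNonce, chainNonce int64) (gap int64, dbBehind bool) {
	gap = dbNonce - chainNonce
	// A negative gap means the chain is ahead of the DB, which may indicate
	// messages sent from this address outside the messager (an assumption).
	return gap, gap < 0
}

func main() {
	gap, dbBehind := nonceGap(105, 100)
	fmt.Println(gap, dbBehind) // 5 false
}
```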

Message counts

// Number of unfilled messages; can be grouped by address
NumOfUnFillMsg = stats.Int64("num_of_unfill_msg", "The number of unFill msg", stats.UnitDimensionless)
// Number of filled messages; can be grouped by address
NumOfFillMsg   = stats.Int64("num_of_fill_msg", "The number of fill Msg", stats.UnitDimensionless)
// Number of failed messages
NumOfFailedMsg = stats.Int64("num_of_failed_msg", "The number of failed msg", stats.UnitDimensionless)
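"Grouped by address" means the counts are keyed by sender and state before being recorded. A minimal sketch, assuming a hypothetical `msg` type and state enum; the addresses are placeholders, not valid Filecoin addresses. With OpenCensus, the address would typically become a tag value passed to `stats.RecordWithTags`:

```go
package main

import "fmt"

// msgState is a hypothetical message state mirroring the categories above.
type msgState int

const (
	unFill msgState = iota
	fill
	failed
)

// msg is a hypothetical message record: sender address plus current state.
type msg struct {
	From  string
	State msgState
}

// countByAddress groups messages by sender and state, the shape needed to
// report unfill/fill/failed counts per address.
func countByAddress(msgs []msg) map[string]map[msgState]int64 {
	out := make(map[string]map[msgState]int64)
	for _, m := range msgs {
		if out[m.From] == nil {
			out[m.From] = make(map[msgState]int64)
		}
		out[m.From][m.State]++
	}
	return out
}

func main() {
	counts := countByAddress([]msg{
		{From: "f1aaa", State: unFill},
		{From: "f1aaa", State: fill},
		{From: "f1aaa", State: fill},
		{From: "f1bbb", State: failed},
	})
	fmt.Println(counts["f1aaa"][fill], counts["f1bbb"][failed]) // 2 1
}
```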

// Number of filled messages still not on chain after 3 minutes
NumOfMsgBlockedThreeMinutes = stats.Int64("blocked_three_minutes_msgs", "Number of messages blocked for more than 3 minutes", stats.UnitDimensionless)
// Number of filled messages still not on chain after 5 minutes
NumOfMsgBlockedFiveMinutes  = stats.Int64("blocked_five_minutes_msgs", "Number of messages blocked for more than 5 minutes", stats.UnitDimensionless)

Per-round message selection

// Number of messages selected
SelectedMsgNumOfLastRound = stats.Int64("selected_msg_num", "Number of selected messages in the last round", stats.UnitDimensionless)
// Filled messages not yet on chain (to be pushed)
ToPushMsgNumOfLastRound   = stats.Int64("topush_msg_num", "Number of to-push messages in the last round", stats.UnitDimensionless)
// Number of expired messages
ExpiredMsgNumOfLastRound  = stats.Int64("expired_msg_num", "Number of expired messages in the last round", stats.UnitDimensionless)
// Number of erroneous messages
ErrMsgNumOfLastRound      = stats.Int64("err_msg_num", "Number of err messages in the last round", stats.UnitDimensionless)

head

// Time taken for the chain head to stabilize
ChainHeadStableDelay    = stats.Int64("chain_head_stable_s", "Delay of chain head stabilization", stats.UnitSeconds)

venus-gateway

Wallets

// Wallet registrations
WalletRegister   = stats.Int64("wallet_register", "Wallet register", stats.UnitDimensionless)
// Wallet unregistrations
WalletUnregister = stats.Int64("wallet_unregister", "Wallet unregister", stats.UnitDimensionless)
// Number of wallets
WalletNum        = stats.Int64("wallet_num", "Wallet count", stats.UnitDimensionless)
// Number of addresses held by a wallet
WalletAddressNum = stats.Int64("wallet_address_num", "Address owned by wallet", stats.UnitDimensionless)
// Wallet source (IP)
WalletSource     = stats.Int64("wallet_source", "Wallet IP", stats.UnitDimensionless)
// Address added to a wallet
WalletAddAddr    = stats.Int64("wallet_add_addr", "Wallet add a new address", stats.UnitDimensionless)
// Address removed from a wallet
WalletRemoveAddr = stats.Int64("wallet_remove_addr", "Wallet remove an address", stats.UnitDimensionless)
// Number of wallet connections
WalletConnNum    = stats.Int64("wallet_conn_num", "Wallet connection count", stats.UnitDimensionless)

Miners

// Miner registrations
MinerRegister   = stats.Int64("miner_register", "Miner register", stats.UnitDimensionless)
// Miner unregistrations
MinerUnregister = stats.Int64("miner_unregister", "Miner unregister", stats.UnitDimensionless)
// Number of miners
MinerNum        = stats.Int64("miner_num", "Miner count", stats.UnitDimensionless)
// Miner source (IP); must not reuse the "wallet_source" measure name
MinerSource     = stats.Int64("miner_source", "Miner IP", stats.UnitDimensionless)
// Number of miner connections
MinerConnNum    = stats.Int64("miner_conn_num", "Miner connection count", stats.UnitDimensionless)
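The count-style gauges above (number of wallets or miners, connection count) can be derived from an in-memory registry of live connections. A minimal sketch, assuming a hypothetical `connRegistry` keyed by account and remote IP; it is not the venus-gateway implementation, and the same shape would serve both wallets and miners:

```go
package main

import (
	"fmt"
	"sync"
)

// connRegistry is a hypothetical view of the connections a gateway holds:
// account (wallet or miner) -> remote IP -> number of open connections.
type connRegistry struct {
	mu    sync.Mutex
	conns map[string]map[string]int
}

func newConnRegistry() *connRegistry {
	return &connRegistry{conns: make(map[string]map[string]int)}
}

// register records one new connection from account at ip.
func (r *connRegistry) register(account, ip string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.conns[account] == nil {
		r.conns[account] = make(map[string]int)
	}
	r.conns[account][ip]++
}

// unregister drops one connection and forgets empty entries.
func (r *connRegistry) unregister(account, ip string) {
	r.mu.Lock()
	defer r.mu.Unlock()
	ips := r.conns[account]
	if ips == nil {
		return
	}
	if ips[ip]--; ips[ip] <= 0 {
		delete(ips, ip)
	}
	if len(ips) == 0 {
		delete(r.conns, account)
	}
}

// snapshot returns the number of distinct accounts and total open connections.
func (r *connRegistry) snapshot() (accounts, conns int) {
	r.mu.Lock()
	defer r.mu.Unlock()
	for _, ips := range r.conns {
		for _, n := range ips {
			conns += n
		}
	}
	return len(r.conns), conns
}

func main() {
	reg := newConnRegistry()
	reg.register("wallet-1", "10.0.0.1")
	reg.register("wallet-1", "10.0.0.2")
	reg.register("wallet-2", "10.0.0.3")
	reg.unregister("wallet-2", "10.0.0.3")
	fmt.Println(reg.snapshot()) // 1 2
}
```

Recording from a snapshot, rather than adding and subtracting on every event, keeps the exported gauges consistent with the registry even if an unregister is missed.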

API calls

// Time spent signing (ms)
WalletSign         = stats.Float64("wallet_sign", "Call WalletSign spent time", stats.UnitMilliseconds)
// Time spent listing wallet addresses (ms)
WalletList         = stats.Float64("wallet_list", "Call WalletList spent time", stats.UnitMilliseconds)
// Time spent computing the WinningPoSt proof (ms)
ComputeProof       = stats.Float64("compute_proof", "Call ComputeProof spent time", stats.UnitMilliseconds)
// Time spent calling IsUnsealed (ms)
IsUnsealed         = stats.Float64("is_unsealed", "Call IsUnsealed spent time", stats.UnitMilliseconds)
// Time spent calling SectorsUnsealPiece (ms)
SectorsUnsealPiece = stats.Float64("sectors_unseal_piece", "Call SectorsUnsealPiece spent time", stats.UnitMilliseconds)

simlecode • Sep 16 '22