Calculate metrics on lazily loaded UTXO set
PR #1103 allows loading a subset of the UTXO set when needed instead of loading it entirely during startup. This means that the in-memory UTXO set only contains information processed since the last service restart.
Metrics collected so far:
- balance
- unique addresses (see: {repo_root}/apps/omg/lib/omg/state/measurement_calculation.ex)
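To make the problem concrete, here is a minimal Python sketch of how these two metrics could be computed from whatever UTXOs happen to be in memory. It is not the actual code in measurement_calculation.ex; the `Utxo` shape (`owner`, `currency`, `amount`) and the function names are assumptions for illustration only.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple


class Utxo(NamedTuple):
    # Hypothetical, simplified UTXO shape used only for this illustration.
    owner: str
    currency: str
    amount: int


def balance_per_token(utxos: Iterable[Utxo]) -> Dict[str, int]:
    """Sum amounts per currency over the given UTXOs."""
    totals: Dict[str, int] = defaultdict(int)
    for utxo in utxos:
        totals[utxo.currency] += utxo.amount
    return dict(totals)


def unique_addresses(utxos: Iterable[Utxo]) -> int:
    """Count distinct owners among the given UTXOs."""
    return len({utxo.owner for utxo in utxos})


# Made-up data: the full persisted set versus what is currently in memory.
full_set = [
    Utxo("alice", "ETH", 10),
    Utxo("bob", "ETH", 5),
    Utxo("carol", "OMG", 7),
]
# Only the UTXOs touched since the last restart are loaded.
loaded_subset = full_set[:1]
```

Running both functions on `loaded_subset` instead of `full_set` under-reports the balance and the address count, which is the gap this issue is about.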
Where should we move the metrics calculation :question:
PS: Maybe this really belongs to the informational watcher for product. But Child Chain really needs this visibility.
- There are two sets of metrics we're interested in:
  - Product metrics related to UTXOs (how utilized are we, what are the amounts going in and out)
    - balance and unique addresses really belong to the informational watcher (SQL)
  - Technical engineering perspective (separate issue TBD)
    - Round trip measurements, size of the state set, the memory implications, etc.
I'm tackling this ticket from the following angles:
- `:authority_balance`
  - Stays in `childchain` since it's primarily the operator's concern to topup the balance.
  - Currently reported only when there's a block submission. This means the monitors can't differentiate between no data vs broken reporting.
    - Change to report periodically, e.g. every 5 minutes.
    - Update: This one is not urgent since the `:authority_balance` should only decrease on a block submission anyway
- `:balance` per token
  - Stays in `childchain` so the operator can be alarmed on the network's insolvency.
  - Add to `watcher` and `watcher_info` so the integrator/user can be alarmed on the network's insolvency.
  - How to aggregate the not-loaded utxos without significant performance/resource impact?
    - Maybe async task spawned at app start up that fetches and aggregates the not-loaded utxos in batches.
- `:unique_users`
  - Move to `watcher_info`, good to have info for business insights but not network's healthiness.
  - Since it's only needed in `watcher_info`, we can populate and aggregate the info from the informational database.
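The batched aggregation idea for the per-token balance could look roughly like the Python sketch below. Everything in it is hypothetical: `fetch_page` stands in for whatever paged read the persisted UTXO store would offer, and `STORED` is made-up data. It only shows the batching itself; running it as a background task at startup is left out.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterator, List, Tuple

# (currency, amount) pairs: a hypothetical, simplified view of a stored UTXO.
UtxoRow = Tuple[str, int]
FetchPage = Callable[[int, int], List[UtxoRow]]


def iter_batches(fetch_page: FetchPage, batch_size: int) -> Iterator[List[UtxoRow]]:
    """Yield pages of rows until the store returns an empty page."""
    offset = 0
    while True:
        page = fetch_page(offset, batch_size)
        if not page:
            return
        yield page
        offset += len(page)


def aggregate_balances(fetch_page: FetchPage, batch_size: int = 1000) -> Dict[str, int]:
    """Sum amounts per currency while holding only one batch in memory."""
    totals: Dict[str, int] = defaultdict(int)
    for batch in iter_batches(fetch_page, batch_size):
        for currency, amount in batch:
            totals[currency] += amount
    return dict(totals)


# Stand-in for the persisted UTXO store (assumption, not the real DB API).
STORED: List[UtxoRow] = [("ETH", 10), ("ETH", 5), ("OMG", 7), ("ETH", 1), ("OMG", 3)]


def fetch_page(offset: int, limit: int) -> List[UtxoRow]:
    return STORED[offset:offset + limit]
```

Only the running totals and the current batch are held at any time, so memory use stays bounded by `batch_size`. One open point the sketch does not address: UTXOs created or spent while the scan is running would still need to be reconciled with the totals.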
[More for discussion, probably not quite in scope for this task]
If there are multiple metrics, we might consider investing some time in exploring streaming of DB changes.
One design I have seen and really liked was: Service -> DB -> DB change stream -> probably some computing job to change the format or similar -> logs, business DB, etc.
This gives a really clear boundary and the performance impact is fairly limited. However, what I saw relied on a fully cloud-supported feature to enable that 😅 The DynamoDB I used to use has a streaming feature which is amazing to use. I think if we go this way we would need to build our own stream mechanism on our DBs (RocksDB, Postgres <-- I guess Postgres might have a similar feature, but I'm not sure)
Moving/noting this down into https://github.com/omgnetwork/private-issues/issues/66
After more thought, fixing the balance is not easy, so:
- `:authority_balance` -> fix later, it reports accurately, just not frequently enough
- `:unique_users` -> fix later, doesn't impact network health
- `:balance` per token -> working on this, needed to monitor the network's solvency