
As a team, we should monitor latency for settlement time (TTV) and refund time (TTR) for fast/slow paths

Open • rhlsthrm opened this issue 3 years ago • 6 comments

Problem

As we roll out Amarok to mainnet and scale, we should reevaluate the success of the product and keep monitoring for breaking issues, bugs, and anomalies.

According to our public messaging, the following should hold, although it has not always been the case in testing so far:

  • Fast path: < 60 s
  • Slow path: < 10 min

Ideas to solve this

Use a database view or stored procedure to calculate the average time between XCalled and Executed within a given time window, for both the fast path and the slow path.
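As an illustration of the calculation such a view or stored procedure would perform, here is a minimal TypeScript sketch that averages the XCalled to Executed time per path for transfers xcalled inside a window. It assumes rows carry Unix-second `xcall_timestamp` / `execute_timestamp` fields (names taken from the TTV formula in this issue) and that the slow path is reported with status `CompletedSlow`; that status name and the `TransferRow` / `avgExecutionTimeByPath` names are hypothetical.

```typescript
// Hypothetical row shape; timestamp field names follow the TTV formula in
// this issue, and Unix seconds are assumed.
interface TransferRow {
  status: string; // "CompletedFast"; "CompletedSlow" is assumed for the slow path
  xcall_timestamp: number;
  execute_timestamp?: number; // absent until the transfer is executed
}

type Path = "fast" | "slow";

// Maps a status to a path; "CompletedSlow" is an assumed status name.
function pathOf(status: string): Path | undefined {
  if (status === "CompletedFast") return "fast";
  if (status === "CompletedSlow") return "slow";
  return undefined;
}

// Average XCalled -> Executed time (seconds) per path, over transfers
// xcalled within [windowStart, windowEnd). Undefined when a path has no rows.
function avgExecutionTimeByPath(
  rows: TransferRow[],
  windowStart: number,
  windowEnd: number,
): Record<Path, number | undefined> {
  const sums: Record<Path, { total: number; count: number }> = {
    fast: { total: 0, count: 0 },
    slow: { total: 0, count: 0 },
  };
  for (const row of rows) {
    const path = pathOf(row.status);
    // Skip rows that are not completed or have not been executed yet.
    if (path === undefined || row.execute_timestamp === undefined) continue;
    if (row.xcall_timestamp < windowStart || row.xcall_timestamp >= windowEnd) continue;
    sums[path].total += row.execute_timestamp - row.xcall_timestamp;
    sums[path].count += 1;
  }
  return {
    fast: sums.fast.count > 0 ? sums.fast.total / sums.fast.count : undefined,
    slow: sums.slow.count > 0 ? sums.slow.total / sums.slow.count : undefined,
  };
}
```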

Acceptance Criteria

[ ] Able to monitor the average time for a transaction to execute
[ ] Able to view this information by fast vs. slow path
[ ] Able to view this information over time / during specific time windows

Other

Example: response_1657630841211.txt (filter: transfer_id=eq.0x93b71578ac1d387326f34ea03a8443eeb572d2516c0f5d31be906f8b8dec5108)

For calculation:

  • TTV = execute_timestamp - xcall_timestamp
  • TTR = reconcile_timestamp - execute_timestamp
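The two formulas map directly onto per-transfer arithmetic. This sketch assumes the timestamps are Unix seconds and are absent until the corresponding step has happened; `computeLatency` and its types are illustrative names, not existing backend code.

```typescript
// Field names come from this issue; numeric Unix seconds, and `undefined`
// for steps that have not happened yet, are assumptions.
interface TransferTimestamps {
  xcall_timestamp: number;
  execute_timestamp?: number;
  reconcile_timestamp?: number;
}

interface TransferLatency {
  ttv?: number; // execute_timestamp - xcall_timestamp
  ttr?: number; // reconcile_timestamp - execute_timestamp
}

function computeLatency(t: TransferTimestamps): TransferLatency {
  const latency: TransferLatency = {};
  if (t.execute_timestamp !== undefined) {
    latency.ttv = t.execute_timestamp - t.xcall_timestamp;
    // TTR is measured from execution, so it also needs execute_timestamp.
    if (t.reconcile_timestamp !== undefined) {
      latency.ttr = t.reconcile_timestamp - t.execute_timestamp;
    }
  }
  return latency;
}
```

For a transfer xcalled at 100, executed at 145 and reconciled at 1945, this yields TTV = 45 s and TTR = 1800 s.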

Metrics:

  • Avg
  • Max/Min
  • [optional] median
  • [optional] stdev
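A possible way to compute the listed metrics over a set of TTV or TTR values. Two choices here are assumptions: the median of an even-length list averages the two middle values, and stdev is the population standard deviation (dividing by n); a sample stdev (dividing by n - 1) would be equally defensible. `summarize` and `LatencyStats` are hypothetical names.

```typescript
interface LatencyStats {
  count: number;
  avg: number;
  min: number;
  max: number;
  median: number;
  stdev: number; // population standard deviation
}

// Returns undefined for an empty input, since no metric is defined then.
function summarize(values: number[]): LatencyStats | undefined {
  if (values.length === 0) return undefined;
  // Copy before sorting so the caller's array is left untouched.
  const sorted = [...values].sort((a, b) => a - b);
  const n = sorted.length;
  const avg = sorted.reduce((sum, v) => sum + v, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const variance = sorted.reduce((sum, v) => sum + (v - avg) ** 2, 0) / n;
  return {
    count: n,
    avg,
    min: sorted[0],
    max: sorted[n - 1],
    median,
    stdev: Math.sqrt(variance),
  };
}
```

For example, `summarize([2, 4, 4, 4, 5, 5, 7, 9])` gives avg 5, min 2, max 9, median 4.5 and stdev 2.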

Ability to slice metrics by:

  • Date range
  • Asset: destination_local_asset (e.g. 0x3ffc03f05d1869f493c7dbf913e636c6280e0ff9)
  • Path: status (e.g. CompletedFast)
  • Domain: destination_domain (e.g. 1111)
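Slicing can be expressed as a filter over transfer records. The field names `destination_local_asset`, `status` and `destination_domain` come from the example response; using `xcall_timestamp` (Unix seconds) for the date range, and the `SliceFilter` / `slice` names, are assumptions. Asset addresses are compared case-insensitively because the same address may appear in checksummed or lowercase form.

```typescript
// Hypothetical record shape built from the fields in the example response.
interface TransferRecord {
  xcall_timestamp: number; // assumed date field, Unix seconds
  destination_local_asset: string;
  destination_domain: string;
  status: string;
}

// Every field is optional; an omitted field means "don't filter on it".
interface SliceFilter {
  from?: number; // inclusive, Unix seconds
  to?: number; // exclusive, Unix seconds
  asset?: string;
  status?: string;
  domain?: string;
}

function slice(records: TransferRecord[], f: SliceFilter): TransferRecord[] {
  return records.filter(
    (r) =>
      (f.from === undefined || r.xcall_timestamp >= f.from) &&
      (f.to === undefined || r.xcall_timestamp < f.to) &&
      (f.asset === undefined ||
        r.destination_local_asset.toLowerCase() === f.asset.toLowerCase()) &&
      (f.status === undefined || r.status === f.status) &&
      (f.domain === undefined || r.destination_domain === f.domain),
  );
}
```

The sliced records can then be fed into whatever aggregation computes the metrics listed above.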

All metrics: https://docs.google.com/spreadsheets/d/1WUv0ye5Ev0fO3w2ejVYRRyCHhTXi8W5OG6vMCJwTkhk/edit#gid=1298285392

rhlsthrm • May 18 '22 14:05

This data already exists. We might want to add an API and/or a view to monitor it.

P3 - Nice

alexwhte • Jun 01 '22 16:06

A user is working on this here: https://discord.com/channels/454734546869551114/985959064125198376. Our scope is supporting their feedback and enabling fast vs. slow path data fetching: https://datastudio.google.com/reporting/8b568e44-640e-457b-861a-0d88c03008f2

alexwhte • Jun 13 '22 17:06

We need an SDK method to provide access to this. @just-a-node, can you define a model for the function signature and for the data the backend should return? I'm thinking it should be grouped by domain and time range (past hour, day, week, etc.). Then we can create some view functions to expose it.

rhlsthrm • Jun 28 '22 15:06

Makes sense to group by domainId and time range. This will be similar to how Pool metrics like volume and APY will be broken down. Even further, we might want to differentiate between mean/median and include stddev too.

I propose something like the following, which can be extended with other metrics:

interface ITimeToValue {
  meanTimeToValue: number;
  medianTimeToValue: number;
  stddev: number;
}

interface IXCallSettlementStats {
  day: ITimeToValue;
  week: ITimeToValue;
  month: ITimeToValue;
  total: ITimeToValue;
}

// Proposed SDK method (signature only); an implementation would be async.
declare function getXCallSettlementTimes(
  domainId: string,
  slowLiquidity: boolean,
): Promise<IXCallSettlementStats>;
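To make the proposal concrete, here is one way the backend side could fill `IXCallSettlementStats` for a given domain and path from already-fetched rows. It re-declares the proposed interfaces so the block stands alone. The following are all hypothetical assumptions: `domainId` is matched against `destination_domain`, `slowLiquidity` selects rows with status `CompletedSlow` (otherwise `CompletedFast`), windows are rolling and measured back from `nowSeconds`, a month is 30 days, and empty windows yield `NaN`. `computeSettlementStats` is an illustrative name; an async SDK method could fetch rows and delegate to it.

```typescript
interface ITimeToValue {
  meanTimeToValue: number;
  medianTimeToValue: number;
  stddev: number;
}

interface IXCallSettlementStats {
  day: ITimeToValue;
  week: ITimeToValue;
  month: ITimeToValue;
  total: ITimeToValue;
}

// Hypothetical row shape; field names follow this issue, Unix seconds assumed.
interface SettledTransfer {
  destination_domain: string;
  status: string; // "CompletedFast", or "CompletedSlow" (assumed) for the slow path
  xcall_timestamp: number;
  execute_timestamp: number;
}

const DAY = 24 * 60 * 60;

// Mean, median and population standard deviation of TTV values;
// NaN for every field when the window is empty.
function timeToValue(ttvs: number[]): ITimeToValue {
  if (ttvs.length === 0) {
    return { meanTimeToValue: NaN, medianTimeToValue: NaN, stddev: NaN };
  }
  const sorted = [...ttvs].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((s, v) => s + v, 0) / n;
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const variance = sorted.reduce((s, v) => s + (v - mean) ** 2, 0) / n;
  return { meanTimeToValue: mean, medianTimeToValue: median, stddev: Math.sqrt(variance) };
}

// Pure aggregation over rows; a "month" is taken to be 30 days here.
function computeSettlementStats(
  rows: SettledTransfer[],
  domainId: string,
  slowLiquidity: boolean,
  nowSeconds: number,
): IXCallSettlementStats {
  const wanted = slowLiquidity ? "CompletedSlow" : "CompletedFast";
  const matching = rows.filter(
    (r) => r.destination_domain === domainId && r.status === wanted,
  );
  // TTVs of transfers xcalled within the last `seconds` (all of them if omitted).
  const ttvSince = (seconds?: number): number[] =>
    matching
      .filter((r) => seconds === undefined || r.xcall_timestamp >= nowSeconds - seconds)
      .map((r) => r.execute_timestamp - r.xcall_timestamp);
  return {
    day: timeToValue(ttvSince(DAY)),
    week: timeToValue(ttvSince(7 * DAY)),
    month: timeToValue(ttvSince(30 * DAY)),
    total: timeToValue(ttvSince()),
  };
}
```

Keeping the aggregation a pure function of the rows makes it easy to test, and the same code works whether the rows come from a database view or an API response.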

just-a-node • Jul 06 '22 02:07

Note: confirm in standup today.

alexwhte • Jul 12 '22 13:07

NOTE: Nomad is building a new data agent that lets users monitor latency. In the future it would be a useful point of comparison, so we can tell whether latency comes from our agents or from Nomad.

alexwhte • Jul 20 '22 01:07