
ETA calculation update

Open ekbainova opened this issue 1 year ago • 1 comments

Background

We need to significantly improve the accuracy of our estimated times so that we can show users a clear countdown. It is crucial that the estimate be 99% accurate to within 20-30 seconds, whether the transfer takes the slow or the fast path.

Current data on the accuracy of our estimated times: [NEED TO ADD]

The new estimation time should be calculated based on:

  • [As is] The current liquidity level of the router on the destination chain.
  • Statistics on the actual durations of the past 20 transfers (#5966).
  • Router activity calls, including:
    a. A list of recently initiated transfers and their associated router addresses.
    b. The duration of inactivity for the router.

Other ideas:

  1. Transfer size
  2. Transfers initiated in the past 3-4 days on currently active routers only
  3. The router's available liquidity must be adjusted by the volume of transfers currently being processed
  4. Difference between estimated and actual time on this route (past 3-4 days)
  5. Dummy variable - actual errors

Product Spec: https://www.notion.so/connext/1-ETA-f434c48e3054455da73dc3a560875c32?pvs=4#b1f5b5132d2b41c3aa6b54706dda9924

Linked Issues & Documentation

Connected to #5966 and #5736

ekbainova avatar May 13 '24 17:05 ekbainova

Current

This is what our estimates currently include.

1) Query idle router liquidity

The current idle liquidity of the asset across all routers on the destination chain is the liquidity that should be available for the fast path. Idle liquidity = amount deposited - amount removed - amount in flight. Note that this already subtracts in-flight liquidity (liquidity that is currently unavailable because it was used to boost a previous transfer and the funds have yet to be reconciled for the router).

Multiple routers can provide liquidity for a single transfer; the network currently limits this to 3. So the maximum router liquidity available at any time is the combined idle liquidity of the top 3 routers.

We query our DB for this idle liquidity; the DB refreshes its view every 15 seconds. This could be improved by reading from another indexing layer such as a subgraph, or by reading from a node directly via an RPC call. These options introduce other tradeoffs in cost, uptime, and the speed at which the estimate is returned.
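To make the liquidity check concrete, here is a minimal sketch of the calculation described in 1). The `RouterBalance` record and the function names are hypothetical and are not taken from the monorepo; the sketch also assumes that "top 3 routers" means the three routers with the most idle liquidity, and that negative idle balances count as zero.

```python
from dataclasses import dataclass
from typing import List

MAX_ROUTERS_PER_TRANSFER = 3  # network limit mentioned above


@dataclass
class RouterBalance:
    """Hypothetical per-router record for one asset on the destination chain."""
    router: str
    deposited: int
    removed: int
    in_flight: int

    @property
    def idle(self) -> int:
        # idle = amount deposited - amount removed - amount in flight
        return self.deposited - self.removed - self.in_flight


def max_fast_path_liquidity(balances: List[RouterBalance]) -> int:
    """Combined idle liquidity of the top routers that could share one transfer."""
    # Assumption: a negative idle balance contributes nothing.
    idle_amounts = sorted((max(b.idle, 0) for b in balances), reverse=True)
    return sum(idle_amounts[:MAX_ROUTERS_PER_TRANSFER])


def is_fast_path_candidate(amount: int, balances: List[RouterBalance]) -> bool:
    """True if the transfer amount fits within the available idle liquidity."""
    return amount <= max_fast_path_liquidity(balances)
```

Because the balances come from a view that refreshes every 15 seconds, the result can lag the on-chain state by up to that interval.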

2) Factor in router availability

Routers can provide idle liquidity but still experience downtime. When routers are down, we have an incomplete view of the idle liquidity that is actually usable for the fast path. We currently use the following logic as a proxy for router availability, overriding a "fast path" estimate if a majority of the recent transfers have actually been slow.

  • If estimated latency is "fast path", check the status of the last N=20 transfers within the last 3 hours
  • If >50% of these transfers were completedSlow, then display a "slow path" estimate instead
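The override above could be sketched roughly as follows. The `Transfer` record, the `"fast"`/`"slow"` labels, and the function name are hypothetical; only the `completedSlow` status, N=20, the 3-hour window, and the >50% threshold come from the description. Keeping the fast estimate when there are no recent transfers is also an assumption.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List


@dataclass
class Transfer:
    """Hypothetical record of a completed transfer."""
    completed_at: datetime
    status: str  # e.g. "completedSlow"; any other value is treated as not slow


def apply_availability_override(
    estimate: str,
    transfers: List[Transfer],
    now: datetime,
    n: int = 20,
    window: timedelta = timedelta(hours=3),
) -> str:
    """Downgrade a "fast" estimate to "slow" if most recent transfers were slow."""
    if estimate != "fast":
        return estimate
    # Keep only transfers inside the time window, newest first, capped at n.
    recent = sorted(
        (t for t in transfers if now - t.completed_at <= window),
        key=lambda t: t.completed_at,
        reverse=True,
    )[:n]
    if not recent:
        return estimate  # assumption: no data means no override
    slow_count = sum(1 for t in recent if t.status == "completedSlow")
    # Strictly more than 50% slow triggers the override.
    return "slow" if slow_count > len(recent) / 2 else "fast"
```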

3) Display time ranges

Along with the fast/slow determination, we also provide a median and a range for fast/slow paths for the last N=20 transfers of each type.

  • If estimated latency is "fast path", then display the median times for the last N fast transfers rounded up to minute precision
  • If estimated latency is "slow path", then display the median times for the last N slow transfers rounded up to hour and minute precision
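As an illustration of the display step, the sketch below computes the median and a range over the last N durations and rounds each value up to the next whole minute. The function names and output format are hypothetical, and taking the range as the minimum and maximum of the sample is an assumption, since the description does not say how the range is defined.

```python
import math
from statistics import median
from typing import Dict, List


def round_up_to_minute(seconds: float) -> int:
    """Round a duration in seconds up to the next whole minute (in seconds)."""
    return math.ceil(seconds / 60) * 60


def format_duration(seconds: int, path: str) -> str:
    """Minute precision for the fast path, hour and minute precision for the slow path."""
    minutes = seconds // 60
    if path == "fast":
        return f"{minutes} min"
    hours, rem = divmod(minutes, 60)
    return f"{hours} h {rem} min"


def summarize(durations_s: List[float], path: str, n: int = 20) -> Dict[str, str]:
    """Median and range for the last n durations (list assumed oldest first)."""
    recent = list(durations_s)[-n:]
    if not recent:
        raise ValueError("no historical transfers to summarize")
    return {
        "median": format_duration(round_up_to_minute(median(recent)), path),
        "low": format_duration(round_up_to_minute(min(recent)), path),
        "high": format_duration(round_up_to_minute(max(recent)), path),
    }
```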

Improvements to consider

  • Exclude idle liquidity of routers that haven't boosted any transfers in the last X days from 1)
    • For Chimera, there should be router telemetry endpoints where down routers can more easily be identified
  • Specify a tighter range of historical transfers to look at for 2) and 3), e.g. exclude transfers whose amount is more than 20% larger or smaller than the amount being estimated (right now we look at the last N transfers of any amount)
  • Express estimates with confidence intervals based on historical data. e.g. there is a 95% chance this transfer will take M minutes
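The last two ideas could be combined along these lines. This is only a sketch under assumptions: `history` is a hypothetical list of `(amount, duration_seconds)` pairs, the 20% tolerance is applied symmetrically around the requested amount, and "a 95% chance this transfer will take M minutes" is read as an empirical upper bound (the nearest-rank 95th percentile of past durations), not a model-based interval.

```python
import math
from typing import List, Tuple


def durations_for_similar_amounts(
    history: List[Tuple[float, float]],
    amount: float,
    tolerance: float = 0.20,
) -> List[float]:
    """Durations of past transfers whose amount is within +/- tolerance of amount."""
    low, high = amount * (1 - tolerance), amount * (1 + tolerance)
    return [duration for past_amount, duration in history if low <= past_amount <= high]


def empirical_upper_bound(durations: List[float], confidence: float = 0.95) -> float:
    """Nearest-rank percentile: the smallest observed duration that at least
    `confidence` of the sample does not exceed."""
    if not durations:
        raise ValueError("no historical durations available")
    ordered = sorted(durations)
    rank = max(1, math.ceil(confidence * len(ordered)))
    return ordered[rank - 1]
```

With only 20 samples the 95th percentile is determined by the one or two slowest transfers, so a larger history window may be needed before such a bound is stable.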

just-a-node avatar Jun 13 '24 21:06 just-a-node