Clearer CTA with "High Memory Usage" warnings

Open joepavitt opened this issue 1 year ago • 9 comments

Description

See example UX: https://eu.posthog.com/project/2209/replay/01909b5a-3398-7d2e-ad13-4df51078012c

We warn users if they're exceeding 75% CPU utilization on instances, but we don't make it clear what they can do to remedy the situation.

The user here clicks "Update" next to the instance size/NR version, presumably thinking this will resolve the problem, but it updates the NR version, not the instance size.

Ideally, we'd point them towards the "Upgrade Instance Size" option, or at least mention it in the messaging.

Which customers would this be available to

Everyone - CE/Starter/Team/Enterprise

Have you provided an initial effort estimate for this issue?

I have provided an initial effort estimate

### Tasks
- [ ] https://github.com/FlowFuse/flowfuse/issues/4188
- [ ] https://github.com/FlowFuse/flowfuse/issues/4193

joepavitt avatar Jul 10 '24 07:07 joepavitt

cc @gstout52

ZJvandeWeg avatar Jul 10 '24 08:07 ZJvandeWeg

We do have more data available, so we could include some charting to indicate whether it was a point event or steady growth over time. The nr-launcher should be collecting some history of the memory values.

hardillb avatar Jul 10 '24 12:07 hardillb

Is that a history of data, @hardillb? My thinking is:

  • Iteration 1 - Just change the message to say "It might be worth upgrading your instance"
  • Iteration 2 - Having better observability on memory usage generally

joepavitt avatar Jul 10 '24 12:07 joepavitt

Each instance has its own Prometheus data endpoint we can poll.

hardillb avatar Jul 10 '24 12:07 hardillb

> Each instance has its own Prometheus data endpoint we can poll.

What extent of data do we get here out of interest?

joepavitt avatar Jul 16 '24 15:07 joepavitt

Looking at the code, the nr-launcher keeps a rolling average of the memory and CPU usage for the last 5 minutes (sampling every 10 seconds, keeping 30 samples). This is what it uses to trigger the audit log entries.

We can poll the nr-launcher for the last 1000 samples (~2.7 hours); we could increase this if helpful.
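
As a rough illustration of the rolling window described above (not the nr-launcher's actual code), the sketch below keeps the most recent 30 samples taken at 10-second intervals and checks the average against a threshold. The `RollingUsage` name, the 75% threshold, and the sample values are assumptions made for the example.

```python
from collections import deque

SAMPLE_INTERVAL_S = 10   # one sample every 10 seconds
WINDOW_SAMPLES = 30      # 30 samples * 10 s = 5 minutes


class RollingUsage:
    """Hypothetical helper: rolling average over the most recent samples."""

    def __init__(self, window=WINDOW_SAMPLES, threshold=75.0):
        # A deque with maxlen discards the oldest sample automatically.
        self.samples = deque(maxlen=window)
        self.threshold = threshold  # assumed warning threshold, in percent

    def add(self, value):
        self.samples.append(value)

    def average(self):
        if not self.samples:
            return 0.0
        return sum(self.samples) / len(self.samples)

    def over_threshold(self):
        return self.average() > self.threshold


# Made-up data: 5 minutes at 60% followed by 5 minutes at 90%.
usage = RollingUsage()
for value in [60.0] * 30 + [90.0] * 30:
    usage.add(value)

window_seconds = WINDOW_SAMPLES * SAMPLE_INTERVAL_S   # length of the rolling window
history_hours = 1000 * SAMPLE_INTERVAL_S / 3600       # span covered by 1000 samples
```

Only the last 30 values influence the average, so a short spike drops out of the window after 5 minutes, whereas the 1000-sample history would still show it.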

The samples are as follows:

{
    "cpu": 0.3374200000007477,
    "ps": 83.44921875,
    "ela": 0.010179761270875764,
    "el99": 0.011206655,
    "hs": 1048576,
    "hu": 248272,
    "ts": 1721203420046
}
  • ps Process total size
  • hs Heap Size
  • hu Heap Used
  • ts timestamp
  • cpu % cpu usage in the last sample
  • ela event loop lag average
  • el99 event loop lag 99th percentile
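
For anyone consuming these samples, a minimal sketch of mapping the short keys onto readable names might look like the following. The `ResourceSample` class and its field names are hypothetical, and it assumes that `ts` is milliseconds since the Unix epoch and that `hs` and `hu` share the same unit.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class ResourceSample:
    """Hypothetical readable view of one nr-launcher sample."""
    cpu: float
    process_size: float
    event_loop_lag_avg: float
    event_loop_lag_p99: float
    heap_size: int
    heap_used: int
    timestamp: datetime

    @classmethod
    def from_dict(cls, raw):
        return cls(
            cpu=raw["cpu"],
            process_size=raw["ps"],
            event_loop_lag_avg=raw["ela"],
            event_loop_lag_p99=raw["el99"],
            heap_size=raw["hs"],
            heap_used=raw["hu"],
            # Assumption: ts is milliseconds since the Unix epoch.
            timestamp=datetime.fromtimestamp(raw["ts"] / 1000, tz=timezone.utc),
        )

    def heap_used_fraction(self):
        # Assumption: hs and hu are reported in the same unit.
        return self.heap_used / self.heap_size


raw = json.loads("""
{
    "cpu": 0.3374200000007477,
    "ps": 83.44921875,
    "ela": 0.010179761270875764,
    "el99": 0.011206655,
    "hs": 1048576,
    "hu": 248272,
    "ts": 1721203420046
}
""")
sample = ResourceSample.from_dict(raw)
```

With the sample above, `sample.heap_used_fraction()` comes out at roughly 0.24, the kind of derived value a chart could plot over time.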

hardillb avatar Jul 17 '24 08:07 hardillb

Thanks for the details Ben - a "Performance" tab for the Instance, with graphical insight into the performance feels like an obvious win here?

joepavitt avatar Jul 17 '24 09:07 joepavitt

Rough design:

  • add a resources function to the drivers (and wrapper) that calls the /flowforge/resources endpoint on the nr-launcher
  • expose the resources data on an instance API endpoint
  • add a resources function to the frontend API
  • UI stuff...
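
The first step of the design could be sketched roughly as below. This is only an illustration, written in Python rather than the project's own code: the `fetch_resources` and `summarise` names and the `launcher.example` address are hypothetical, and it assumes the endpoint returns a JSON array of sample objects and needs no authentication.

```python
import io
import json
from urllib.request import urlopen


def fetch_resources(base_url, opener=urlopen):
    """Fetch samples from a launcher's /flowforge/resources endpoint.

    Assumption: the response body is a JSON array of sample objects.
    The opener is injectable so the function can be exercised offline.
    """
    url = base_url.rstrip("/") + "/flowforge/resources"
    with opener(url) as response:
        return json.loads(response.read())


def summarise(samples):
    """Reduce raw samples to the few numbers an API endpoint might expose."""
    if not samples:
        return None
    return {
        "count": len(samples),
        "avg_cpu": sum(s["cpu"] for s in samples) / len(samples),
        "peak_ps": max(s["ps"] for s in samples),
    }


# Offline demonstration with a fake opener and made-up samples.
requested = []


def fake_opener(url):
    requested.append(url)
    payload = [
        {"cpu": 10.0, "ps": 80.0, "ts": 1721203420046},
        {"cpu": 20.0, "ps": 90.0, "ts": 1721203430046},
    ]
    return io.BytesIO(json.dumps(payload).encode("utf-8"))


summary = summarise(fetch_resources("http://launcher.example/", opener=fake_opener))
```

Keeping the HTTP call separate from the summarising step means the instance API endpoint could return either the raw samples or a reduced view without changing the driver function.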

hardillb avatar Jul 17 '24 09:07 hardillb

Thanks Ben - I'll open a new issue (as part of the tasklist for this item) and I'll add to the planning board for Nick and I to discuss

joepavitt avatar Jul 17 '24 09:07 joepavitt

Closing as the original work was done and a follow-up item was raised.

knolleary avatar Oct 25 '24 10:10 knolleary