Reports of browsers running out of memory when tailing log files in UI
Internal reference: SURE-5383 Reported in Rancher 2.6.8.
While troubleshooting an issue involving websockets, I received an adjacent report: when using the Vue UI to tail log files, the browser stops responding.
Browser memory is consumed until the tab is killed. If you leave a log view open long enough, the tab will eventually die. If you pull up logs for a busy pod, though, you can kill it in a few minutes (rancher kubectl UI). We have tried Edge, Chrome and Brave, and they all exhibit the symptom.
The browser tab started at 650 MB before opening logs, with high CPU usage while streaming them and memory growing rapidly. The tab crashed at a memory footprint of about 2.5 GB.
We'll need to see if we can repro this to narrow down what's going on. We may need busy log activity to fully reproduce.
Workaround: Restart the browser tab every few minutes, or sometimes after less than a minute.
I work for one of your customers that has reported this issue. I can replicate it within a minute or less when viewing logs for a very busy pod (nginx ingress, for example). But just so that it's noted, I can log in and never look at a log, and eventually the browser tab will crash from high memory usage. So it's not specific to log viewing.
Based on additional feedback with @Sean-McQ observing behaviour, other pages such as v1 Project Monitoring, Deployments and Pods are seeing this memory usage too.
Determining if we have to spawn separate tickets per page or be more generic here. 2.6.9 does offer some improvement but we have more digging to do.
Some connection to https://github.com/rancher/dashboard/issues/7247
✅ PASSED
Reproduction Environment
Component | Version / Type |
---|---|
Rancher version | 2.7.0 |
Installation option | docker |
Cert Details | docker install with --acme-domain |
Docker version | 20.10.7, build f0df350 |
Helm version | v2.16.8-rancher2 |
Downstream cluster type | not applicable |
Downstream K8s version | not applicable |
Authentication providers enabled | local |
Logged in user role | admin, standard user |
Browser type | google chrome |
Browser version | 109.0.5414.87 (Official Build) (x86_64) |
🚨 Additional Reproduction Setup Details:
Docker Rancher install setup with Terraform: https://github.com/brudnak/linode-docker-cattle
Reproduction steps
- Set up Rancher
- Starting from the default Rancher homepage /dashboard/home
- Click hamburger menu >>> local >>> Kubectl Shell
- Copy the following deployment yaml:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
      name: test-logs
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
        spec:
          affinity: {}
          containers:
            - args:
                - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep .001; done'
              command:
                - /bin/sh
                - -c
              image: busybox
              imagePullPolicy: Always
              name: fast
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            - args:
                - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
              command:
                - /bin/sh
                - -c
              image: busybox
              imagePullPolicy: Always
              name: slower
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            - args:
                - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 10; done'
              command:
                - /bin/sh
                - -c
              image: busybox
              imagePullPolicy: Always
              name: sloooow
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
- Paste this into a file in the Kubectl Shell and run it:
    vim deploy.yml
    # paste above yaml into file and exit vim
    kubectl apply -f deploy.yml
- Once deployed, navigate to local >>> Workload >>> Deployments >>> test-logs
- For the pod running in the test-logs deployment, click the ellipsis (three dots) >>> click View Logs
- Once you see the logs populating
- right click chrome/screen
- click inspect >>> ellipsis (three dots) in chrome >>> More tools >>> Performance monitor
- Let this run for ~20 minutes
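To give a sense of how much log output the test deployment can produce during the ~20 minute window, here is a rough upper-bound estimate. It is only a sketch: it assumes each loop iteration costs exactly its `sleep` interval, so the real rate of the `fast` container will be lower because `echo`, `date` and the shell itself take time. The names `SLEEP_SECONDS`, `RUN_SECONDS` and `max_lines` are made up for this illustration.

```python
# Upper-bound estimate of log lines emitted by each container in test-logs.
# Assumption: one loop iteration takes exactly its sleep interval; the real
# rate is lower because echo, date and the shell add overhead per iteration.
SLEEP_SECONDS = {"fast": 0.001, "slower": 1.0, "sloooow": 10.0}
RUN_SECONDS = 20 * 60  # the ~20 minute observation window from the steps


def max_lines(sleep_seconds: float, run_seconds: int = RUN_SECONDS) -> int:
    """Return the most lines a loop with this sleep can print in the window."""
    return round(run_seconds / sleep_seconds)


estimates = {name: max_lines(s) for name, s in SLEEP_SECONDS.items()}

for name, lines in estimates.items():
    print(f"{name}: at most {lines:,} lines in {RUN_SECONDS // 60} minutes")
```

Under that assumption the `fast` container dominates by three orders of magnitude, which is why it is the one that makes the log view a useful stress test.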
Additional Info
RESULTS
✅ Expected
For the Rancher UI to continue running without any issues
❌ Actual
The UI became unusable after ~20 minutes.
Metric | value |
---|---|
JS heap size | 1692 MB |
DOM Nodes | 720,256 |
Validation Environment
Component | Version / Type |
---|---|
Rancher version | v2.7-bd652cb9126f80238e5bfc063a551d6de03fc4b7-head |
Rancher commit link | https://github.com/rancher/rancher/commit/bd652cb9126f80238e5bfc063a551d6de03fc4b7 |
Installation option | docker |
Cert Details | docker install with --acme-domain |
Docker version | 20.10.7, build f0df350 |
Helm version | v2.16.8-rancher2 |
Downstream cluster type | not applicable |
Downstream K8s version | not applicable |
Authentication providers enabled | local |
Logged in user role | admin, standard user |
Browser type | google chrome |
Browser version | 109.0.5414.87 (Official Build) (x86_64) |
🚨 Additional Reproduction Setup Details:
Docker Rancher install setup with Terraform: https://github.com/brudnak/linode-docker-cattle
Validation steps
- Set up Rancher
- Starting from the default Rancher homepage /dashboard/home
- Click hamburger menu >>> local >>> Kubectl Shell
- Copy the following deployment yaml:
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
      name: test-logs
      namespace: default
    spec:
      progressDeadlineSeconds: 600
      replicas: 1
      revisionHistoryLimit: 10
      selector:
        matchLabels:
          workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
      strategy:
        rollingUpdate:
          maxSurge: 25%
          maxUnavailable: 25%
        type: RollingUpdate
      template:
        metadata:
          labels:
            workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
        spec:
          affinity: {}
          containers:
            - args:
                - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep .001; done'
              command:
                - /bin/sh
                - -c
              image: busybox
              imagePullPolicy: Always
              name: fast
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            - args:
                - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
              command:
                - /bin/sh
                - -c
              image: busybox
              imagePullPolicy: Always
              name: slower
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
            - args:
                - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 10; done'
              command:
                - /bin/sh
                - -c
              image: busybox
              imagePullPolicy: Always
              name: sloooow
              resources: {}
              terminationMessagePath: /dev/termination-log
              terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Always
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
- Paste this into a file in the Kubectl Shell and run it:
    vim deploy.yml
    # paste above yaml into file and exit vim
    kubectl apply -f deploy.yml
- Once deployed, navigate to local >>> Workload >>> Deployments >>> test-logs
- For the pod running in the test-logs deployment, click the ellipsis (three dots) >>> click View Logs
- Once you see the logs populating
- right click chrome/screen
- click inspect >>> ellipsis (three dots) in chrome >>> More tools >>> Performance monitor
- Let this run for ~20 minutes
Additional Info
RESULTS
✅ Expected
For the Rancher UI to continue running without any issues
✅ Actual
No issues with Rancher after ~20 minutes, and the metrics are drastically lower
Metric | value | Improvement % |
---|---|---|
JS heap size | 146 MB | 91.3% |
DOM Nodes | 7,434 | 98.9% |
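For anyone wanting to check the Improvement % column, the sketch below recomputes it from the reproduction and validation tables. It assumes the column is the relative reduction from the reproduction value, truncated (not rounded) to one decimal place; that reading matches the figures in the table but is not stated in the comment itself. The helper names `reduction_percent` and `truncate` are made up for this illustration.

```python
import math


def reduction_percent(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, as a percentage."""
    return (before - after) / before * 100


def truncate(value: float, decimals: int = 1) -> float:
    """Drop digits beyond `decimals` places without rounding up."""
    factor = 10 ** decimals
    return math.floor(value * factor) / factor


# Values taken from the reproduction (2.7.0) and validation (v2.7 head) tables.
heap_before_mb, heap_after_mb = 1692, 146
dom_before, dom_after = 720_256, 7_434

heap_improvement = truncate(reduction_percent(heap_before_mb, heap_after_mb))
dom_improvement = truncate(reduction_percent(dom_before, dom_after))

print(f"JS heap size: {heap_improvement}%")
print(f"DOM Nodes: {dom_improvement}%")
```

With conventional rounding instead of truncation the same inputs give 91.4% and 99.0%, so the difference from the table is only in the last digit.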