dashboard icon indicating copy to clipboard operation
dashboard copied to clipboard

Reports of browsers running out of memory when tailing log files in UI

Open gaktive opened this issue 2 years ago • 1 comments

Internal reference: SURE-5383 Reported in Rancher 2.6.8.

When troubleshooting an issue involving websockets, I did get a report that was adjacent to it. When using the Vue UI to tail log files, the browser stops responding.

browser memory is consumed until the tab is killed. If you leave one open long enough, it will eventually die. If you pull up pod logs for a busy pod though you can kill it in a few minutes (rancher kubectl UI). We have tried, edge, chrome and brave and they all exhibit the symptom.

Browser tab started at 650mb before opening logs, high cpu usage while streaming them, and memory growing rapidly. Browser tab crashed at about 2.5gb memory footprint.

We'll need to see if we can repro this to narrow down what's going on. We may need busy log activity to fully reproduce.

Workaround: Restart browser tab every few minutes, or sometimes before one minute.

gaktive avatar Oct 11 '22 15:10 gaktive

I work for one of your customers that's reported this issue. I can replicate this issue within a minute or less when viewing logs for a very busy pod (nginx ingress for example). But just so that it's noted, I can login and never look at a log, and eventually the browser tab will crash from high memory usage. So it's not specific to log viewing.

seanwcom avatar Oct 12 '22 04:10 seanwcom

Based on additional feedback with @Sean-McQ observing behaviour, other pages such as v1 Project Monitoring, Deployments and Pods are seeing this memory usage too.

Determining if we have to spawn separate tickets per page or be more generic here. 2.6.9 does offer some improvement but we have more digging to do.

gaktive avatar Oct 19 '22 22:10 gaktive

Some connection to https://github.com/rancher/dashboard/issues/7247

gaktive avatar Oct 20 '22 16:10 gaktive

✅ PASSED

Reproduction Environment

Component Version / Type
Rancher version 2.7.0
Installation option docker
Cert Details docker install with --acme-domain
Docker version 20.10.7, build f0df350
Helm version v2.16.8-rancher2
Downstream cluster type not applicable
Downstream K8s version not applicable
Authentication providers enabled local
Logged in user role admin, standard user
Browser type google chrome
Browser version 109.0.5414.87 (Official Build) (x86_64)
🚨 Additional Reproduction Setup Details: Click to Expand

Docker Rancher install setup with Terraform: https://github.com/brudnak/linode-docker-cattle

Reproduction steps

  1. Setup Rancher
  2. Starting from the default Rancher homepage /dashboard/home
  3. Click hamburger menu >>> local >>> Kubectl Shell
  4. Copy the following deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  name: test-logs
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
    spec:
      affinity: {}
      containers:
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep .001; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: fast
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: slower
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 10; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: sloooow
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  1. Paste this into a file in the Kubectl Shell and run it:
vim deploy.yml

# paste above yaml into file and exit vim

kubectl apply -f deploy.yml
  1. Once deployed navigate to local >>> Workload >>> Deployments >>> test-logs
  2. For the pod running in the test-logs deployment, click the ellipsis (three dots) >>> click View Logs
  3. Once you see the logs populating
    • right click chrome/screen
    • click inspect >>> ellipsis (three dots) in chrome >>> More tools >>> Performance monitor
  4. Let this run for ~20 minutes

Additional Info

RESULTS

✅ Expected

For the Rancher UI to continue running without any issues

❌ Actual

The UI became unusable after ~20 minutes.

Metric value
JS heap size 1692 MB
DOM Nodes 720,256

Validation Environment

Component Version / Type
Rancher version v2.7-bd652cb9126f80238e5bfc063a551d6de03fc4b7-head
Rancher commit link https://github.com/rancher/rancher/commit/bd652cb9126f80238e5bfc063a551d6de03fc4b7
Installation option docker
Cert Details docker install with --acme-domain
Docker version 20.10.7, build f0df350
Helm version v2.16.8-rancher2
Downstream cluster type not applicable
Downstream K8s version not applicable
Authentication providers enabled local
Logged in user role admin, standard user
Browser type google chrome
Browser version 109.0.5414.87 (Official Build) (x86_64)
🚨 Additional Reproduction Setup Details: Click to Expand

Docker Rancher install setup with Terraform: https://github.com/brudnak/linode-docker-cattle

Validation steps

  1. Setup Rancher
  2. Starting from the default Rancher homepage /dashboard/home
  3. Click hamburger menu >>> local >>> Kubectl Shell
  4. Copy the following deployment yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  name: test-logs
  namespace: default
spec:
  progressDeadlineSeconds: 600
  replicas: 1
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      labels:
        workload.user.cattle.io/workloadselector: apps.deployment-default-test-logs
    spec:
      affinity: {}
      containers:
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep .001; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: fast
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 1; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: slower
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      - args:
        - 'i=0; while true; do echo "$i: $(date)"; i=$((i+1)); sleep 10; done'
        command:
        - /bin/sh
        - -c
        image: busybox
        imagePullPolicy: Always
        name: sloooow
        resources: {}
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
  1. Paste this into a file in the Kubectl Shell and run it:
vim deploy.yml

# paste above yaml into file and exit vim

kubectl apply -f deploy.yml
  1. Once deployed navigate to local >>> Workload >>> Deployments >>> test-logs
  2. For the pod running in the test-logs deployment, click the ellipsis (three dots) >>> click View Logs
  3. Once you see the logs populating
    • right click chrome/screen
    • click inspect >>> ellipsis (three dots) in chrome >>> More tools >>> Performance monitor
  4. Let this run for ~20 minutes

Additional Info

RESULTS

✅ Expected

For the Rancher UI to continue running without any issues

✅ Actual

No issues with Rancher after ~20 mins and drastically lower metrics

Metric value Improvement %
JS heap size 146 MB 91.3%
DOM Nodes 7,434 98.9%

brudnak avatar Jan 25 '23 23:01 brudnak