
docker stats over SSH loses stream data

Open trajano opened this issue 4 years ago • 3 comments

Description

Steps to reproduce the issue:

  1. DOCKER_HOST=ssh:/// docker stats

Describe the results you received:

On a loaded machine or network, a few services show -- instead of values.

Describe the results you expected:

The same output as running docker stats after SSHing into the machine directly, which shows no -- entries.

Additional information you deem important (e.g. issue happens only occasionally):

This may become moot once https://github.com/docker/cli/issues/2034 is implemented, but a fix is still needed when --no-stream is used. One fix I can think of is a stats-collection-interval setting that lets perhaps 2 seconds pass so there is data for the stream.
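Until such a setting exists, the effect can be approximated on the client side by re-running the command until no -- placeholders remain. A hedged sketch, not an existing Docker option; the format string, retry count, and grace period are assumptions:

```python
import subprocess
import time


def has_placeholders(output):
    """True if any row still shows Docker's -- placeholder instead of data."""
    return any("--" in line.split("\t") for line in output.splitlines())


def stats_no_stream(retries=3, grace_seconds=2.0):
    """Run `docker stats --no-stream`, retrying while rows are missing data.

    Assumes retries >= 1. After the last attempt the caller must still
    tolerate -- rows, since the daemon may simply not have a sample yet.
    """
    for _ in range(retries):
        out = subprocess.run(
            ["docker", "stats", "--no-stream", "--format",
             "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"],
            capture_output=True, text=True, check=True,
        ).stdout
        if not has_placeholders(out):
            return out
        time.sleep(grace_seconds)  # give the daemon time to collect a sample
    return out
```

This only papers over the symptom; the proposed daemon-side interval setting would avoid the repeated process spawns entirely.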

Output of docker version:

Client:
 Cloud integration: 1.0.17
 Version:           20.10.7
 API version:       1.41
 Go version:        go1.16.4
 Git commit:        f0df350
 Built:             Wed Jun  2 12:00:56 2021
 OS/Arch:           windows/amd64
 Context:           default
 Experimental:      true

Server: Docker Engine - Community
 Engine:
  Version:          20.10.7
  API version:      1.41 (minimum version 1.12)
  Go version:       go1.13.15
  Git commit:       b0f5bc3
  Built:            Wed Jun  2 11:54:58 2021
  OS/Arch:          linux/amd64
  Experimental:     false
 containerd:
  Version:          1.4.6
  GitCommit:        d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version:          1.0.0-rc95
  GitCommit:        b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version:          0.19.0
  GitCommit:        de40ad0

Output of docker info:

Client:
 Context:    default
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  compose: Docker Compose (Docker Inc., v2.0.0-beta.6)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 59
  Running: 59
  Paused: 0
  Stopped: 0
 Images: 36
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: 8t72n1z0pvuzae8exm0t64t9y
  Is Manager: true
  ClusterID: lnr6ht5kmrls1lsm6vufgzhur
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.65.3
  Manager Addresses:
   192.168.65.3:2377
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc version: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.4.72-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 11.7GiB
 Name: docker-desktop
 ID: YUI3:VSRR:52EO:4Y4M:SDMO:MDZM:PSLE:HWFA:KI63:DPGV:AIFI:RNND
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support

Additional environment details (AWS, VirtualBox, physical, etc.):

AWS terraformed with https://github.com/trajano/terraform-docker-swarm-aws

trajano avatar Jul 30 '21 23:07 trajano

/cc @AkihiroSuda

thaJeztah avatar Aug 03 '21 14:08 thaJeztah

I see this happen to me as well when there are particularly many containers. I am running a script that runs docker stats --no-stream every 10 seconds, parses the output, and writes it to a database. Our script initially opened a pipe to docker stats and kept receiving the stream, parsing every line of output and writing it to the database. But with over 150 containers running, containerd consumed far more CPU than my script and database combined, and keeping the stream open easily pushed the load average to 4 on an 8-core CPU. We have plenty of memory, but the performance of the docker stats command was still very poor. Running docker stats --no-stream every 5 seconds is also a big overhead; a 10-second interval seems to work well.

Our script experienced failures when there were that many containers, because docker stats sends -- instead of data. We worked around it by ignoring those entries for that interval. If this gets resolved and docker stats can provide a stream with less overhead, it would be a good upgrade that many devs are looking for.
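The skip-the---rows workaround described above can be sketched as follows. This is a minimal illustration, not the commenter's actual script; the format string and field names are assumptions:

```python
def parse_stats_line(line):
    """Parse one line of `docker stats --no-stream --format
    '{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}'` output.

    Returns None when Docker reported -- placeholders instead of data,
    so the caller can skip that sample rather than fail.
    """
    name, cpu, mem = line.rstrip("\n").split("\t")
    if cpu == "--" or mem.startswith("--"):
        return None  # no sample collected for this container yet; skip it
    return {"name": name, "cpu_percent": float(cpu.rstrip("%")), "mem": mem}


def collect(lines):
    """Keep only the rows that actually carried data."""
    return [r for r in (parse_stats_line(l) for l in lines) if r is not None]
```

A dropped sample every few intervals is usually acceptable for trend monitoring; the gap simply shows up as a missing data point in the database.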

Consider keeping the legacy docker stats output if a curses interface is introduced in #2034.

A built-in interval option for docker stats would also be much appreciated, because spawning a new process every time is overhead too.

sibidharan avatar Jul 04 '22 22:07 sibidharan

@sibidharan you should use curl against the Docker API endpoint to get the data you need: https://docs.docker.com/engine/api/v1.41/#operation/ContainerStats. At least that's what I'd likely do once I decide to write a local monitor that tracks the last few data points in a local in-memory database and discards anything older than a minute, since I don't need year-old performance metrics.
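For a one-shot sample from that endpoint, the request only needs stream=false appended. A hedged sketch; the version prefix matches the API version reported above, and the socket path is the conventional default, not something this thread confirms:

```python
def container_stats_url(container_id, version="v1.41"):
    """Build the ContainerStats route for one container.

    stream=false asks the daemon for a single sample instead of an
    endless newline-delimited JSON stream.
    """
    return f"/{version}/containers/{container_id}/stats?stream=false"


# Over the local daemon socket this route can be fetched with, e.g.:
#   curl --unix-socket /var/run/docker.sock \
#        "http://localhost/v1.41/containers/<id>/stats?stream=false"
```

Dropping stream=false (the default is a stream) gives the same long-lived connection behaviour the earlier comment found expensive with 150+ containers.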

trajano avatar Jul 04 '22 23:07 trajano

> @sibidharan you should use curl against the Docker API endpoint to get the data you need: https://docs.docker.com/engine/api/v1.41/#operation/ContainerStats. At least that's what I'd likely do once I decide to write a local monitor that tracks the last few data points in a local in-memory database and discards anything older than a minute, since I don't need year-old performance metrics.

With this, I can only get stats for one container, right? Is there a way to get the stats of all containers at once? If not, using the API to make many queries will be a big overhead.

Sorry for the late response.

sibidharan avatar Nov 08 '22 21:11 sibidharan

The overhead will be the same regardless, because the CLI just uses the API under the hood.

trajano avatar Nov 08 '22 22:11 trajano