docker stats over SSH loses stream data
Description
Steps to reproduce the issue:
- DOCKER_HOST=ssh:/// docker stats
Describe the results you received:
A few services come out as -- when the machine or network is under load
Describe the results you expected:
The same output as running docker stats directly on the machine over SSH, which contains no -- entries
Additional information you deem important (e.g. issue happens only occasionally):
This may be moot once https://github.com/docker/cli/issues/2034 is implemented, but a fix is still needed when --no-stream is used. One fix I can think of is a stats-collection-interval setting that lets, say, 2 seconds pass so there is data available for the stream.
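For context on why letting an interval pass helps: the daemon computes CPU% by diffing two consecutive samples, so the very first frame has nothing to diff against and is rendered as --. A rough sketch of the calculation, based on the delta formula described in the Engine API docs (field names as they appear in a /containers/{id}/stats response):

```python
from typing import Optional

def cpu_percent(sample: dict) -> Optional[float]:
    """CPU% from one /containers/{id}/stats JSON sample.

    Mirrors the delta formula from the Engine API docs: the current
    reading in cpu_stats is diffed against precpu_stats, the previous
    one. Returns None when there is no previous reading yet -- the
    case the CLI renders as "--".
    """
    cpu = sample["cpu_stats"]
    pre = sample["precpu_stats"]
    if pre.get("system_cpu_usage", 0) == 0:
        return None  # first frame of the stream: nothing to diff against
    cpu_delta = cpu["cpu_usage"]["total_usage"] - pre["cpu_usage"]["total_usage"]
    sys_delta = cpu["system_cpu_usage"] - pre["system_cpu_usage"]
    if sys_delta <= 0 or cpu_delta < 0:
        return None
    return cpu_delta / sys_delta * cpu.get("online_cpus", 1) * 100.0
```

A stats-collection-interval setting would effectively give every container time to produce that second sample before the CLI renders the table.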
Output of docker version:
Client:
 Cloud integration: 1.0.17
 Version: 20.10.7
 API version: 1.41
 Go version: go1.16.4
 Git commit: f0df350
 Built: Wed Jun 2 12:00:56 2021
 OS/Arch: windows/amd64
 Context: default
 Experimental: true

Server: Docker Engine - Community
 Engine:
  Version: 20.10.7
  API version: 1.41 (minimum version 1.12)
  Go version: go1.13.15
  Git commit: b0f5bc3
  Built: Wed Jun 2 11:54:58 2021
  OS/Arch: linux/amd64
  Experimental: false
 containerd:
  Version: 1.4.6
  GitCommit: d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc:
  Version: 1.0.0-rc95
  GitCommit: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 docker-init:
  Version: 0.19.0
  GitCommit: de40ad0
Output of docker info:
Client:
 Context: default
 Debug Mode: false
 Plugins:
  buildx: Build with BuildKit (Docker Inc., v0.5.1-docker)
  compose: Docker Compose (Docker Inc., v2.0.0-beta.6)
  scan: Docker Scan (Docker Inc., v0.8.0)

Server:
 Containers: 59
  Running: 59
  Paused: 0
  Stopped: 0
 Images: 36
 Server Version: 20.10.7
 Storage Driver: overlay2
  Backing Filesystem: extfs
  Supports d_type: true
  Native Overlay Diff: true
  userxattr: false
 Logging Driver: json-file
 Cgroup Driver: cgroupfs
 Cgroup Version: 1
 Plugins:
  Volume: local
  Network: bridge host ipvlan macvlan null overlay
  Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
 Swarm: active
  NodeID: 8t72n1z0pvuzae8exm0t64t9y
  Is Manager: true
  ClusterID: lnr6ht5kmrls1lsm6vufgzhur
  Managers: 1
  Nodes: 1
  Default Address Pool: 10.0.0.0/8
  SubnetSize: 24
  Data Path Port: 4789
  Orchestration:
   Task History Retention Limit: 5
  Raft:
   Snapshot Interval: 10000
   Number of Old Snapshots to Retain: 0
   Heartbeat Tick: 1
   Election Tick: 10
  Dispatcher:
   Heartbeat Period: 5 seconds
  CA Configuration:
   Expiry Duration: 3 months
   Force Rotate: 0
  Autolock Managers: false
  Root Rotation In Progress: false
  Node Address: 192.168.65.3
  Manager Addresses:
   192.168.65.3:2377
 Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
 Default Runtime: runc
 Init Binary: docker-init
 containerd version: d71fcd7d8303cbf684402823e425e9dd2e99285d
 runc version: b9ee9c6314599f1b4a7f497e1f1f856fe433d3b7
 init version: de40ad0
 Security Options:
  seccomp
   Profile: default
 Kernel Version: 5.4.72-microsoft-standard-WSL2
 Operating System: Docker Desktop
 OSType: linux
 Architecture: x86_64
 CPUs: 4
 Total Memory: 11.7GiB
 Name: docker-desktop
 ID: YUI3:VSRR:52EO:4Y4M:SDMO:MDZM:PSLE:HWFA:KI63:DPGV:AIFI:RNND
 Docker Root Dir: /var/lib/docker
 Debug Mode: false
 Registry: https://index.docker.io/v1/
 Labels:
 Experimental: false
 Insecure Registries:
  127.0.0.0/8
 Live Restore Enabled: false

WARNING: No blkio throttle.read_bps_device support
WARNING: No blkio throttle.write_bps_device support
WARNING: No blkio throttle.read_iops_device support
WARNING: No blkio throttle.write_iops_device support
Additional environment details (AWS, VirtualBox, physical, etc.):
AWS terraformed with https://github.com/trajano/terraform-docker-swarm-aws
/cc @AkihiroSuda
I see this also happening when there are particularly many containers. I run a script that executes docker stats --no-stream every 10 seconds, parses the output, and writes it to a database. Our script initially opened a pipe to docker stats and kept receiving the stream, parsing every line of output and writing it to the database. But with over 150 containers running, containerd consumed far more CPU than my script and database combined, and keeping the stream open easily pushed the server's load average to 4 on an 8-core CPU. We have plenty of memory, but the performance of the docker stats command was still very poor. Running docker stats --no-stream every 5 seconds is also a big overhead; a 10-second interval seems to work well.
Our script experienced failures when there were many containers, because docker stats sends -- instead of data. We tackled it by ignoring that entry for the moment. If this gets resolved and docker stats can provide a stream with less overhead, it would be a welcome upgrade for many devs.
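The skip-the-placeholder workaround described above can be sketched like this (the tab-separated --format string and the field handling are my own choices for illustration, not the original script):

```python
import subprocess

# Tab-separated output is easy to split reliably.
FORMAT = "{{.Name}}\t{{.CPUPerc}}\t{{.MemUsage}}"

def parse_stats_lines(lines):
    """Parse `docker stats --no-stream --format` output rows,
    skipping containers for which the daemon had no sample yet
    and printed "--" placeholders."""
    rows = []
    for line in lines:
        name, cpu, mem = line.rstrip("\n").split("\t")
        if cpu.startswith("--") or mem.startswith("--"):
            continue  # no data for this container on this tick; ignore it
        rows.append({"name": name, "cpu": cpu, "mem": mem})
    return rows

def sample_once():
    """One polling tick: run docker stats and return the usable rows."""
    out = subprocess.run(
        ["docker", "stats", "--no-stream", "--format", FORMAT],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_stats_lines(out.splitlines())
```

Calling sample_once() from a 10-second loop reproduces the polling setup described above while tolerating the -- rows.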
Consider keeping the legacy docker stats output if a curses-style UI is introduced in #2034.
And an interval option for stats would be much appreciated, because spawning a new process every time is also an overhead.
@sibidharan you could use curl against the Docker API endpoint to get the data you need: https://docs.docker.com/engine/api/v1.41/#operation/ContainerStats. At least, that's what I'd likely do once I decide to write a local monitor that tracks the last few data points in an in-memory database and discards anything older than a minute, since I don't need year-old performance metrics.
With this, I can only get stats for one container, right? Is there a way to get the stats of all containers at once? If not, using the API to make many queries will be a big overhead.
Sorry for the late response.
The overhead will be the same either way, because the CLI just uses the API in the end.
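To make that concrete: as far as I can tell, API v1.41 only exposes stats per container (there is no bulk endpoint), so a collector ends up making one request per container either way. A minimal sketch over the default local unix socket (the socket path and version prefix are assumptions about a typical local setup):

```python
import http.client
import json
import socket

API = "/v1.41"                 # match your daemon's API version
SOCK = "/var/run/docker.sock"  # default local daemon socket

class UnixHTTPConnection(http.client.HTTPConnection):
    """http.client connection that dials a unix socket instead of TCP."""
    def __init__(self, path=SOCK):
        super().__init__("localhost")  # host is ignored; we dial the socket
        self._path = path

    def connect(self):
        s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
        s.connect(self._path)
        self.sock = s

def stats_path(container_id):
    # stream=false returns a single JSON sample instead of an endless stream
    return f"{API}/containers/{container_id}/stats?stream=false"

def get_json(path):
    conn = UnixHTTPConnection()
    try:
        conn.request("GET", path)
        return json.loads(conn.getresponse().read())
    finally:
        conn.close()

def all_stats():
    """One stats sample per running container, keyed by container name."""
    return {
        c["Names"][0].lstrip("/"): get_json(stats_path(c["Id"]))
        for c in get_json(f"{API}/containers/json")
    }
```

The requests are independent, so they could be issued concurrently to shorten a polling tick, but the per-container daemon work is the same as what the CLI does.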