containers-roadmap
[ECS] Add support for GPU with Docker 19.03
Summary
Docker 19.03 now has built-in support for GPUs, so there is no longer a need to specify an alternate runtime. However, at run time, --gpus all
(or a specific set of GPUs) must be passed as an argument, and this cannot be done through the dockerd config.
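For context, the daemon-wide workaround before 19.03 was to register the NVIDIA runtime in dockerd's config and make it the default. A sketch of that /etc/docker/daemon.json (assuming the nvidia-container-runtime package is installed and on PATH):

```json
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
```

With the native 19.03 mechanism there appears to be no equivalent daemon-level option: GPU access has to be requested per container, e.g. --gpus all, --gpus 2, or --gpus '"device=0,1"' for specific devices.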
Description
Without --gpus all
$ docker run --rm nvidia/cuda:10.1-base nvidia-smi
docker: Error response from daemon: OCI runtime create failed: container_linux.go:345: starting container process caused "exec: \"nvidia-smi\": executable file not found in $PATH": unknown.
With --gpus all
$ docker run --gpus all --runtime runc --rm nvidia/cuda:10.1-base nvidia-smi
Thu Aug 29 13:54:36 2019
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 418.87.00 Driver Version: 418.87.00 CUDA Version: 10.1 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla K80 On | 00000000:00:0F.0 Off | 0 |
| N/A 35C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 1 Tesla K80 On | 00000000:00:10.0 Off | 0 |
| N/A 32C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 2 Tesla K80 On | 00000000:00:11.0 Off | 0 |
| N/A 40C P8 27W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 3 Tesla K80 On | 00000000:00:12.0 Off | 0 |
| N/A 35C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 4 Tesla K80 On | 00000000:00:13.0 Off | 0 |
| N/A 36C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 5 Tesla K80 On | 00000000:00:14.0 Off | 0 |
| N/A 34C P8 30W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 6 Tesla K80 On | 00000000:00:15.0 Off | 0 |
| N/A 40C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 7 Tesla K80 On | 00000000:00:16.0 Off | 0 |
| N/A 33C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 8 Tesla K80 On | 00000000:00:17.0 Off | 0 |
| N/A 36C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 9 Tesla K80 On | 00000000:00:18.0 Off | 0 |
| N/A 31C P8 30W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 10 Tesla K80 On | 00000000:00:19.0 Off | 0 |
| N/A 37C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 11 Tesla K80 On | 00000000:00:1A.0 Off | 0 |
| N/A 33C P8 29W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 12 Tesla K80 On | 00000000:00:1B.0 Off | 0 |
| N/A 38C P8 26W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 13 Tesla K80 On | 00000000:00:1C.0 Off | 0 |
| N/A 34C P8 32W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 14 Tesla K80 On | 00000000:00:1D.0 Off | 0 |
| N/A 39C P8 27W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
| 15 Tesla K80 On | 00000000:00:1E.0 Off | 0 |
| N/A 34C P8 30W / 149W | 0MiB / 11441MiB | 0% Default |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+
Expected Behavior
Containers with GPU requirements should start.
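On ECS, GPU requirements are declared per container in the task definition via resourceRequirements. A minimal container-definition fragment (container name and memory value are illustrative):

```json
{
  "containerDefinitions": [
    {
      "name": "cuda-test",
      "image": "nvidia/cuda:10.1-base",
      "command": ["nvidia-smi"],
      "memory": 512,
      "resourceRequirements": [
        { "type": "GPU", "value": "1" }
      ]
    }
  ]
}
```

The ask here is that the agent satisfy such a requirement using Docker 19.03's native GPU support rather than depending on the legacy nvidia runtime being set as the default.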
Observed Behavior
Without --gpus, the container cannot see the GPUs and fails to start (see the error above).
Environment Details
$ docker info
Client:
Debug Mode: false
Server:
Containers: 2
Running: 1
Paused: 0
Stopped: 1
Images: 4
Server Version: 19.03.1
Storage Driver: overlay2
Backing Filesystem: xfs
Supports d_type: true
Native Overlay Diff: true
Logging Driver: splunk
Cgroup Driver: cgroupfs
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 894b81a4b802e4eb2a91d1ce216b8817763c29fb
runc version: 425e105d5a03fabd737a126ad93d62a9eeede87f
init version: fec3683
Security Options:
seccomp
Profile: default
Kernel Version: 3.10.0-957.27.2.el7.x86_64
Operating System: CentOS Linux 7 (Core)
OSType: linux
Architecture: x86_64
CPUs: 64
Total Memory: 720.3GiB
Name: ip-10-45-8-153.us-west-2.compute.internal
ID: GA4Z:BCED:2FQG:AUKO:KUAX:7X5W:SBAR:NWB3:IHCH:6HQN:TIFW:PLOB
Docker Root Dir: /var/lib/docker
Debug Mode: true
File Descriptors: 31
Goroutines: 51
System Time: 2019-08-29T13:55:51.172917082Z
EventsListeners: 0
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: true
$ curl http://localhost:51678/v1/metadata
{"Cluster":"BATCHCLUSTER_Batch_0d2792a4-22a0-37e9-8e8a-5e8b68c1be17","ContainerInstanceArn":"arn:aws:ecs:us-west-2::container-instance/BATCHCLUSTER_Batch_0d2792a4-22a0-37e9-8e8a-5e8b68c1be17/50e649e34b83423189684b82669a1cea","Version":"Amazon ECS Agent - v1.30.0 (02ff320c)"}
Supporting Log Snippets
Logs can be provided on request.