testcontainers-go
[Bug]: NPE when calling DockerContainer.State()
Testcontainers version
0.13.0
Using the latest Testcontainers version?
Yes
Host OS
Linux
Host Arch
x86
Go Version
1.18
Docker version
Client: Docker Engine - Community
Version: 20.10.18
API version: 1.41
Go version: go1.18.6
Git commit: b40c2f6
Built: Thu Sep 8 23:11:45 2022
OS/Arch: linux/amd64
Context: default
Experimental: true
Server: Docker Engine - Community
Engine:
Version: 20.10.18
API version: 1.41 (minimum version 1.12)
Go version: go1.18.6
Git commit: e42327a
Built: Thu Sep 8 23:09:37 2022
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.8
GitCommit: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc:
Version: 1.1.4
GitCommit: v1.1.4-0-g5fd4c4d
docker-init:
Version: 0.19.0
GitCommit: de40ad0
Client:
Context: default
Debug Mode: false
WARNING: No swap limit support
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
scan: Docker Scan (Docker Inc., v0.17.0)
Server:
Containers: 1
Running: 0
Paused: 0
Stopped: 1
Images: 88
Server Version: 20.10.18
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runc.v2 io.containerd.runtime.v1.linux runc
Default Runtime: runc
Init Binary: docker-init
containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc version: v1.1.4-0-g5fd4c4d
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-1029-aws
Operating System: Ubuntu 20.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.08GiB
Name: ip-10-128-46-212
ID: ZN3Z:JLTC:AMIW:LQ26:7Z6S:IZMY:TDAJ:ELML:JK6W:RPTQ:BTGO:KFMB
Docker Root Dir: /var/lib/docker
Debug Mode: false
Username: ***
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
Docker info
Client:
Context: default
Debug Mode: false
Plugins:
app: Docker App (Docker Inc., v0.9.1-beta3)
buildx: Docker Buildx (Docker Inc., v0.9.1-docker)
scan: Docker Scan (Docker Inc., v0.17.0)
Server:
Containers: 6
Running: 5
Paused: 0
Stopped: 1
Images: 131
Server Version: 20.10.18
Storage Driver: overlay2
Backing Filesystem: extfs
Supports d_type: true
Native Overlay Diff: true
userxattr: false
Logging Driver: json-file
Cgroup Driver: cgroupfs
Cgroup Version: 1
Plugins:
WARNING: No swap limit support
Volume: local
Network: bridge host ipvlan macvlan null overlay
Log: awslogs fluentd gcplogs gelf journald json-file local logentries splunk syslog
Swarm: inactive
Runtimes: io.containerd.runtime.v1.linux runc io.containerd.runc.v2
Default Runtime: runc
Init Binary: docker-init
containerd version: 9cd3357b7fd7218e4aec3eae239db1f68a5a6ec6
runc version: v1.1.4-0-g5fd4c4d
init version: de40ad0
Security Options:
apparmor
seccomp
Profile: default
Kernel Version: 5.4.0-1029-aws
Operating System: Ubuntu 20.04.1 LTS
OSType: linux
Architecture: x86_64
CPUs: 8
Total Memory: 15.18GiB
Name: ip-10-128-78-126
ID: 4YMG:OD3V:ZOHD:ZCIX:C4WR:KNIQ:W2X7:GLTU:P3OY:CGJD:XUDC:MUHY
Docker Root Dir: /var/lib/docker
Debug Mode: false
Username: ***
Registry: https://index.docker.io/v1/
Labels:
Experimental: false
Insecure Registries:
127.0.0.0/8
Live Restore Enabled: false
What happened?
When we call DockerContainer.State() in our test code, an NPE happens. IMHO, it occurs because the member variable raw is dereferenced without having been initialized, which is possible when inspectRawContainer, the function that initializes raw, returns an error.
Relevant log output
panic: runtime error: invalid memory address or nil pointer dereference [recovered]
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0xcdacd6]
goroutine 50 [running]:
testing.tRunner.func1.2({0x11cf000, 0x1c42850})
/home/ubuntu/actions-runner/_work/_tool/go/1.18.0/x64/src/testing/testing.go:1389 +0x24e
testing.tRunner.func1()
/home/ubuntu/actions-runner/_work/_tool/go/1.18.0/x64/src/testing/testing.go:1392 +0x39f
panic({0x11cf000, 0x1c42850})
/home/ubuntu/actions-runner/_work/_tool/go/1.18.0/x64/src/runtime/panic.go:838 +0x207
github.com/testcontainers/testcontainers-go.(*DockerContainer).State(0xc00054f040, {0x156e6d0?, 0xc00010e5a0?})
/root/go/pkg/mod/github.com/testcontainers/testcontainers-go@v0.13.0/docker.go:294 +0x36
Additional Information
No response
@doubleknd26 I'm not able to reproduce this with the current tests. In #543 I'm directly calling the State func of the container, and I double-checked that this new code, applied on top of v0.13.0, behaves exactly the same.
But looking at the code, if the first call to inspectRawContainer errors, raw is left uninitialized (nil), so the error branch of the State func dereferences c.raw.State and must panic exactly as you described in this issue:
// update container raw info
func (c *DockerContainer) inspectRawContainer(ctx context.Context) (*types.ContainerJSON, error) {
	inspect, err := c.provider.client.ContainerInspect(ctx, c.ID)
	if err != nil {
		return nil, err
	}

	c.raw = &inspect
	return c.raw, nil
}

// State returns container's running state
func (c *DockerContainer) State(ctx context.Context) (*types.ContainerState, error) {
	inspect, err := c.inspectRawContainer(ctx)
	if err != nil {
		return c.raw.State, err
	}

	return inspect.State, nil
}
I'm tracing back to the root cause, and I noticed that the value returned on error was changed in https://github.com/testcontainers/testcontainers-go/pull/271, from nil to c.raw.State. I'm currently trying to identify why that change was needed, and why there has been no other complaint about that very suspicious condition.
Could you share more about the tests where you found the panic?
> Could you share more about the tests where you found the panic?
@mdelapenya Hi. It happened intermittently on our GitHub Actions server, which is managed by our company, and unfortunately I cannot access it. But I'm sure the server doesn't have enough resources (CPU, memory), so I think the container was deleted unexpectedly.
I tested on my local machine to check that case. Here is how I reproduced it with a sample test under the debugger (a rough programmatic equivalent is sketched after the steps). I hope it makes sense to you.
- Run the test code in debug mode and pause at a breakpoint at L733.
- Step into State() -> inspectRawContainer() -> provider.client.ContainerInspect() and stop at L18.
- Delete the container.
- Step over and confirm an error is returned.
- Go back to State() and confirm c.raw is dereferenced without having been initialized.
- NPE.

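For reference, a rough programmatic equivalent of the steps above (a sketch only; the nginx:alpine image and the out-of-band removal through the Docker SDK are assumptions for illustration, not our real test):

package repro

import (
	"context"
	"testing"

	"github.com/docker/docker/api/types"
	"github.com/docker/docker/client"
	"github.com/testcontainers/testcontainers-go"
)

// Sketch: start a container, remove it behind the library's back to simulate
// an unexpected deletion, then call State and observe the panic on v0.13.0.
func TestStateAfterUnexpectedRemoval(t *testing.T) {
	ctx := context.Background()

	nginxC, err := testcontainers.GenericContainer(ctx, testcontainers.GenericContainerRequest{
		ContainerRequest: testcontainers.ContainerRequest{
			Image:        "nginx:alpine",
			ExposedPorts: []string{"80/tcp"},
		},
		Started: true,
	})
	if err != nil {
		t.Fatal(err)
	}

	// Remove the container out-of-band, as if the host had reclaimed it.
	cli, err := client.NewClientWithOpts(client.FromEnv, client.WithAPIVersionNegotiation())
	if err != nil {
		t.Fatal(err)
	}
	if err := cli.ContainerRemove(ctx, nginxC.GetContainerID(), types.ContainerRemoveOptions{Force: true}); err != nil {
		t.Fatal(err)
	}

	// On v0.13.0, inspectRawContainer errors here, c.raw is still nil,
	// and the dereference of c.raw.State panics.
	_, err = nginxC.State(ctx)
	t.Log(err)
}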
Thanks for such a detailed report!
That seems correct: if the container exits for any reason (resource limits?) before the inspectRawContainer func has succeeded, then any call to State will fail with that panic.
I'm still stuck on this part:
> I'm tracing back to the root cause, and I noticed that the value returned on error was changed in https://github.com/testcontainers/testcontainers-go/pull/271, from nil to c.raw.State. I'm currently trying to identify why that change was needed, and why there has been no other complaint about that very suspicious condition.
It seems that returning nil on error would make sense, although I want to check all the possible implications of that change.
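For illustration, a minimal sketch of that direction (my assumption, not the final fix): only dereference c.raw when it is actually set, otherwise return nil together with the error.

// Sketch only: a State that does not assume c.raw was ever initialized.
func (c *DockerContainer) State(ctx context.Context) (*types.ContainerState, error) {
	inspect, err := c.inspectRawContainer(ctx)
	if err != nil {
		if c.raw != nil {
			// keep the last known state when a cached inspection exists
			return c.raw.State, err
		}
		return nil, err // avoid the nil pointer dereference reported here
	}
	return inspect.State, nil
}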
@doubleknd26 I'm able to reproduce it with this simple test:
func TestContainerStateAfterTermination(t *testing.T) {
	ctx := context.Background()
	nginxA, err := GenericContainer(ctx, GenericContainerRequest{
		ProviderType: providerType,
		ContainerRequest: ContainerRequest{
			Image: nginxAlpineImage,
			ExposedPorts: []string{
				nginxDefaultPort,
			},
		},
		Started: true,
	})
	if err != nil {
		t.Fatal(err)
	}

	err = nginxA.Terminate(ctx)
	if err != nil {
		t.Fatal(err)
	}

	state, err := nginxA.State(ctx)
	assert.Nil(t, state, "expected nil container inspect.")
	assert.Error(t, err, "expected error from container inspect.")
}
The error occurs because the State func is called after the container is terminated (probably something the user should be aware of), but the library should be resilient enough not to panic, returning a nil state and an error instead.
I'm working on a fix at this moment. Thanks again for the report!
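In the meantime, callers on v0.13.0 can guard against the panic with a small wrapper along these lines (safeState is a hypothetical helper, not part of the library):

package workaround

import (
	"context"
	"fmt"

	"github.com/docker/docker/api/types"
	"github.com/testcontainers/testcontainers-go"
)

// safeState is a hypothetical caller-side helper (not part of testcontainers-go):
// it wraps Container.State and recovers from the nil pointer panic seen on
// v0.13.0, returning it as an error instead.
func safeState(ctx context.Context, c testcontainers.Container) (state *types.ContainerState, err error) {
	defer func() {
		if r := recover(); r != nil {
			state = nil
			err = fmt.Errorf("State panicked, the container is probably gone: %v", r)
		}
	}()
	return c.State(ctx)
}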