panic: concurrent map writes
Description
Under some rare circumstances I get the following panic:
fatal error: concurrent map writes
goroutine 4067 [running]:
runtime.throw({0x7ff6f08f58e8?, 0x0?})
C:/Program Files/Go/src/runtime/panic.go:992 +0x76 fp=0xc0017bb908 sp=0xc0017bb8d8 pc=0x7ff6ef78a256
runtime.mapassign_faststr(0x7ff6f06f6ec0, 0xc000202ea0, {0x7ff6f08db616, 0x4})
C:/Program Files/Go/src/runtime/map_faststr.go:295 +0x38b fp=0xc0017bb970 sp=0xc0017bb908 pc=0x7ff6ef764d6b
gvisor.dev/gvisor/pkg/tcpip/network/internal/ip.(*GenericMulticastProtocolState).SendQueuedReportsLocked(0xc0002f4ef0)
C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/network/internal/ip/generic_multicast_protocol.go:297 +0x16c fp=0xc0017bba60 sp=0xc0017bb970 pc=0x7ff6f02415ac
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*igmpState).sendQueuedReports(...)
C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/network/ipv4/igmp.go:415
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*endpoint).sendQueuedReports(...)
C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/network/ipv4/ipv4.go:1299
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*endpoint).AddAndAcquirePermanentAddress(0xc0002f4800, {{0xc00032bb0c?, 0xc0017bbbd0?}, 0x7ff6ef7830f7?}, {0x0, 0x0, {0x0, {0x0}, {0x0}}, 0x0, ...})
C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/network/ipv4/ipv4.go:1289 +0x16e fp=0xc0017bbb48 sp=0xc0017bba60 pc=0x7ff6f0252f2e
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).addAddress(0x20?, {0x2030002?, {{0xc00032bb0c?, 0xc000b68a20?}, 0x2d39cd60108?}}, {0x0, 0x0, {0x0, {0x0}, {0x0}}, ...})
C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/stack/nic.go:524 +0xd7 fp=0xc0017bbbe0 sp=0xc0017bbb48 pc=0x7ff6f01c73f7
gvisor.dev/gvisor/pkg/tcpip/stack.(*Stack).AddProtocolAddress(0x4?, 0x0?, {0x4?, {{0xc00032bb0c?, 0x4?}, 0x7ff657480f33?}}, {0x0, 0x0, {0x0, {0x0}, ...}, ...})
C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/stack/stack.go:1083 +0x159 fp=0xc0017bbca8 sp=0xc0017bbbe0 pc=0x7ff6f01d7199
github.com/mysteriumnetwork/node/services/wireguard/endpoint/netstack-provider.(*netTun).addAddress(0xc0001613b0?, {0xc00032bb0c, 0x4})
C:/Users/user/src/node/services/wireguard/endpoint/netstack-provider/netstack.go:257 +0x6c fp=0xc0017bbd58 sp=0xc0017bbca8 pc=0x7ff6f02807cc
github.com/mysteriumnetwork/node/services/wireguard/endpoint/netstack-provider.(*netTun).acceptTCP(0xc0001613b0, 0xc000e93a00)
C:/Users/user/src/node/services/wireguard/endpoint/netstack-provider/netstack.go:276 +0x105 fp=0xc0017bbfa8 sp=0xc0017bbd58 pc=0x7ff6f02809c5
github.com/mysteriumnetwork/node/services/wireguard/endpoint/netstack-provider.(*netTun).acceptTCP-fm(0xc000b687e0?)
<autogenerated>:1 +0x2c fp=0xc0017bbfc8 sp=0xc0017bbfa8 pc=0x7ff6f0282e0c
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*Forwarder).HandlePacket.func3()
C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/transport/tcp/forwarder.go:97 +0x2a fp=0xc0017bbfe0 sp=0xc0017bbfc8 pc=0x7ff6f021c94a
runtime.goexit()
C:/Program Files/Go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0017bbfe8 sp=0xc0017bbfe0 pc=0x7ff6ef7bb5a1
created by gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*Forwarder).HandlePacket
C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/transport/tcp/forwarder.go:97 +0x38a
It looks like lock for GenericMulticastProtocolState->memberships is not acquired.
Steps to reproduce
Gvisor code is run from here: https://github.com/mysteriumnetwork/node/blob/master/services/wireguard/endpoint/netstack-provider/netstack.go
runsc version
No response
docker version (if using docker)
No response
uname
No response
kubectl (if using Kubernetes)
No response
repo state (if built from source)
No response
runsc debug logs (if available)
No response
Took a look -- it's not obvious to me where the concurrent write is that isn't taking the lock. Is there perhaps more output, e.g. other goroutine stacks in the panic?
Yes, there is gvisor-panic-stacktrace.txt
@kevinGC, SendQueuedReportsLocked() comment states that g.protocolMU must be locked. However, it's not being locked in the stack above. Did I miss anything?
A friendly reminder that this issue had no activity for 120 days.
A friendly reminder that this issue had no activity for 120 days.
SendQueuedReportsLocked() comment states that g.protocolMU must be locked. However, it's not being locked in the stack above. Did I miss anything?
@fvoznika I think it is being locked. So g.protocolMu is a pointer to a lock which is set here: https://github.com/google/gvisor/blob/39afbf09cdca7152b2d0d21b9cf9b655e1dda87f/pkg/tcpip/network/internal/ip/generic_multicast_protocol.go#L358-L366
The relevant caller (igmpState in the reported crash) initializes it with igmp.ep.mu: https://github.com/google/gvisor/blob/39afbf09cdca7152b2d0d21b9cf9b655e1dda87f/pkg/tcpip/network/ipv4/igmp.go#L283-L285
igmpState.sendQueuedReports() has a precondition that igmp.ep.mu must be locked: https://github.com/google/gvisor/blob/39afbf09cdca7152b2d0d21b9cf9b655e1dda87f/pkg/tcpip/network/ipv4/igmp.go#L599-L602
But its caller is only locking it for reading: https://github.com/google/gvisor/blob/39afbf09cdca7152b2d0d21b9cf9b655e1dda87f/pkg/tcpip/network/ipv4/ipv4.go#L1377-L1381
Even though there are checklocks annotations and it is provided that e.igmp.ep.mu=e.mu, the tools fails to figure this out.
The mutex is only read locked here: https://github.com/google/gvisor/blob/39afbf09cdca7152b2d0d21b9cf9b655e1dda87f/pkg/tcpip/network/ipv4/ipv4.go#L1365-L1371
This allows for concurrent writes.
The other ipv6 caller of SendQueuedReportsLocked() does not suffer from this, the entire caller stack seems to be write locking the correct ep.mu. I will send a fix for ipv4.