gvisor icon indicating copy to clipboard operation
gvisor copied to clipboard

panic: concurrent map writes

Open Zensey opened this issue 3 years ago • 2 comments

Description

Under some rare circumstances I get the following panic:

fatal error: concurrent map writes

goroutine 4067 [running]:
runtime.throw({0x7ff6f08f58e8?, 0x0?})
        C:/Program Files/Go/src/runtime/panic.go:992 +0x76 fp=0xc0017bb908 sp=0xc0017bb8d8 pc=0x7ff6ef78a256
runtime.mapassign_faststr(0x7ff6f06f6ec0, 0xc000202ea0, {0x7ff6f08db616, 0x4})
        C:/Program Files/Go/src/runtime/map_faststr.go:295 +0x38b fp=0xc0017bb970 sp=0xc0017bb908 pc=0x7ff6ef764d6b
gvisor.dev/gvisor/pkg/tcpip/network/internal/ip.(*GenericMulticastProtocolState).SendQueuedReportsLocked(0xc0002f4ef0)
        C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/network/internal/ip/generic_multicast_protocol.go:297 +0x16c fp=0xc0017bba60 sp=0xc0017bb970 pc=0x7ff6f02415ac
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*igmpState).sendQueuedReports(...)
        C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/network/ipv4/igmp.go:415
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*endpoint).sendQueuedReports(...)
        C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/network/ipv4/ipv4.go:1299
gvisor.dev/gvisor/pkg/tcpip/network/ipv4.(*endpoint).AddAndAcquirePermanentAddress(0xc0002f4800, {{0xc00032bb0c?, 0xc0017bbbd0?}, 0x7ff6ef7830f7?}, {0x0, 0x0, {0x0, {0x0}, {0x0}}, 0x0, ...})
        C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/network/ipv4/ipv4.go:1289 +0x16e fp=0xc0017bbb48 sp=0xc0017bba60 pc=0x7ff6f0252f2e
gvisor.dev/gvisor/pkg/tcpip/stack.(*nic).addAddress(0x20?, {0x2030002?, {{0xc00032bb0c?, 0xc000b68a20?}, 0x2d39cd60108?}}, {0x0, 0x0, {0x0, {0x0}, {0x0}}, ...})
        C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/stack/nic.go:524 +0xd7 fp=0xc0017bbbe0 sp=0xc0017bbb48 pc=0x7ff6f01c73f7
gvisor.dev/gvisor/pkg/tcpip/stack.(*Stack).AddProtocolAddress(0x4?, 0x0?, {0x4?, {{0xc00032bb0c?, 0x4?}, 0x7ff657480f33?}}, {0x0, 0x0, {0x0, {0x0}, ...}, ...})
        C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/stack/stack.go:1083 +0x159 fp=0xc0017bbca8 sp=0xc0017bbbe0 pc=0x7ff6f01d7199
github.com/mysteriumnetwork/node/services/wireguard/endpoint/netstack-provider.(*netTun).addAddress(0xc0001613b0?, {0xc00032bb0c, 0x4})
        C:/Users/user/src/node/services/wireguard/endpoint/netstack-provider/netstack.go:257 +0x6c fp=0xc0017bbd58 sp=0xc0017bbca8 pc=0x7ff6f02807cc
github.com/mysteriumnetwork/node/services/wireguard/endpoint/netstack-provider.(*netTun).acceptTCP(0xc0001613b0, 0xc000e93a00)
        C:/Users/user/src/node/services/wireguard/endpoint/netstack-provider/netstack.go:276 +0x105 fp=0xc0017bbfa8 sp=0xc0017bbd58 pc=0x7ff6f02809c5
github.com/mysteriumnetwork/node/services/wireguard/endpoint/netstack-provider.(*netTun).acceptTCP-fm(0xc000b687e0?)
        <autogenerated>:1 +0x2c fp=0xc0017bbfc8 sp=0xc0017bbfa8 pc=0x7ff6f0282e0c
gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*Forwarder).HandlePacket.func3()
        C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/transport/tcp/forwarder.go:97 +0x2a fp=0xc0017bbfe0 sp=0xc0017bbfc8 pc=0x7ff6f021c94a
runtime.goexit()
        C:/Program Files/Go/src/runtime/asm_amd64.s:1571 +0x1 fp=0xc0017bbfe8 sp=0xc0017bbfe0 pc=0x7ff6ef7bb5a1
created by gvisor.dev/gvisor/pkg/tcpip/transport/tcp.(*Forwarder).HandlePacket
        C:/Users/user/go/pkg/mod/gvisor.dev/[email protected]/pkg/tcpip/transport/tcp/forwarder.go:97 +0x38a
        

It looks like lock for GenericMulticastProtocolState->memberships is not acquired.

Steps to reproduce

Gvisor code is run from here: https://github.com/mysteriumnetwork/node/blob/master/services/wireguard/endpoint/netstack-provider/netstack.go

runsc version

No response

docker version (if using docker)

No response

uname

No response

kubectl (if using Kubernetes)

No response

repo state (if built from source)

No response

runsc debug logs (if available)

No response

Zensey avatar Sep 06 '22 20:09 Zensey

Took a look -- it's not obvious to me where the concurrent write is that isn't taking the lock. Is there perhaps more output, e.g. other goroutine stacks in the panic?

kevinGC avatar Sep 06 '22 22:09 kevinGC

Yes, there is gvisor-panic-stacktrace.txt

Zensey avatar Sep 07 '22 10:09 Zensey

@kevinGC, SendQueuedReportsLocked() comment states that g.protocolMU must be locked. However, it's not being locked in the stack above. Did I miss anything?

fvoznika avatar Oct 05 '22 23:10 fvoznika

A friendly reminder that this issue had no activity for 120 days.

github-actions[bot] avatar Sep 13 '23 00:09 github-actions[bot]

A friendly reminder that this issue had no activity for 120 days.

github-actions[bot] avatar Jan 12 '24 00:01 github-actions[bot]

SendQueuedReportsLocked() comment states that g.protocolMU must be locked. However, it's not being locked in the stack above. Did I miss anything?

@fvoznika I think it is being locked. So g.protocolMu is a pointer to a lock which is set here: https://github.com/google/gvisor/blob/39afbf09cdca7152b2d0d21b9cf9b655e1dda87f/pkg/tcpip/network/internal/ip/generic_multicast_protocol.go#L358-L366

The relevant caller (igmpState in the reported crash) initializes it with igmp.ep.mu: https://github.com/google/gvisor/blob/39afbf09cdca7152b2d0d21b9cf9b655e1dda87f/pkg/tcpip/network/ipv4/igmp.go#L283-L285

igmpState.sendQueuedReports() has a precondition that igmp.ep.mu must be locked: https://github.com/google/gvisor/blob/39afbf09cdca7152b2d0d21b9cf9b655e1dda87f/pkg/tcpip/network/ipv4/igmp.go#L599-L602

But its caller is only locking it for reading: https://github.com/google/gvisor/blob/39afbf09cdca7152b2d0d21b9cf9b655e1dda87f/pkg/tcpip/network/ipv4/ipv4.go#L1377-L1381

Even though there are checklocks annotations and it is provided that e.igmp.ep.mu=e.mu, the tools fails to figure this out.

The mutex is only read locked here: https://github.com/google/gvisor/blob/39afbf09cdca7152b2d0d21b9cf9b655e1dda87f/pkg/tcpip/network/ipv4/ipv4.go#L1365-L1371

This allows for concurrent writes.

ayushr2 avatar Jan 12 '24 04:01 ayushr2

The other ipv6 caller of SendQueuedReportsLocked() does not suffer from this, the entire caller stack seems to be write locking the correct ep.mu. I will send a fix for ipv4.

ayushr2 avatar Jan 12 '24 20:01 ayushr2