
runtime/race: ThreadSanitizer failed to allocate 0x0000005c9000 (6066176) bytes at 0x200dc940a0000 (error code: 87)

Open · neclepsio opened this issue 3 years ago · 42 comments

What version of Go are you using (go version)?

$ go version
go version go1.16.4 windows/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
set GO111MODULE=on
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\---\AppData\Local\go-build
set GOENV=C:\Users\---\AppData\Roaming\go\env
set GOEXE=.exe
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=c:\---\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=c:\---
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=c:\---\Go\go1.16.4
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:\---\Go\go1.16.4\pkg\tool\windows_amd64
set GOVERSION=go1.16.4
set GCCGO=gccgo
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=C:\---\go.mod
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\---\AppData\Local\Temp\go-build71564379=/tmp/go-build -gno-record-gcc-switches

What did you do?

Use the source in "Usage" section of https://github.com/go-gl/glfw, then run:

$ go build -race; ./test

What did you expect to see?

The program running correctly, as if -race had not been provided.

What did you see instead?

ERROR: ThreadSanitizer failed to allocate 0x0000005c9000 (6066176) bytes at 0x200dc940a0000 (error code: 87)

The error appears on each run. The count, address and error code are always the same. Using different program, the count and address are different.

neclepsio commented May 11 '21

The race detector uses ~10x the memory that a run without -race uses. Are you actually running out of memory? How much memory does a regular run use?

It is possible that the race detector can't allocate memory because the address space it wants is being used by something else. I'm not sure exactly what would cause that, but if your code mmaps or dlopens lots of stuff, that might be a contributing factor.
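One way to answer "how much memory does a regular run use?" from inside the program is the standard `runtime.ReadMemStats` call. This is a minimal sketch: it reports only the memory the Go runtime has obtained from the OS (`MemStats.Sys`). Memory allocated by C code through cgo (for example by GLFW) is not counted, so Task Manager remains the better whole-process view. The `memMiB` helper name is hypothetical.

```go
package main

import (
	"fmt"
	"runtime"
)

// memMiB returns the memory obtained from the OS by the Go runtime, in MiB.
// It does not include allocations made by C code via cgo.
func memMiB() float64 {
	var m runtime.MemStats
	runtime.ReadMemStats(&m)
	return float64(m.Sys) / (1 << 20)
}

func main() {
	fmt.Printf("Go runtime has obtained %.1f MiB from the OS\n", memMiB())
}
```

Calling it once near the end of a normal run, and again under `-race`, gives a rough Go-side comparison of the two.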

randall77 commented May 11 '21

Task Manager shows less than 10 MB of usage for the simple example. Looking at the GLFW source code, mmap is used for some small bitmaps and there are some dlopen calls, just for OpenGL and related libraries, but it seems to me that they are not used on Windows (LoadLibraryA is used instead; I don't know if it is equivalent). Moreover, error code 87 means "the parameter is incorrect" on Windows.

neclepsio commented May 11 '21

Hm, I don't know then. You have officially overreached my Windows knowledge :(

randall77 commented May 11 '21

cc @dvyukov @bufflig

heschi commented May 11 '21

I too am seeing this on my project. I need to confirm, but I think I wasn't seeing the issue with 1.16.3, but then hit it with 1.16.4

jazzy-crane commented May 14 '21

Also seeing this with fkie-cad/yapscan, on Windows only. That being said, I do a bunch of C stuff and read remote process memory.

go version go1.16.4 windows/amd64

targodan commented May 14 '21

I too am seeing this on my project. I need to confirm, but I think I wasn't seeing the issue with 1.16.3, but then hit it with 1.16.4

Sorry, scratch that. I do see it with 1.16.3 too. The Go version bump coincided with other toolchain upgrades.

jazzy-crane commented May 14 '21

cc @zx2c4

networkimprov commented May 16 '21

The problem is still present in 1.16.5.

neclepsio commented Jun 10 '21

Also affects 1.17beta1.

neclepsio commented Jun 11 '21

Still present in 1.17rc1

neclepsio commented Jul 14 '21

cc @dvyukov

networkimprov commented Jul 21 '21

I don't have a Windows machine, so I can't debug it. Why can VirtualAlloc return 87? I thought maybe it's because the size 0x5c9000 is not a multiple of the allocation granularity (64k), but the MSDN page says that in such cases the size is simply rounded up.
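The rounding rules mentioned here can be sketched as follows. This assumes the usual Windows amd64 values of a 4 KiB page and 64 KiB allocation granularity, and follows the VirtualAlloc documentation for a MEM_RESERVE request with an explicit address: the base is rounded down to the allocation granularity, and the region grows to cover every page touched by [base, base+size). The function names are hypothetical, not part of any Windows or tsan API.

```go
package main

import "fmt"

// Assumed typical values on Windows amd64; query GetSystemInfo on a real system.
const (
	pageSize         uint64 = 0x1000  // dwPageSize
	allocGranularity uint64 = 0x10000 // dwAllocationGranularity
)

// roundDown and roundUp assume align is a power of two.
func roundDown(n, align uint64) uint64 { return n &^ (align - 1) }
func roundUp(n, align uint64) uint64   { return (n + align - 1) &^ (align - 1) }

// reserveRange returns the [start, end) region that a MEM_RESERVE request at
// base would cover, following the rounding rules in the VirtualAlloc docs.
func reserveRange(base, size uint64) (start, end uint64) {
	return roundDown(base, allocGranularity), roundUp(base+size, pageSize)
}

func main() {
	// Size and address from the original report.
	start, end := reserveRange(0x200dc940a0000, 0x5c9000)
	fmt.Printf("start=%#x end=%#x\n", start, end)
}
```

For the request in the original report, the base is already 64 KiB-aligned and the end is page-aligned, so rounding by itself does not explain an invalid-parameter error.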

dvyukov commented Jul 22 '21

Is the MEM_LARGE_PAGES flag used? In that case the size needs to be a multiple of GetLargePageMinimum.

  1. Include the MEM_LARGE_PAGES value when calling the VirtualAlloc function. The size and alignment must be a multiple of the large-page minimum.

Source: https://docs.microsoft.com/en-us/windows/win32/memory/large-page-support

targodan commented Jul 22 '21

I don't know if it's relevant, but according to this message on a Haskell forum: "this means that address given to VirtualAlloc is either not reserved yet, or that size is too big (ie the block-to-be-committed isn't inside one VirtualAlloc MEM_RESERVE block)". Another source points to ASLR, but this should not be the case since the problem always occurs at the same address. This IBM bug report states: Windows API VirtualAlloc is requested to allocate memory of size 64KB with flag MEM_LARGE_PAGES. This is a non-standard allocation and the API fails with an error "The parameter is incorrect".

neclepsio commented Jul 22 '21

MEM_LARGE_PAGES does not seem to be used in compiler-rt/* dir: https://github.com/llvm/llvm-project/search?q=MEM_LARGE_PAGES

dvyukov commented Jul 22 '21

This function matches the error message. There is an exception for Windows AddressSanitizer that is not applied to Go, but it does not seem related.

neclepsio commented Jul 22 '21

It also seems like the allocation is done with a fixed address?

Even if that is not the core of the issue in this case (since it seems to always happen), wouldn't that potentially lead to pseudorandom failures if ASLR decides to load a DLL there?

targodan commented Jul 22 '21

It also seems like the allocation is done with a fixed address?

Yes, with a fixed address. Well, first, tsan needs to allocate memory at a fixed address, so it's not done for no reason, and the address can't simply be removed. Second, on Linux tsan avoids regions where the kernel can load anything and reserves the remaining regions, so that even user mmaps won't land at these addresses. I don't remember what happens on Windows.

dvyukov commented Jul 22 '21

I see. Thanks for clarifying. :)

targodan commented Jul 22 '21

Thought I'd give some data points:

It runs fine if I cross-compile from Linux with (GCC) 9.3-posix 20200320. It does not run if I use TDM GCC 9.2 or 10.3. It also runs fine with (x86_64-posix-seh-rev0, Built by MinGW-W64 project) 8.1.0.

Looks like it might be specific to TDM GCC.

EDIT:

Running with gdb, it crashes at: racecall(&__tsan_map_shadow, start, size, 0, 0)

https://github.com/golang/go/blob/go1.16.6/src/runtime/race.go#L393
https://github.com/golang/go/blob/go1.16.6/src/runtime/race_amd64.s#L413

AlexRouSg commented Jul 22 '21

This also happens with MSYS2 GCC.

$ gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=c:/Ignazio/Lavoro/IgnPack/Go/msys64/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/10.3.0/lto-wrapper.exe
Target: x86_64-w64-mingw32
Configured with: ../gcc-10.3.0/configure --prefix=/mingw64 --with-local-prefix=/mingw64/local --build=x86_64-w64-mingw32 
--host=x86_64-w64-mingw32 --target=x86_64-w64-mingw32 
--with-native-system-header-dir=/mingw64/x86_64-w64-mingw32/include --libexecdir=/mingw64/lib 
--enable-bootstrap --enable-checking=release --with-arch=x86-64 --with-tune=generic 
--enable-languages=c,lto,c++,fortran,ada,objc,obj-c++,jit --enable-shared --enable-static --enable-libatomic 
--enable-threads=posix --enable-graphite --enable-fully-dynamic-string --enable-libstdcxx-filesystem-ts=yes 
--enable-libstdcxx-time=yes --disable-libstdcxx-pch --disable-libstdcxx-debug --enable-lto --enable-libgomp 
--disable-multilib --disable-rpath --disable-win32-registry --disable-nls --disable-werror --disable-symvers 
--with-libiconv --with-system-zlib --with-gmp=/mingw64 --with-mpfr=/mingw64 --with-mpc=/mingw64 
--with-isl=/mingw64 --with-pkgversion='Rev5, Built by MSYS2 project'
 --with-bugurl=https://github.com/msys2/MINGW-packages/issues --with-gnu-as --with-gnu-ld 
--with-boot-ldflags='-pipe -Wl,--dynamicbase,--high-entropy-va,--nxcompat,--default-image-base-high 
-Wl,--disable-dynamicbase -static-libstdc++ -static-libgcc' 'LDFLAGS_FOR_TARGET=-pipe 
-Wl,--dynamicbase,--high-entropy-va,--nxcompat,--default-image-base-high' 
--enable-linker-plugin-flags='LDFLAGS=-static-libstdc++\ -static-libgcc\ -pipe\ 
-Wl,--dynamicbase,--high-entropy-va,--nxcompat,--default-image-base-high\ -Wl,--stack,12582912'
Thread model: posix
Supported LTO compression algorithms: zlib zstd
gcc version 10.3.0 (Rev5, Built by MSYS2 project)

neclepsio commented Jul 22 '21

Having the same issue when trying to run make test on Azure Pipelines: Windows 2019 VM, GCC 11.2, Go 1.16.3. Has anyone tried earlier or newer versions of Go, and do they work? Also, in my case the address is different on each run.

MuweiHe commented Aug 26 '21

Having the same issue when trying running make test on azure pipelines. Win 2019 vm, GCC 11.2, Go 1.16.3. Has anyone tried with earlier/newer versions of Go and does that work? And the address each run is different in my case..

I didn't have this problem with 1.16.3, but I was running a musl GCC 9.2.1 mingw64 toolchain. When I upgraded Go to 1.16.4, I upgraded the toolchain to GCC 10.x at the same time, and that's when I started having problems. So I think the GCC version is the significant thing, not the Go version. Unfortunately, I haven't been able to try >1.16.3 with a 9.2.1 toolchain.

jazzy-crane commented Aug 27 '21

I feel like I have a basic understanding of two parts of this issue, but not how they fit together. Please correct anything I've gotten wrong.

  1. The tsan runtime is implemented in compiler-rt, which is part of the LLVM project. It is built into a .syso file using a specific version of Go and LLVM, and that .syso file is then incorporated into binaries built by Go when the -race flag is included. In Go code that uses cgo, the tsan runtime is calling a Windows API function with invalid parameters, which causes the panics in the logfile.
  2. Per this issue, the failure appears to happen across a variety of Go versions (Go>1.16.3?), but with specific gcc versions (gcc<10.0.0?).

I don't understand how those two parts are linked. Here's my guess:

When building with -race, Go must modify or wrap cgo bytecode to play nicely with the race detector. That bytecode is produced by the installed gcc, manipulated by the Go compiler, and then must run in an environment managed by tsan. My guess is that Go makes some assumptions about the produced code that are no longer true in newer gcc versions.

djmitche commented Sep 16 '21

The raceinit function @AlexRouSg pointed to calls __tsan_map_shadow after rounding its size parameter to a page. __tsan_map_shadow calls MapShadow. That function gets the actual page size (dwPageSize via GetSystemInfo) and further rounds the start and end points of the region. It then calls MmapFixedSuperNoReserve which calls directly to MmapFixedNoReserve. MmapFixedSuperNoReserve has a "FIXME" comment about using large-page support, but it seems like an invitation to an optimization, not a potential bug. On the first call, MapShadow also calls MmapFixedSuperNoReserve for the "data" segment, with explicit 64k alignment. Since the values in the error messages aren't 64k aligned, I think that's not the problematic call.

==5736==ERROR: ThreadSanitizer failed to allocate 0x000000909000 (9474048) bytes at 0x200dd4b374000 (error code: 87)

Both of those values are 4k-aligned (and it looks like 4k is the page size on Windows).

According to the docs for VirtualAlloc alignment shouldn't even be necessary - it rounds as necessary, except in the case of MEM_LARGE_PAGES which isn't in use here.

This is all related to point 1 above. I don't have a good way to start looking at point 2.

So, a few possibilities here:

  • MS docs are wrong and 64k alignment is required (the comments in the llvm files suggest this!)
  • This memory is already mapped somehow, in a way that makes it invalid (but I think remapping is OK.)

That first possibility sounds relevant. Maybe gcc used to produce 64k-aligned regions in its output, and no longer does?
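The alignment observation can be checked mechanically. This is a minimal sketch that assumes a 4 KiB page size and 64 KiB allocation granularity (typical Windows amd64 values, not read from the system here), applied to the size and address from the error message quoted in this comment:

```go
package main

import "fmt"

const (
	pageSize         uint64 = 0x1000  // assumed 4 KiB page
	allocGranularity uint64 = 0x10000 // assumed 64 KiB allocation granularity
)

// isAligned reports whether v is a multiple of align.
func isAligned(v, align uint64) bool { return v%align == 0 }

func main() {
	// Values from "failed to allocate 0x000000909000 (9474048) bytes at 0x200dd4b374000".
	values := []struct {
		name string
		val  uint64
	}{{"size", 0x909000}, {"addr", 0x200dd4b374000}}
	for _, v := range values {
		fmt.Printf("%s %#x: 4K-aligned=%v 64K-aligned=%v\n",
			v.name, v.val, isAligned(v.val, pageSize), isAligned(v.val, allocGranularity))
	}
}
```

Both values come out 4 KiB-aligned but not 64 KiB-aligned, which is the combination that makes the 64k-alignment hypothesis worth testing.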

djmitche commented Sep 16 '21

On msys distribution 20210725 we found that downgrading gcc didn't fix the issue, but downgrading binutils from latest (2.36.1-3 as of our test) to 2.35.1-2 did fix the ThreadSanitizer issue.

albertvaka commented Oct 07 '21

On msys distribution 20210725 we found that downgrading gcc didn't fix the issue, but downgrading binutils from latest (2.36.1-3 as of our test) to 2.35.1-2 did fix the ThreadSanitizer issue.

My gcc is 10.3.0 (tdm64-1), and I hit the same problem. Following that comment, I switched to binutils 2.33.1, which fixed the issue.

go1.17.2 windows/amd64 with gcc version 8.1.0 (x86_64-posix-seh-rev0, Built by MinGW-W64 project) also works.

huifly commented Nov 03 '21

I have had the same problem.

==23768==ERROR: ThreadSanitizer failed to allocate 0x0000016a1000 (23728128) bytes at 0x200d9afc00000 (error code: 87)
exit status 66

This problem persisted with the 1.17.x versions; however, as @huifly said, downgrading binutils helps so far.

I have been using this custom package from MinGW Distro - nuwen.net, version 17.1, which bundles gcc 9.2.0 and binutils 2.33.1, and it works with go1.17.6 just fine. It also appears there are some nice scripts for building a custom combination of the desired tools, so one could have all the other packages upgraded while keeping an older binutils.

In addition, it is a self-extracting package that only requires setting the path properly, so testing and deploying it is rather trivial.

bbanelli commented Jan 22 '22

After stripping ASLR, it works :) On Windows this happens because tsan tries to allocate at a reserved address, above the maximum user address space on amd64.

tylermasci16 commented Feb 28 '22

@tylermasci16, could you please explain how I can "strip ASLR"? The only thing I found is go build -race -aslr=false, added in this commit: https://github.com/golang/go/commit/56dac60074698d23dc6acc047e61d2ad59c9610d, but it seems to work only for c-shared builds.

neclepsio commented Feb 28 '22

FWIW, according to https://docs.microsoft.com/en-us/windows/win32/debug/system-error-codes--0-499-, error code: 87 is ERROR_INVALID_PARAMETER: “The parameter is incorrect.”

(Of course, that doesn't tell us which parameter is incorrect or for what reason, and ThreadSanitizer didn't even have the courtesy to tell us which system call produced the error. 😵)
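Since the message itself is terse, a small helper can at least pull out the numbers and name the error code. This is a hypothetical diagnostic sketch, not part of Go or ThreadSanitizer. It assumes the exact message format seen in this thread, checks that the hex and decimal sizes agree, and maps a few Win32 system error codes (8, 87, 487) to their names.

```go
package main

import (
	"fmt"
	"regexp"
	"strconv"
)

// Matches the message format seen in this thread (assumed stable).
var tsanAllocErr = regexp.MustCompile(
	`ThreadSanitizer failed to allocate 0x([0-9a-fA-F]+) \((\d+)\) bytes at 0x([0-9a-fA-F]+) \(error code: (\d+)\)`)

// A few Win32 system error codes; 87 is the one reported here.
var winErrNames = map[uint64]string{
	8:   "ERROR_NOT_ENOUGH_MEMORY",
	87:  "ERROR_INVALID_PARAMETER",
	487: "ERROR_INVALID_ADDRESS",
}

type allocFailure struct{ Size, Addr, Code uint64 }

// parseAllocFailure extracts size, address and error code from one log line.
func parseAllocFailure(line string) (allocFailure, error) {
	m := tsanAllocErr.FindStringSubmatch(line)
	if m == nil {
		return allocFailure{}, fmt.Errorf("no ThreadSanitizer allocation failure in %q", line)
	}
	var f allocFailure
	var dec uint64
	var err error
	if f.Size, err = strconv.ParseUint(m[1], 16, 64); err != nil {
		return f, err
	}
	if dec, err = strconv.ParseUint(m[2], 10, 64); err != nil {
		return f, err
	}
	if dec != f.Size {
		return f, fmt.Errorf("hex size %#x and decimal size %d disagree", f.Size, dec)
	}
	if f.Addr, err = strconv.ParseUint(m[3], 16, 64); err != nil {
		return f, err
	}
	if f.Code, err = strconv.ParseUint(m[4], 10, 64); err != nil {
		return f, err
	}
	return f, nil
}

func main() {
	line := "==1400==ERROR: ThreadSanitizer failed to allocate 0x0000025f9000 (39817216) bytes at 0x200dbfc7c0000 (error code: 87)"
	f, err := parseAllocFailure(line)
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Printf("size=%#x addr=%#x code=%d (%s)\n", f.Size, f.Addr, f.Code, winErrNames[f.Code])
}
```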

bcmills commented Jun 01 '22

Later edit: it was the MinGW version we were using; please check out this follow-up comment.

We've started seeing this exact error in the containerd Windows CI workflow since Friday (10/Jun/22), after switching from Go 1.18.0 to 1.18.3 in this PR.

+ integration ==1400==ERROR: ThreadSanitizer failed to allocate 0x0000025f9000 (39817216) bytes at 0x200dbfc7c0000 (error code: 87) exit status 66

In my debugging attempts so far I have tried the following (all leading to the same failure):

  • increasing the VM size on Azure in case memory really was an issue
  • reverting all other containerd-side patches we've had since Friday in case those were causing this
  • switching back to Go 1.18.0 for Windows (this may indicate a bug in 1.18.3 that may have been backported to other 1.18.x releases)
  • currently trying to reproduce on my own machine (all of the above-mentioned test runs were on Azure-hosted VMs)

Random notes:

  • seems 100% consistent since Friday (we haven't had a single non-crashing run since)
  • all the tests run on Azure VMs spawned from the official Microsoft images which we've been using since March
  • we've been installing Go using Chocolatey this whole time as seen here

The error appears on each run. The count, address and error code are always the same. Using different program, the count and address are different.

  • I can also confirm that rebuilding the binary leads to different byte allocation count/address FWIW

Go env from the Azure machines used for the tests is the following:

# (Identical for both 2019 and 2022 except for the `debug-prefix-map`):
set GO111MODULE=
set GOARCH=amd64
set GOBIN=
set GOCACHE=C:\Users\azureuser\AppData\Local\go-build
set GOENV=C:\Users\azureuser\AppData\Roaming\go\env
set GOEXE=.exe
set GOEXPERIMENT=
set GOFLAGS=
set GOHOSTARCH=amd64
set GOHOSTOS=windows
set GOINSECURE=
set GOMODCACHE=C:\Users\azureuser\go\pkg\mod
set GONOPROXY=
set GONOSUMDB=
set GOOS=windows
set GOPATH=C:\Users\azureuser\go
set GOPRIVATE=
set GOPROXY=https://proxy.golang.org,direct
set GOROOT=c:\Program Files\Go
set GOSUMDB=sum.golang.org
set GOTMPDIR=
set GOTOOLDIR=c:\Program Files\Go\pkg\tool\windows_amd64
set GOVCS=
set GOVERSION=go1.18.3
set GCCGO=gccgo
set GOAMD64=v1
set AR=ar
set CC=gcc
set CXX=g++
set CGO_ENABLED=1
set GOMOD=NUL
set GOWORK=
set CGO_CFLAGS=-g -O2
set CGO_CPPFLAGS=
set CGO_CXXFLAGS=-g -O2
set CGO_FFLAGS=-g -O2
set CGO_LDFLAGS=-g -O2
set PKG_CONFIG=pkg-config
set GOGCCFLAGS=-m64 -mthreads -fmessage-length=0 -fdebug-prefix-map=C:\Users\azureuser\AppData\Local\Temp\go-build2049959765=/tmp/go-build -gno-record-gcc-switches

Any additional info or debugging ideas would be much appreciated!

aznashwan commented Jun 15 '22

I think you've updated MinGW too; that's what's causing the issue.

neclepsio commented Jun 15 '22

@ncantelmo that was it, thanks a lot for all the help!

For future reference for anyone else running into this issue with a Chocolatey-installed MinGW:

  • mingw = "10.2.0" works
  • mingw = "10.3.0" fails
  • mingw = "11.X.0" all fail (including 11.3.0)

aznashwan commented Jun 15 '22

Ran into this suddenly this week; 10.3.0 isn't working either. Is there a canonical recommended development stack?

Happens with dlv too:

==34372==ERROR: ThreadSanitizer failed to allocate 0x000003061000 (50728960) bytes at 0x200dabe7d0000 (error code: 87)

Very unhelpful error, where is it from? The Go compiler itself?

I was using TDM but it seems dead/unmaintained so I switched to MinGW and that fails too.

Edit:

So this worked for me:

choco install mingw --version 10.2.0 --allow-downgrade

Southclaws commented Jun 17 '22

The ERROR_INVALID_PARAMETER error from VirtualAlloc is because the base address is too high. The allocation needs to fit within the minimum and maximum application addresses as provided by GetSystemInfo.
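That condition can be written as a simple range check. This is a minimal sketch that assumes the typical 64-bit Windows values of lpMinimumApplicationAddress (0x10000) and lpMaximumApplicationAddress (0x7FFFFFFEFFFF); on a real machine these come from GetSystemInfo, which this portable sketch does not call. The function name is hypothetical.

```go
package main

import "fmt"

// Assumed typical GetSystemInfo bounds on 64-bit Windows (128 TiB user space).
const (
	minAppAddr uint64 = 0x10000
	maxAppAddr uint64 = 0x7FFFFFFEFFFF
)

// fitsInUserSpace reports whether [base, base+size) lies inside the
// application address range.
func fitsInUserSpace(base, size uint64) bool {
	if size == 0 {
		return false
	}
	last := base + size - 1
	if last < base { // wrapped around 2^64
		return false
	}
	return base >= minAppAddr && last <= maxAppAddr
}

func main() {
	// Request from the original report in this issue.
	fmt.Println(fitsInUserSpace(0x200dc940a0000, 0x5c9000)) // prints false
}
```

With these assumed bounds, the failing base addresses quoted in this thread (for example 0x200dc940a0000) all lie above the maximum application address.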

dblohm7 commented Jun 24 '22

If https://github.com/llvm/llvm-project/blob/8246b2e156568c31e71e16cbaf4c14d316e7c06e/compiler-rt/lib/tsan/rtl-old/tsan_rtl.cpp#L319 is still correct, then this will be a problem, as in my local failure case it is not 64k aligned:

request: 0x20015511007070000 aligned: 0x20015511007000000

raggi commented Jun 24 '22

I think this will be fixed when https://github.com/golang/go/issues/35006 is closed and the tsan binaries are recompiled (and updated to v3, as on Linux).

neclepsio commented Jun 25 '22

If you are building Go from source, please try the new race detector runtime (pending submit) to see if it resolves this issue. From your Go repo on windows:

 git fetch https://go.googlesource.com/go refs/changes/97/420197/2 && git checkout FETCH_HEAD

This version of the runtime requires a more up-to-date C compiler version (in particular it requires libsynchronization.a).

thanm commented Jul 31 '22

With no other apparent change in the toolchain, Go 1.19 seems to solve it for me.

neclepsio commented Aug 03 '22

With no other apparent change in the toolchain, Go 1.19 seems to solve it for me.

maybe it was this?


Even though the changes don't affect Windows, perhaps some minor dependent change inadvertently fixed the bug...?

Southclaws commented Aug 08 '22

I can also confirm that 1.19 gets things in order again. I believe it was this set of patches from @thanm: https://github.com/golang/go/commit/0c7fcf6bd1fd8df2bfae3a482f1261886f6313c1 https://github.com/golang/go/commit/eaf21256545ae04a35fa070763faa6eb2098591d

dcantah commented Aug 18 '22

Updated to 1.19 and new error...

Build Error: go test -c -o d:\Work\odin\odin\api\src\services\deal\__debug_bin.exe -gcflags all=-N -l -v -race .
# runtime/cgo
In file included from c:\program files\x86_64-w64-mingw32-native\lib\gcc\x86_64-w64-mingw32\11.2.1\include-fixed\limits.h:34,
                 from c:\program files\x86_64-w64-mingw32-native\include\stdlib.h:11,
                 from _cgo_export.c:3:
c:\program files\x86_64-w64-mingw32-native\include\syslimits.h:12:25: error: no include path in which to search for limits.h
   12 | #include_next <limits.h>
      |                         ^ (exit status 2)

I just want to debug 1 test 😭

Edit: if I run the command manually, apparently it's not even valid:

❯ go test -c -o d:\Work\odin\odin\api\src\services\deal\__debug_bin.exe -gcflags all=-N -l -v -race .
go: unknown flag -l cannot be used with -c

Southclaws commented Aug 26 '22

c:\program files\x86_64-w64-mingw32-native\include\syslimits.h:12:25: error: no include path in which to search for limits.h 12 | #include_next <limits.h> | ^ (exit status 2)

I think this is getting a bit farther afield from the original issue ("ThreadSanitizer failed to allocate"). Seems like maybe something is out of whack with your gcc installation.

❯ go test -c -o d:\Work\odin\odin\api\src\services\deal\__debug_bin.exe -gcflags all=-N -l -v -race .
go: unknown flag -l cannot be used with -c

That looks as though the "-l" is being interpreted by the Go command and not passed to the compiler. I would try fixing up your quoting, e.g. -gcflags "all=-N -l".

thanm commented Aug 26 '22

Seems like maybe something is out of whack with your gcc installation.

I forgot this even used gcc, haha. I'm not sure which version I have installed; I always assumed Go on Windows used the native Microsoft compiler for C code.

I would try fixing up your quoting

None of these are commands I've written; they're just what comes out when I click "debug test" in VS Code.

I'll experiment a bit more next week. This was working fine for literal years, and then suddenly one day it all broke and I can't debug anymore. So annoying.

Southclaws commented Aug 27 '22