Gull
New versions not working on Haswell CPUs
Just a follow up from what we discussed in TCEC chat.
The BMI2 and POPCNT builds work fine on Broadwell, but not on Haswell. The POPCNT build works fine on Ivy Bridge.
Windows 10 on both Broadwell/Haswell machines. Broadwell is a laptop CPU, Haswell is desktop.
Thanks for the report. Just to clarify: is LazyGull.exe working on Windows 10? or not at all on Windows 10?
Lazygull.exe works on the W10 Ivy Bridge machine.
The problem appears to be a memory access error:
Version=1
EventType=APPCRASH
EventTime=131573322253748933
ReportType=2
Consent=1
UploadTime=131573322254940427
ReportStatus=268435456
ReportIdentifier=16d33475-23c7-425f-99d0-85ef66171330
IntegratorReportIdentifier=c0484487-674e-477a-b1f7-2c430e5e727e
Wow64Host=34404
NsAppName=lazygull.exe
AppSessionGuid=00004edc-0001-0003-e9db-864e3d71d301
TargetAppId=W:0006367dae34e26b8080cd105058a490da2c0000ffff!00009866fab4679673fdc91f5962e9bdf792cdde5a8d!lazygull.exe
TargetAppVer=1970//01//01:00:00:00!3faf6!lazygull.exe
BootId=4294967295
TargetAsId=9017
Response.BucketId=13b31e7fd06acf8d249545d98205fb8d
Response.BucketTable=4
Response.LegacyBucketId=1483168452780096397
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=lazygull.exe
Sig[1].Name=Application Version
Sig[1].Value=0.0.0.0
Sig[2].Name=Application Timestamp
Sig[2].Value=00000000
Sig[3].Name=Fault Module Name
Sig[3].Value=msvcrt.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=7.0.16299.15
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=20688290
Sig[6].Name=Exception Code
Sig[6].Value=c0000005
Sig[7].Name=Exception Offset
Sig[7].Value=000000000005bb10
Yep, that's not good. Might be hard to track this down, and I don't work on this project anymore. Is there a way to reliably reproduce the crash?
It crashes as soon as I run the exe. So very reliably. I have had different people run the exe and it doesn't crash for everyone. But it crashes every time on the TCEC server and my workstation. Other people have mentioned it will crash if they set hash over some threshold (maybe 4GB?) but I haven't been able to test that.
I know you aren't working on this anymore, but if you can get this fixed soon, I will put this version into TCEC season 11.
Hmmmm, OK let me try a few tests on different machines.
I wasn't able to reproduce the crash so far. I can try some different machines next week.
It also segfaults (core dump) on a Core i7-980X on Windows 10. I can create crash dumps with the -g option: https://helgeklein.com/support/creating-an-application-crash-dump/
I was never able to reproduce the bug. But I think it is likely because of the funky memory mapping idea I tried to port from Linux to Windows. Basically:
- Global data is shared between processes using globals declared here.
- The globals are just fixed addresses specified in the Makefile here.
- The actual data is a shared mapping created here.
Doing it this way speeds up the code slightly by removing one level of indirection. Not sure if it translates to any measurable difference in playing strength...
Probably, the constant addresses conflict with other objects for certain CPUs, causing the crash. That is my guess, it could be some completely unrelated issue...
Yes, it is all about the memory-mapped files:
$ ./LazyGull.exe
This application has requested the Runtime to terminate it in an unusual way. Please contact the application's support team for more information.
LazyGull (180302-Windows-x86_64) error: failed to remove object "Local\LazyGull_19280_INFO_10"
I have one question: how did you calculate these addresses?
-Wl,--defsym=INFO=0x51010000 \
-Wl,--defsym=SETTINGS=0x51000000 \
-Wl,--defsym=SHARED=0x51020000 \
-Wl,--defsym=DATA=0x50000000 \
-Wl,--defsym=PAWNHASH=0x54000000 \
-Wl,--defsym=PVHASH=0x58000000
Why do they differ from the macOS and Linux ones?
And why is HASH defined as below?
#ifndef WINDOWS
extern GEntry HASH[];
#else
#define HASH ((GEntry *)0x8000000)
#endif
I can't remember why, but my guess is that there was a linker bug.
To "fix" the problem, it should not be too difficult to convert back to using global pointers, e.g.:
GThreadInfo *INFO;
GSettings *SETTINGS;
GSharedInfo *SHARED;
...etc.
Then use non-fixed addresses when creating the shared mappings.