
New versions not working on Haswell CPUs

Open ghost opened this issue 9 years ago • 12 comments

Just a follow-up from what we discussed in TCEC chat.

The BMI2 and POPCNT builds work fine on a Broadwell CPU, but not on Haswell. The POPCNT build works fine on Ivy Bridge.

Windows 10 on both Broadwell/Haswell machines. Broadwell is a laptop CPU, Haswell is desktop.

ghost avatar Oct 27 '16 02:10 ghost

Thanks for the report. Just to clarify: is LazyGull.exe working on Windows 10? or not at all on Windows 10?

basil00 avatar Oct 27 '16 11:10 basil00

Lazygull.exe works on the W10 Ivy Bridge machine.

ghost avatar Oct 27 '16 15:10 ghost

The problem appears to be a memory access error:

Version=1
EventType=APPCRASH
EventTime=131573322253748933
ReportType=2
Consent=1
UploadTime=131573322254940427
ReportStatus=268435456
ReportIdentifier=16d33475-23c7-425f-99d0-85ef66171330
IntegratorReportIdentifier=c0484487-674e-477a-b1f7-2c430e5e727e
Wow64Host=34404
NsAppName=lazygull.exe
AppSessionGuid=00004edc-0001-0003-e9db-864e3d71d301
TargetAppId=W:0006367dae34e26b8080cd105058a490da2c0000ffff!00009866fab4679673fdc91f5962e9bdf792cdde5a8d!lazygull.exe
TargetAppVer=1970//01//01:00:00:00!3faf6!lazygull.exe
BootId=4294967295
TargetAsId=9017
Response.BucketId=13b31e7fd06acf8d249545d98205fb8d
Response.BucketTable=4
Response.LegacyBucketId=1483168452780096397
Response.type=4
Sig[0].Name=Application Name
Sig[0].Value=lazygull.exe
Sig[1].Name=Application Version
Sig[1].Value=0.0.0.0
Sig[2].Name=Application Timestamp
Sig[2].Value=00000000
Sig[3].Name=Fault Module Name
Sig[3].Value=msvcrt.dll
Sig[4].Name=Fault Module Version
Sig[4].Value=7.0.16299.15
Sig[5].Name=Fault Module Timestamp
Sig[5].Value=20688290
Sig[6].Name=Exception Code
Sig[6].Value=c0000005
Sig[7].Name=Exception Offset
Sig[7].Value=000000000005bb10

ghost avatar Dec 22 '17 23:12 ghost

Yep, that's not good. Might be hard to track this down, and I don't work on this project anymore. Is there a way to reliably reproduce the crash?

basil00 avatar Dec 24 '17 09:12 basil00

It crashes as soon as I run the exe. So very reliably. I have had different people run the exe and it doesn't crash for everyone. But it crashes every time on the TCEC server and my workstation. Other people have mentioned it will crash if they set hash over some threshold (maybe 4GB?) but I haven't been able to test that.

I know you aren't working on this anymore, but if you can get this fixed soon, I will put this version into TCEC season 11.

ghost avatar Dec 24 '17 18:12 ghost

Hmmmm, OK let me try a few tests on different machines.

basil00 avatar Dec 24 '17 21:12 basil00

I wasn't able to reproduce the crash so far. I can try some different machines next week.

basil00 avatar Dec 26 '17 20:12 basil00

It also crashes (segfault) on a Core i7-980X on Windows 10. I can create crash dumps with the -g option: https://helgeklein.com/support/creating-an-application-crash-dump/

d3vv avatar Mar 02 '18 12:03 d3vv

I was never able to reproduce the bug. But I think it is likely because of the funky memory mapping idea I tried to port from Linux to Windows. Basically:

  • Global data is shared between processes using globals declared here.
  • The globals are just fixed addresses specified in the Makefile here.
  • The actual data is a shared mapping created here.

Doing it this way speeds up the code slightly by removing one level of indirection. Not sure if it translates to any measurable difference in playing strength...

Probably, the constant addresses conflict with other objects for certain CPUs, causing the crash. That is my guess, it could be some completely unrelated issue...

basil00 avatar Mar 03 '18 02:03 basil00

Yes, it is all about the memory-mapped files:

$ ./LazyGull.exe

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.
LazyGull (180302-Windows-x86_64) error: failed to remove object "Local\LazyGull_19280_INFO_10"

d3vv avatar Mar 03 '18 19:03 d3vv

I have one question: how did you calculate those addresses?

-Wl,--defsym=INFO=0x51010000 \
-Wl,--defsym=SETTINGS=0x51000000 \
-Wl,--defsym=SHARED=0x51020000 \
-Wl,--defsym=DATA=0x50000000 \
-Wl,--defsym=PAWNHASH=0x54000000 \
-Wl,--defsym=PVHASH=0x58000000

Why do they differ from macOS and Linux?

And why the following:

#ifndef WINDOWS
extern GEntry HASH[];
#else
#define HASH ((GEntry *)0x8000000)
#endif

d3vv avatar Mar 03 '18 19:03 d3vv

I can't remember why, but my guess is that there was a linker bug.

To "fix" the problem, it should not be too difficult to convert back to using global pointers, e.g.:

GThreadInfo *INFO;
GSettings   *SETTINGS;
GSharedInfo *SHARED;
...etc.

Then use non-fixed addresses when creating the shared mappings.

basil00 avatar Mar 05 '18 00:03 basil00