devilution
devilution copied to clipboard
Figure out the exact set of compiler flags used to produce Diablo.exe (1.09b)
This issue tracks the effort to figure out the complier flags used to produce Diablo.exe (version 1.09b). It may be considered a subtask of #11, as given information about compiler flags, we can ensure that the same input source code produce the same output object code.
From what I can tell, it seems like /O1
is used rather than /O2
. This is based on the padding between functions.
/O2
produces (given that https://github.com/diasurgical/devilution/pull/110 has been merged):
401000: e9 0b 00 00 00 jmp 0x401010
401005: 90 nop
401006: 90 nop
401007: 90 nop
401008: 90 nop
401009: 90 nop
40100a: 90 nop
40100b: 90 nop
40100c: 90 nop
40100d: 90 nop
40100e: 90 nop
40100f: 90 nop
401010: a1 28 b1 40 00 mov eax,ds:0x40b128
401015: a3 84 1a 41 00 mov ds:0x411a84,eax
40101a: c3 ret
40101b: 90 nop
40101c: 90 nop
40101d: 90 nop
40101e: 90 nop
40101f: 90 nop
401020: 8b 44 24 04 mov eax,DWORD PTR [esp+0x4]
401024: 85 c0 test eax,eax
401026: 74 0f je 0x401037
401028: 6a 00 push 0x0
40102a: 6a ff push 0xffffffff
40102c: 68 40 d0 40 00 push 0x40d040
401031: 50 push eax
401032: e8 69 0c 00 00 call 0x401ca0
401037: c3 ret
While /O1
produces:
401000: e9 00 00 00 00 jmp 0x401005
401005: a1 28 c1 40 00 mov eax,ds:0x40c128
40100a: a3 84 2a 41 00 mov ds:0x412a84,eax
40100f: c3 ret
401010: 83 7c 24 04 00 cmp DWORD PTR [esp+0x4],0x0
401015: 74 12 je 0x401029
401017: 6a 00 push 0x0
401019: 6a ff push 0xffffffff
40101b: 68 40 e0 40 00 push 0x40e040
401020: ff 74 24 10 push DWORD PTR [esp+0x10]
401024: e8 8f 09 00 00 call 0x4019b8
401029: c3 ret
This is to be compared to the original version of Diablo.exe (1.09b):
401000: e9 00 00 00 00 jmp 0x401005
401005: a1 00 94 47 00 mov eax,ds:0x479400
40100a: a3 30 79 4b 00 mov ds:0x4b7930,eax
40100f: c3 ret
401010: 83 7c 24 04 00 cmp DWORD PTR [esp+0x4],0x0
401015: 74 12 je 0x401029
401017: 6a 00 push 0x0
401019: 6a ff push 0xffffffff
40101b: 68 b0 30 48 00 push 0x4830b0
401020: ff 74 24 10 push DWORD PTR [esp+0x10]
401024: e8 8d 87 06 00 call 0x4697b6
401029: c3 ret
It's worth noting that individual compilation units can have different optimization settings (just to keep in mind).
Another comparison with Diablo.exe 1.09b:
- O1: http://www.mergely.com/TCffp23K/
- O2: http://www.mergely.com/VXbxcrZo/
I actually want to highlight in particular the line and byte ptr [ecx], 0
. This is actually an assignment, but it was optimized into an and
, which is only produced under O1.
Under O1:
-
*ptr = 0
becomesand byte ptr [ecx], 0
, even though assignment was used -
*ptr &= 0
becomesand byte ptr [ecx], 0
Under O2:
-
*ptr = 0
becomesmov byte ptr [eax], 0
-
*ptr &= 0
becomesmov byte ptr [eax], 0
, even though AND was used
The project now uses /O1 for both debug and retail builds. I figured that's what was used, since the output size is much closer (760 KB). The main reason I never changed it was in the early days optimizations broke the code entirely. It took a few months of fixing up before it was stable enough to change it.
You may also be able to find out other compiler flags by looking at the binary header (MZ/PE) and seeing what differs. Imports too probably.
Good call, I noticed the original binary has about 2x as many import descriptors. Which is weird, since it there's only a few differences in the KERNEL32 imports. We'll have to look at that sometime. Also, the release build seems to scramble the code a little more, where as the debug build is closer to the original. Setting aside debug sections of course.
It could also be a good idea to "trace" the developers steps. We know the game was originally written in VC 4.20, but then upgraded to 5.10. So perhaps generating the project using 4.20's default settings could get us closer.
So perhaps generating the project using 4.20's default settings could get us closer
Sounds reasonable. Report back the compiler flags used :) I still don't have a Windows box to play with.
Warning: Big writeup ahead! :)
As @mewmew wrote up over here, more often than not, the compiler we currently use (5.0 SP3) generated slightly different instructions than the original 1.09b binary.
The first thing I checked is the linker versions. For that I grabbed every version ever, basically, and confirmed that the linker used must be from 5.0 SP3:
SP0: 5.00.7022
SP1: 5.02.7132
SP2: 5.02.7132
SP3: 5.10.7303
Diablo 1.09b PE Header: 5.10
While browsing for compiler option info, I stumbled over the #pragma optimize
directive, allowing to toggle optimization settings on a per-function basis. Playing around with disabling size-based code and other optimizations, the generated instructions were even further away from the original, so I knew that was probably not what was happening.
For some reason, the compiler used by Blizzard was just more "clever" than the one we use...
After going off-track by checking the #pragma optimize
directive, I went back to linker stuff. Just to make sure, I checked VS6's link.exe
(starting from SP6, for some reason).
The output was:
C:\Program Files\Microsoft Visual Studio\VC98\Bin>link /?
Microsoft (R) Incremental Linker Version 6.00.8447
Copyright (C) Microsoft Corp 1992-1998. All rights reserved.
usage: LINK [options] [files] [@commandfile]
options:
/ALIGN:#
/BASE:{address|@filename,key}
[...]
/LINK50COMPAT
Looking at that options list it instantly clicked. See that /LINK50COMPAT
? Googling it revealed that link.exe
as well as lib.exe
both have that flag in version 6. It seems that with VC++ 6, they changed the structure of lib files, so the lib files generated by lib.exe
5.x and lower are not compatible with link.exe
6.x and vice versa. Microsoft implemented that switch so that older library files can be used with VC++ 6, and code compiled by VC++ 6 can generate lib files compatible with the linker of VC++ 5.
So I just fired up VS6SP6, upgraded the project file, compiled, checked in IDA, and voilà :
(left is Diablo.exe 1.09b, right is devilution compiled/linked with VC++ 6 SP6)
The instructions are identical (at least for the function from PR #135)!
So, what I guess happened:
- With the time, Blizzard upgraded to VS/VC++ 6
- Hovever, something prevented them to use the VC++ 6 linker.
- Thus, they compiled newer versions of Diablo with VC++ 6, generated the
.lib
files from the.obj
files via VC++ 6'slib.exe
with the/LINK50COMPAT
, generating library files compatible with VC++5's linker - Then, they linked those with the linker of 5.0SP3, leading to the PE header containing 5.10 as linker version, but optimizations only possible with a V6 compiler!
- It seems like they literally only used
/O1
:D
The 1.09b .exe has a creation date of 2001-05-12. This means that the newest VC++6 version they could have used it Service Pack 5 (released April 2001). Service Pack 4 (released January 2001) would be a good contender as well.
Anyways, that was my journey today. :)
What's left to do:
- Find out which service pack was used (only needed if we still find significant changes where there shouldn't be any)
- Build a VC++ 6 workspace that does a custom build generating lib files compatible with
link.exe
v5.10, and call that linker to generate the output file.
In retrospective, all of this makes sense seeing that Storm.dll
and the .SNP
files are even linked with v6 of the linker! 🙂
That's way cool! Excellent work.
@seritools Wow that's a great read and find, the asm is exactly the same (excluding different memory locations)!
Beautiful, now we just have to figure out how they automated everything :wink: It seems tedious to have used two different toolchains, so maybe there's another option we're missing or they modded their VC6 installation.
Awhile back I documented the versions they used for D2:
Version Released Patch
MSVC++ 6.0 SP 3 5/21/1999 1.00-1.03
MSVC++ 6.0 SP 4 6/15/2000 1.04-1.05
MSVC++ 6.0 SP 5 2/26/2001 1.06-1.10
MSVC++ 6.0 SP 6 3/29/2004 N/A
VS .NET 2003 ?/??/2003 1.11+
The linker for both SP 4 and 5 generates code exactly the same, I didn't test SP 6 since it was made long after those patches. We can assume that the D2 devs used their current setup to compile D1 at the time, so 1.09b
would've been compiled with SP 5
and 1.08 SP 3
.
@seritools Wow Dennis, that's great! Hats off. Very happy that you managed to pin this down.
Do we get similar sizes with the optimization? I am curious what a bindiff would result in. It might be useless to know, but I think that it would tell us how close we are.
https://github.com/seritools/devilution/commit/6f5f573bbd520db8dcda36db8bb52525b9bcd7f7 (please excuse my crappy makefile skills)
Managed to link the exe with 5.10 while compiling with 6 :) However in this constellation, PDBs cannot be generated, since the linker exits mentioning that the pdb format is not supported. A release build is possible though, and reports the correct linker version.
For working at getting the code accurate I'd recommend just linking with v6 for now (as long as we don't see differences too big to tolerate). Generating release builds with full PDB info makes it a lot easier imho.
What do you think?
A few more observations:
- SP5 is definitely ruled out: The release binary is way too big, and one jump in
CheckSpell
(.text:0045759A jz short loc_4575A0
) is generated as a jnz, with a slightly different jump pattern, and a few other slight differences in the function beginning - SP3: The release binary (linked with VC6) is 744KB, only 4KB bigger than the target, and the jump is generated just like the original one. However, the function beginning is still different:
@galaxyhaxz has noted that the VC6 linker starts the program code at 0x1000, while the VC5 linker starts at 0x400, which is a difference of 3072 bytes, so the resulting binary could be down to 741KB already!
I'm going to check SP2, SP1, SP0 next and try linking with VC5 instead to see if I can get that simple function to generate correctly and maybe generate a perfectly-sized binary :O
EDIT:
- SP2: File size is way bigger at 756KB, ruling that one out. (It is not generating the correct function beginning either.)
- SP0: File size is again down at 744KB. (It is not generating the correct function beginning either.)
- SP1: Same size as SP0.
- SP4: File size is again bigger at 756KB.
Checked the exact byte counts:
1.09: 757760 bytes
SP0: 761856 bytes (diff. 4096 bytes, at 3072 of which are because I currently link with VC6) SP1: 761856 bytes SP3: 761856 bytes
Alright, SP2, SP4, SP5 are excluded now, since all three of those generate binaries that are too big.
For some reason, with SP3, the filesize went down again. I reinstalled SP3 just to make sure, but it stayed the same.
So for now, VC6 SP0, SP1 or SP3 in conjunction with the linker from VC5 SP3 is the closest we can get.
Copied over the SP3 installation to my buildtools folder, then built it while linking with VC5SP3... well, I accidentally beat Blizzard 😆
So, what I gathered:
- Programs compiled usually increase in 1024 byte chunks
- VC6 SP3 + Linker from VC5SP3 seem to save just a few bytes too much, resulting in stripping a whole KB from the executable
Awesome work!
I think I might try this, because I am not thinking the pdb info to be too useful at this point considering we literally can cross reference the binary. 1024 bytes are still a lot though, but it could be worse.
I am curious if we can get closer with pdb info .
Also @seritools for that binary that is really close, can you upload it ? I would like to do a bindiff on it.
PDB info is nice while comparing functions, since you can just jump to the function in question in the new binary, without remembering adresses, and getting all the type info that we already have for the original binary as well. (Generating with pdb info results in a much bigger binary btw)
Since the binaries are rounded up to the next 1024bytes, maybe the original binary was just 739k + a few bytes? Every instruction we missed could be the one dropping us down to 439KB. :)
Like there was/is (already reported to @galaxyhaxz) a missing function NewCursor in devilution, which did nothing but call to SetCursor in the original binary. A couple of bytes "saved". Also every differing optimization contributes to that.
I'm gonna be at my pc tomorrow and will upload the binary then.
And here is the exe :) devilution.zip
Comparing the size with @seritools binary:
.text .rdata .data .rsrc Total
Size change in Devilution: +0x2400 -0x7600 +0x4E00 Identical -0x400
So the Devilution binary actually has 0x2400 (9kb) more in the code section than the vanilla. This is surely due to many decompiled functions having excessive variables. Interesting though is that most data is stored in .data
instead of .rdata
. This is likely because many of these variables were declared as static. Devilution has a total of 0x2800
bytes less in the data section. I noticed the compiled binary has a much smaller CRT section than compiled plainly with VC5, which could attribute to that. It could also be that the compiler is discarding certain global variables or data that isn't used.
Yeah, I only got the size way down when using the VC6 libs, the file was quite a bit bigger with the VC5 ones. (Tbh it was surpriting to me that the VC6 libs were working at all, since it supposedly has a different lib format, but maybe that's just not the case for the standard libs)
.rdata
is bigger in the original binary because the decompiled code is missing const
annotations on the various data tables (arrays). const
gets placed in a separate section.
See issue #13. Many functions and data are declared as static
but are missing that. It seems you're right that const
was probably used as well.
At least information about static
is present in the debug info from the SYM files; e.g. https://github.com/sanctuary/psx/blob/master/easy-as-pie/globals/global_0.cpp#L4522
https://github.com/diasurgical/devil-nightly/issues/15
We're pretty much done 🎉
In the end, Blizzard made sure to keep their software up to date it seems 😄
We can probably be 100% sure now :) Found this by change while browsing reddit:
http://bytepointer.com/articles/the_microsoft_rich_header.htm
Woha that is exiting :D
But does that also say that the linker was VC6 SP3? (probably wasn't updated in the later versions)
We can probably be 100% sure now :) Found this by change while browsing reddit:
That is such a great find! Wow, what a buried treasure!!
Funny, I actually knew about the Rich
header but had forgotten about it (my first written reference to it is from 2007)... well, good ol' brute-force search and a bit of luck still got us there :muscle:
@AJenbo Linker was the same from SP3-SP6. However SP4/SP5 can generate different code if the Processor Pack is installed.
@seritools Once again, a great discovery! Oh, it's so great to be working with code from 2001. Could you imagine if Diablo 1 was released now with all those micro-optimizations and stripping the Rich header+file names? :P
After comparing the Rich header with 3 different tools, below is a list documenting each value in the header. Interesting is that "Utc12_2_C" is a C compiler and not C++. So it's possible all the files were treated as C even with .CPP extensions. Also, there doesn't appear to be any asm/MASM references, which I guess implies render.cpp
was indeed a .CPP file with inlined calls.
Id Build Count Name Description
0 0 155 Unknown [---] Number of imported functions (old)
1 0 229 Import0 [---] Number of imported functions
6 1668 1 Cvtres500 [RES] VS97 (5.0) SP3 cvtres 5.00.1668
2 7303 29 Linker510 [IMP] VS97 (5.0) SP3 link 5.10.7303
3 7303 1 Cvtomf510 VS97 (5.0) SP3 cvtomf 5.10.7303
4 8447 2 Linker600 [LNK] VC++ 6.0 SP3,SP4,SP5,SP6 link 6.00.8447
48 9044 72 Utc12_2_C [---] VC++ 6.0 SP5 Processor Pack
19 9049 12 Linker512 Microsoft LINK 5.12.9049
So it's possible all the files were treated as C even with .CPP extensions.
That was one of my very first assumptions about the project haha
That was one of my very first assumptions about the project haha
So apparently you want evidence for something that you've known all along?