ghidra
ghidra copied to clipboard
Fix `LOCK`, `XACQUIRE`, `XRELEASE` instruction prefix for x86
Edit3: This PR should be ready for merging/internal edits and testing.
Edit2: This PR is mostly complete with respect to decoding. See this comment for latest update
Edit: This PR now also includes changes for XACQUIRE
and XRELEASE
prefixes because they are closely related to the LOCK
prefix. (See here for details on that implementation https://github.com/NationalSecurityAgency/ghidra/pull/4336#issuecomment-1157664283)
I am looking for feedback (and suggestions for bugfixes) on the current implementation before fixing any other instructions and submitting for a real PR.
Summary:
This commit is a proof-of-concept (POC) implementation to fix how sleigh
interprets the LOCK
prefix on x86 architecture. Previously, the sleigh
specification interpreted LOCK
as a standalone instruction; however,
that is wrong. The following is a description of a possible
implementation to fix this. This implementation only fixes the INC
and
XADD
instructions to demonstrate and hopefully determine whether this
implementation is on the right track or if a better implementation
exists that we can use.
The "LOCK" disassembly text appears after the mnemonic, which is
similar to the REP
prefixes. I do not know if making "LOCK" appear
before a mnemonic like LOCK INC dword ptr[EAX]
is possible or even
preferable.
The Intel manual states the following for the LOCK
prefix:
The LOCK prefix can be prepended only to the following instructions and only to those forms of the instructions where the destination operand is a memory operand: ADD, ADC, AND, BTC, BTR, BTS, CMPXCHG, CMPXCH8B, CMPXCHG16B, DEC, INC, NEG, NOT, OR, SBB, SUB, XOR, XADD, and XCHG. If the LOCK prefix is used with one of these instructions and the source operand is a memory operand, an undefined opcode exception (#UD) may be generated. An undefined opcode exception will also be generated if the LOCK prefix is used with any instruction not in the above list. The XCHG instruction always asserts the LOCK# signal regardless of the presence or absence of the LOCK prefix
Implementation:
Add lockprefx
to contextreg
to track the appearance of the prefix
and identify the LOCK
prefix bit pattern during instruction decoding.
It can appear in any order along with the other prefixes. I increased
the size of the context register because I do not understand the
interaction of the existing fields, and it looks like there are no more
free bits. I do not know whether increasing the context register size
from 4 to 8 has any adverse or unforeseen effects elsewhere.
Explanation, suggestions, or confirmation about this change would be
greatly appreciated!
Use the with
statement to protect against invalid LOCK
prefixes on
all instructions that cannot use the LOCK
prefix. Instructions that
can take the LOCK
prefix (only INC
and XADD
in this POC) should
close the with
statement (closing }
bracket) and handle the LOCK
variants explicitly. The with
statement is also added in the top-level
'*.slaspec' files to surround the @include
files for other
instructions not in ia.sinc
and prevent incorrect disassembly. I
discovered this technique by noticing a similar structure in the
68000.sinc
file with extGUARD
.
Handle INC
and XADD
with LOCK
prefix decoding only with a memory
destination operand. I struggled a bit with this, but I think what I've
done is okay? I had trouble getting anything else to compile and
function correctly. Still, I attempted to split out the memory and
register destination operands to only apply LOCK
to the memory
destination operands. Suggestions on whether a better alternative
implementation exists are appreciated!
... admittedly, I only tested the disassembly result and did not inspect
the generated pcode with the LOCK
prefix. I'm hoping I correctly
understood how the pcode is built and that everything lines up
correctly. The LOCK()
pcodeop should be first for each instruction
followed by the normal semantics.
Addresses the bug report about LOCK
in https://github.com/NationalSecurityAgency/ghidra/issues/2033
The instruction bytes I've tested are the following. S_SUCCESS
means Ghidra should correctly decode to the following disassembly text and S_FAILURE
means Ghidra was correctly unable to disassemble the bytes.
// ----------- INC
// Normal disassembly with LOCK valid instruction and register destination
{{0xff, 0xc0}, S_SUCCESS, "INC EAX"},
// INC.LOCK EAX not valid because it is not a memory destination
{{0xf0, 0xff, 0xc0}, S_FAILURE, ""},
// Normal disassembly with LOCK valid instruction and memory destination
{{0xff, 0x00}, S_SUCCESS, "INC dword ptr [RAX]"},
// INC.LOCK dword ptr [RAX] is valid because memory destination
{{0xf0, 0xff, 0x00}, S_SUCCESS, "INC.LOCK dword ptr [RAX]"},
// Normal disassembly with LOCK valid instruction and sized memory
// destination
{{0x66, 0xff, 0x00}, S_SUCCESS, "INC word ptr [RAX]"},
// INC.LOCK word ptr [RAX] is valid because memory destination
{{0xf0, 0x66, 0xff, 0x00}, S_SUCCESS, "INC.LOCK word ptr [RAX]"},
// Move the prefix around
{{0x66, 0xf0, 0xff, 0x00}, S_SUCCESS, "INC.LOCK word ptr [RAX]"},
// Normal disassembly with LOCK valid instruction and sized memory
// destination and sized register
{{0x66, 0x67, 0xff, 0x00}, S_SUCCESS, "INC word ptr [EAX]"},
// INC.LOCK word ptr [EAX] is valid because memory destination
{{0xf0, 0x66, 0x67, 0xff, 0x00}, S_SUCCESS, "INC.LOCK word ptr [EAX]"},
// Move the prefix around
{{0x66, 0xf0, 0x67, 0xff, 0x00}, S_SUCCESS, "INC.LOCK word ptr [EAX]"},
// Move the prefix around
{{0x66, 0x67, 0xf0, 0xff, 0x00}, S_SUCCESS, "INC.LOCK word ptr [EAX]"},
// ----------- XADD
// Normal disassembly with LOCK valid instruction and register destination
{{0x0f, 0xc1, 0xd8}, S_SUCCESS, "XADD EAX,EBX"},
// XADD.LOCK EAX,EBX not valid because it is not a memory destination
{{0xf0, 0x0f, 0xc1, 0xd8}, S_FAILURE, ""},
// Normal disassembly with LOCK valid instruction and memory destination
{{0x0f, 0xc1, 0x18}, S_SUCCESS, "XADD dword ptr [RAX],EBX"},
// XADD.LOCK dword ptr [RAX],EBX is valid because memory destination
{{0xf0, 0x0f, 0xc1, 0x18}, S_SUCCESS, "XADD.LOCK dword ptr [RAX],EBX"},
// ----------- LOCK fails
// Test normal invalid LOCK instruction
{{0xB8, 0x01, 0x00, 0x00, 0x00}, S_SUCCESS, "MOV EAX,0x1"},
// MOV.LOCK EAX, 0x1 is not valid because MOV can't have a LOCK prefix
{{0xf0, 0xb8, 0x01, 0x00, 0x00, 0x00}, S_FAILURE, ""},
// Test AVX invalid LOCK instruction
{{0xc5, 0xf9, 0x6f, 0xc1}, S_SUCCESS, "VMOVDQA XMM0, XMM1"},
// VMOVDQA.LOCK XMM0, XMM1 is not valid because VMOVDQA can't have a LOCK prefix
{{0xf0, 0xc5, 0xf9, 0x6f, 0xc1}, S_FAILURE, ""},
One random point of interest: I've observed some glibcs selectively jump after the lock
prefix on a lock cmpxchg
, i.e. some paths go prefixed while others go to the unprefixed instruction.
I've also been looking into the XACQUIRE
and XRELEASE
prefixes, which are related to LOCK
. They require new pcodeops and context register fields.
I am working on that implementation, which will look similar to this, and will make a new commit to this branch when ready.
Again, I'm only going to make the fix for a few instructions until I learn that this implementation is on the right track or I should be doing something different.
After applying there are only BAD-Instruction on previously analyzed binaries. On the other side binaries analyzed with applied patch have problems if opened without patch. Is there a way to fix that?
@jobermayr You would want to increment the minor version number in the ldefs file for the language. This definitely requires an upgrade. I think major version updates are mostly only needed if register definitions are moved around (this PR adds to the context register but doesn't move anything).
I've pushed a commit for how I think XACQUIRE
and XRELEASE
prefix
should be handled, but in my first attempt, I cannot figure out how to
disassemble XCHG
correctly for all of the cases. From my commit
message, copied here:
XACQUIRE and XRELEASE are related to the LOCK instructions so this proof of concept fix attempts to reuse the LOCK prefix logic to combine the checks.
I need help with one thing that I could not figure out:
For XCHG, you are allowed to have an XACQUIRE or XRELEASE prefix without LOCK, and I thought that by listing both prefix constructors that sleigh would choose those constructors when also faced with a memory operand (where the prefixes are valid only), but this did not work and I had to explicitly state that all three XACQUIRE, XRELEASE, and LOCK prefixes were zero even though both XACQUIRE and XRELEASE constructors handle that case. However, this is wrong because the byte for XACQUIRE and XRELEASE is still valid when there is no memory destination, but removing this breaks disassembly for the memory destinations. The following are my expectations of disassembly for x86-64:
* "87d9" - "XCHG ECX,EBX"
* "f287d9" - "XCHG ECX,EBX" # Current implementation fails
* "f387d9" - "XCHG ECX,EBX" # Current implementation fails
* "f087d9" - ""
* "f2f087d9" - ""
* "f0f287d9" - ""
* "f2f387d9" - "XCHG ECX,EBX" # Current implementation fails
* "8719" - "XCHG dword ptr [RCX],EBX"
* "f08719" - "XCHG.LOCK dword ptr [RCX],EBX"
* "f28719" - "XCHG.XACQUIRE dword ptr [RCX],EBX"
* "f38719" - "XCHG.XRELEASE dword ptr [RCX],EBX"
* "f0f28719" - "XCHG.XACQUIRE.LOCK dword ptr [RCX],EBX"
* "f2f08719" - "XCHG.XACQUIRE.LOCK dword ptr [RCX],EBX"
* "f3f08719" - "XCHG.XRELEASE.LOCK dword ptr [RCX],EBX"
* "f3f2f08719" - "XCHG.XACQUIRE.LOCK dword ptr [RCX],EBX"
* "f2f3f08719" - "XCHG.XRELEASE.LOCK dword ptr [RCX],EBX"
increment the minor version number in the ldefs file for the language.
Bumped now.
major version updates are mostly only needed if register definitions are moved around
The xacquire and xrelease changes currently reuse some bits in the context register (because it's the same byte pattern as existing fields), so I don't think that means anything has moved around, right?
reuse would not constitute moving around, yes. There are some other potential things that could require a major language update that involve changing the parse tree, but I don't think any of your PR changes would impact that so far.
Is this MR also intended to fix
Bytes: Meaning: Ghidra (also patched):
f0 48 0f b1 11 lock cmpxchg QWORD PTR [rcx],rdx ??; CMPXCHG qword ptr [RCX],RDX
f0 0f b1 0a lock cmpxchg DWORD PTR [rdx],ecx ??; CMPXCHG dword ptr [RDX],ECX
f0 01 01 lock add DWORD PTR [rcx],eax ??; ADD dword ptr [RCX],EAX
f0 01 08 lock add DWORD PTR [rax],ecx ??; ADD dword ptr [RAX],ECX
Is this MR also intended to fix
Bytes: Meaning: Ghidra (also patched): f0 48 0f b1 11 lock cmpxchg QWORD PTR [rcx],rdx ??; CMPXCHG qword ptr [RCX],RDX f0 0f b1 0a lock cmpxchg DWORD PTR [rdx],ecx ??; CMPXCHG dword ptr [RDX],ECX f0 01 01 lock add DWORD PTR [rcx],eax ??; ADD dword ptr [RCX],EAX f0 01 08 lock add DWORD PTR [rax],ecx ??; ADD dword ptr [RAX],ECX
Eventually, yes, but before I start spending more time fixing all LOCK
and XACQUIRE
/XRELEASE
instructions, I'd like to wait and receive some feedback on the current implementation for the instructions that I have started fixing, INC
, XADD
, and XCHG
, first.
Since this is my first time really digging into sleigh, I want to make sure I am not missing something obvious that would make this fix more minimal and/or cleaner.
Mostly, my concerns are around the implementation to enforce the following constraints
- A
LOCK
prefix is invalid on any instruction that is not in the list of validLOCK
instruction (disassembly should fail) - the
LOCK
prefix is only valid for the above instructions with memory destinations -
XACQUIRE
andXRELEASE
normally require aLOCK
prefix, except forXCHG
instruction, where theLOCK
prefix is optional. (This is currently buggy)
These latest commits should allow Ghidra to correctly disassemble the set of instructions that are LOCK
-compatible.
I also just realized I missed a special case for XRELEASE
only:
The XRELEASE prefix hint can only be used with the following instructions (also referred to as XRELEASE-enabled when used with the XRELEASE prefix):
- The “MOV mem, reg” (Opcode 88H/89H) and “MOV mem, imm” (Opcode C6H/C7H) instructions. In these cases, the XRELEASE is recognized without the presence of the LOCK prefix.
While fixing the rest of the instructions and re-reading the ISA docs, I think a correct pcode implementation would include an opening LOCK();
at the beginning and closing UNLOCK();
at the end of the instruction's pcode.
More commits will come soon to fix the above two issues.
OK, I've (finally) had a chance to look this over, think about it a bit, and discuss it internally. Using the with
statement to protect against invalid disassembly is probably the best way to go here.
One suggestion is to pull all of the lockable instructions out of ia.sinc
into a separate file.
Unfortunately, there are a number of related issues in the existing sleigh that should be fixed, particularly when adding support for the XACQUIRE
and XRELEASE
prefixes. Ideally, we would be able to use the existing context fields for the f2
and f3
prefixes. As far as I can tell, the manual only specifies the "last one wins" rule for which prefix applies when there are multiple f2,f3
prefixes in the cases where f2
and f3
are interpreted as XACQUIRE
and XRELEASE
. Experimentally, it seems to be the case that this rule holds for those prefixes in general (i.e., for the REPE
and REPNE
cases as well). It makes me sad to resolve undefined behavior by looking at one example, so there should probably be some more investigation, but it seems like the conditions in ia.sinc
that prevent disassembly when both prefixes occur can be relaxed.
There are a number of variants of lockable instructions with the byte=0x80 | byte=0x82
constraint. The byte=0x82
condition is not valid for 64-bit code.
There's a substantial amount of overlap between x86.slaspec
and x86-64.slaspec
. Most of the content could be moved to x86.slaspec
, which x86-64.slaspec
could then include.
I'm not suggesting that you would need to do all of this for the PR to be accepted; just a heads up that other changes are needed in related code and some amount of coordination is probably needed. I guess you have several options (in no particular order):
- We take the PR (after the lockable instructions are moved to a separate file) but will add to it (and do more testing) before merging. We'll keep the initial commit so you get credit.
- You could forge ahead with
XACQUIRE
andXRELEASE
, taking the above comments into account. - Finally, I suppose you could decide that you never wanted to step in this mess and just convert the PR to an issue report.
Thank you for the comments and suggestions @ghidracadabra!
Using the
with
statement to protect against invalid disassembly is probably the best way to go here.
👍
One suggestion is to pull all of the lockable instructions out of ia.sinc into a separate file.
Good call. Just to confirm, you only want the lockable versions of the instructions in a separate file? For example, using BTS as a small case:
# ia.sinc
@ifdef IA64
:BTS Rmr64,Reg64 ...
@endif
:BTS Reg16,imm8 ...
:BTS Reg32,imm8 ...
@ifdef IA64
:BTS Reg64,imm8 ...
@endif
# lockable.sinc
@ifdef IA64
:BTS^lockx Mem,Reg64 ...
@endif
:BTS^lockx m16,imm8 ...
:BTS^lockx m32,imm8 ...
@ifdef IA64
:BTS^lockx m64,imm8 ...
@endif
And I would move the with : lockprefx=0 {
statement in the *.slaspec
files to include ia.sinc
and exclude the new lockable.sinc
.
Unfortunately, there are a number of related issues in the existing sleigh that should be fixed, particularly when adding support for the
XACQUIRE
andXRELEASE
prefixes. Ideally, we would be able to use the existing context fields for thef2
andf3
prefixes. As far as I can tell, the manual only specifies the "last one wins" rule for which prefix applies when there are multiplef2
,f3
prefixes in the cases wheref2
andf3
are interpreted asXACQUIRE
andXRELEASE
. Experimentally, it seems to be the case that this rule holds for those prefixes in general (i.e., for theREPE
andREPNE
cases as well). It makes me sad to resolve undefined behavior by looking at one example, so there should probably be some more investigation, but it seems like the conditions inia.sinc
that prevent disassembly when both prefixes occur can be relaxed.
Yeah, I searched the manual just now and couldn't find a reference for the "last one wins" rule applying to REPE
and REPNE
cases, but it probably makes sense that it follows the same rule. I can also take a look at this.
There are a number of variants of lockable instructions with the
byte=0x80 | byte=0x82
constraint. Thebyte=0x82
condition is not valid for 64-bit code.
I think I'm missing something. Is this a bug in the existing pcode (in master
branch) or a bug I introduced in this PR? From what I understand, the LOCK
prefix is valid for 32-bit code as well.
There's a substantial amount of overlap between
x86.slaspec
andx86-64.slaspec
. Most of the content could be moved tox86.slaspec
, whichx86-64.slaspec
could then include.
Sounds good. This should probably be a single, separate commit.
As for what I will do next, I can rebase this PR to be a bit more organized with good commit messages, taking into account the suggestions (and clarification answers) above.
I think it's a good idea to continue with completion of the XACQUIRE
and XRELEASE
prefixes because they are pretty closely tied to LOCK
and there isn't too much left.
I can notify you when I'm done with that, and you can do some internal testing of the PR and modify it as you see fit (option 1).
Thanks again!
I think that just having the lockable versions in the separate sinc file is the way to go. I would put comments in ia.sinc
for each instruction stating where the lockable variants can be found (so one comment for ADD
, one for ADC
, etc).
The byte=0x80 | byte=0x82
is a bug in the master
branch that I noticed while reviewing your code. The LOCK
prefix is valid for 32-bit code, but using opcode 0x82
instead of 0x80
is not valid in 64-bit code. Sorry for the confusion; I was just providing this as an example of an existing error in the sleigh in code that this PR touches. I can fix it after you've submitted.
I believe you may need to just have a thought for 16-bit versions of x86 as well. (Tbh, I don't know if what you're proposing affects 16-bit or not, I just know there are quite a few oustanding issues in this regard and don't want it to be made any worse!!!)
@ghidracadabra I think this is ready for internal review/testing.
I've rebased and split up the changes into a few commits based on some of your suggestions. Feel free to cherry-pick those for faster merge if you'd like. I also combined the LOCK
and XACQUIRE
/XRELEASE
implementation into a single commit because, honestly, it would be a lot of changes and time to split into two commits, so I hope you don't mind.
If there's anything else I can do on my end to make it easier or faster to review, please let me know. Thank you.
Is there an easy way to fix stack after updating from v20220616 to current patch version?
Edit: maybe it is caused by missing Rmr64 constructor (left: unpatched, right: current patch):
@ekilmer To fix an IP issue while building I need:
diff --git a/Ghidra/Processors/x86/certification.manifest b/Ghidra/Processors/x86/certification.manifest index 7d95a8f6a..2e66f4848 100644 --- a/Ghidra/Processors/x86/certification.manifest +++ b/Ghidra/Processors/x86/certification.manifest @@ -11,6 +11,7 @@ data/languages/cet.sinc||GHIDRA||||END| data/languages/clwb.sinc||GHIDRA||||END| data/languages/fma.sinc||GHIDRA||||END| data/languages/ia.sinc||GHIDRA||||END| +data/languages/lockable.sinc||GHIDRA||||END| data/languages/lzcnt.sinc||GHIDRA||||END| data/languages/macros.sinc||GHIDRA||||END| data/languages/mpx.sinc||GHIDRA||||END|
@jobermayr Thank you for the comments! And yes, it looks like I made a very big mistake with regards to splitting out the memory and register destination for lockable instructions. I am still getting familiar with sleigh, so I thought I could replace something like spec_rm8
with Reg8
to get only the register, but this is very wrong because the masks don't align (as shown in your screenshot). Thank you for the review.
I am in the process of correcting this for the register-destination instructions. The lockable versions should be fine, but I will double check.
@jobermayr Please try now. I force-pushed to fix the Rmr
issue.
@jobermayr Please try now. I force-pushed to fix the
Rmr
issue.
It works now. Thanks a lot for doing this.
Would be really nice if the decompiler could one day also simplify the Interlocked functions like IDA:
@ekilmer I'm still reviewing this, but I wanted to mention two things.
- In the semantic section of the
CMPXCHG
variant on line 139 oflockable.sinc
, there is a comment about an error due to duplicatebuild unlock;
statements. You can avoid duplicatebuild
statements by appending<end>
build unlock
; to the end of the section and replacing thegoto inst_next;
withgoto <end>;
. - While testing, I noticed something peculiar. In a few linux libc .so files I examined (both 32 and 64 bit), there is an interesting optimization. The code looks something like this:
74 01
JZ +1
f0 4c 0f b1 29
CMPXCHG.LOCK qword ptr [RCX], R13
Basically, the code checks a value to determine whether it wants the locked variant of an instruction. If it wants the unlocked variant, it jumps 1 byte over thef0
prefix. With this PR, such code results in a disassembly error in the listing since you can't have overlapping instructions. Inmaster
this error does not occur sincef0
is considered a one-byte instruction. You can see this in your branch by importing and analyzing a libc so from linux, opening the "Bookmarks" menu, and typing "conflicting" into the filter.
I experimented with a fix using a delayslot
directive, but as far as I can tell the manual does not forbid repeating the lock
prefix and nested delay slots result in a disassembly error. This isn't a reason to reject the PR, but I figured I'd mention it in case it's of interest to you.
Edit: It might be possible to avoid the nested delay slots error by introducing some context; not sure whether it's worth it though.
@ekilmer I'm still reviewing this, but I wanted to mention two things.
- In the semantic section of the
CMPXCHG
variant on line 139 oflockable.sinc
, there is a comment about an error due to duplicatebuild unlock;
statements. You can avoid duplicatebuild
statements by appending<end>
build unlock
; to the end of the section and replacing thegoto inst_next;
withgoto <end>;
.
Thank you! Fixed in latest force-push.
- While testing, I noticed something peculiar. In a few linux libc .so files I examined (both 32 and 64 bit), there is an interesting optimization. The code looks something like this:
74 01
JZ +1
f0 4c 0f b1 29
CMPXCHG.LOCK qword ptr [RCX], R13
Basically, the code checks a value to determine whether it wants the locked variant of an instruction. If it wants the unlocked variant, it jumps 1 byte over thef0
prefix. With this PR, such code results in a disassembly error in the listing since you can't have overlapping instructions. Inmaster
this error does not occur sincef0
is considered a one-byte instruction. You can see this in your branch by importing and analyzing a libc so from linux, opening the "Bookmarks" menu, and typing "conflicting" into the filter.I experimented with a fix using a
delayslot
directive, but as far as I can tell the manual does not forbid repeating thelock
prefix and nested delay slots result in a disassembly error. This isn't a reason to reject the PR, but I figured I'd mention it in case it's of interest to you.Edit: It might be possible to avoid the nested delay slots error by introducing some context; not sure whether it's worth it though.
Ahhh, yes, @pgoodman mentioned that in an earlier comment too 😬 https://github.com/NationalSecurityAgency/ghidra/pull/4336#issuecomment-1153251230.
Ok, I've finished reviewing this. Overall looks good, thank you for taking this on!
There are a few minor issues to address before we can accept this:
- Although certain
MOV
instructions can take anXRELEASE
prefix, they do not accept aLOCK
prefix. For example,f0 48 89 18
is an invalid instruction and should not disassemble. So theMOV
instructions should probably be moved back toia.sinc
. - Possibly related to this, but there are some TODO statements about identical patterns (e.g. line 3038 of
ia.sinc
.). Were we missing variants? Is this an consequence of moving some of theMOV
variants tolockable.sinc
? -
XCHG
with a memory operand asserts a lock signal whether or not there is aLOCK
prefix, so the pcode section should just have calls toLOCK
andUNLOCK
rather than build directives.
There are some other possible improvements, such as better handling of the one-byte jump and having the "LOCK" appear as a prefix in the disassembly. These touch areas of the code other than those relevant to this PR and can be considered after this is merged.
Edit: I see your above comment related to #2. I'll investigate and make any necessary fixes after you've submitted.
Ok, I've finished reviewing this. Overall looks good, thank you for taking this on!
You're welcome! And thank you again for the review(s).
There are a few minor issues to address before we can accept this:
- Although certain
MOV
instructions can take anXRELEASE
prefix, they do not accept aLOCK
prefix. For example,f0 48 89 18
is an invalid instruction and should not disassemble. So theMOV
instructions should probably be moved back toia.sinc
.
Fixed. Force-pushed.
- Possibly related to this, but there are some TODO statements about identical patterns (e.g. line 3038 of
ia.sinc
.). Were we missing variants? Is this an consequence of moving some of theMOV
variants tolockable.sinc
?
It was a consequence of splitting the original pattern into one that only accepts memory destination operand (so that I could look for a valid XRELEASE prefix) and one that only supports a register destination operand.
In the original pattern,
:MOV spec_rm8,imm8 is vexMode=0 & byte=0xc6; (spec_rm8 & reg_opcode=0 ...); imm8 { spec_rm8 = imm8; }
I wasn't certain what reg_opcode=0
actually meant, so I was just following my process of replacing spec_rm8
with spec_m8
(for only capturing memory operands)
MOV^xrelease spec_m8,imm8 is vexMode=0 & xrelease & byte=0xc6; spec_m8 & reg_opcode=0 ...; imm8 { build xrelease; spec_m8 = imm8; }
and then, for the register destination, replacing spec_rm8
with Rmr8
and adding mod=3
:MOV Rmr8,imm8 is vexMode=0 & byte=0xc6; (mod=3 & Rmr8 & reg_opcode=0); imm8 { Rmr8 = imm8; }
which caused the identical pattern error with an existing pattern that looks very similar but with a CRmr8
instead of Rmr8
MOV CRmr8,imm8 is vexMode=0 & byte=0xc6; (CRmr8 & mod=3 & reg_opcode=0); imm8 { CRmr8 = imm8; }
So, I don't fully understand some of the pattern matching going on here, and thought it'd be good to call out the error 😕.
XCHG
with a memory operand asserts a lock signal whether or not there is aLOCK
prefix, so the pcode section should just have calls toLOCK
andUNLOCK
rather than build directives.
Good catch! Fixed. Force-pushed.
There are some other possible improvements, such as better handling of the one-byte jump and having the "LOCK" appear as a prefix in the disassembly. These touch areas of the code other than those relevant to this PR and can be considered after this is merged.
Edit: I see your above comment related to #2. I'll investigate and make any necessary fixes after you've submitted.
Sounds good!
@ekilmer Can you please adapt it to 71dcf8e ?
@jobermayr Rebased and pushed
From https://github.com/NationalSecurityAgency/ghidra/pull/4336#issuecomment-1244435811
Would be really nice if the decompiler could one day also simplify the Interlocked functions like IDA:
diff --git a/Ghidra/Processors/x86/data/languages/ia.sinc b/Ghidra/Processors/x86/data/languages/ia.sinc
index a7017dec1..c457f5248 100644
--- a/Ghidra/Processors/x86/data/languages/ia.sinc
+++ b/Ghidra/Processors/x86/data/languages/ia.sinc
@@ -1090,11 +1090,11 @@ xrelease: ".XRELEASE" is xacquireprefx=0 & xreleaseprefx=1 { XRELEASE(); }
xrelease: is epsilon { }
# LOCK prefix
-lock: ".LOCK" is lockprefx=1 { LOCK(); }
+lock: ".LOCK" is lockprefx=1 { }
lock: is epsilon { }
-unlock: is lockprefx=1 { UNLOCK(); }
+unlock: is lockprefx=1 { }
unlock: is epsilon { }
-lockx: xlockprefx^".LOCK" is lockprefx=1 & xlockprefx { build xlockprefx; LOCK(); }
+lockx: xlockprefx^".LOCK" is lockprefx=1 & xlockprefx { build xlockprefx; }
lockx: is epsilon { }
# Some macros
diff --git a/Ghidra/Processors/x86/data/languages/lockable.sinc b/Ghidra/Processors/x86/data/languages/lockable.sinc
index d62535415..0921f24e0 100644
--- a/Ghidra/Processors/x86/data/languages/lockable.sinc
+++ b/Ghidra/Processors/x86/data/languages/lockable.sinc
@@ -127,6 +127,7 @@
:BTS^lockx^unlock m64,imm8 is $(LONGMODE_ON) & vexMode=0 & lockx & unlock & opsize=2 & byte=0xf; byte=0xba; m64 & reg_opcode=5 ...; imm8 { build lockx; local bit=imm8&0x3f; local val=(m64>>bit)&1; m64=m64 | (1<<bit); CF=(val!=0); build unlock; }
@endif
+define pcodeop _InterlockedCompareExchange;
:CMPXCHG^lockx^unlock m8,Reg8 is vexMode=0 & lockx & unlock & byte=0xf; byte=0xa6; m8 & Reg8 ... { build lockx; _cmpxchg(m8, Reg8); build unlock; }
:CMPXCHG^lockx^unlock m16,Reg16 is vexMode=0 & lockx & unlock & byte=0xf; byte=0xa7; m16 & Reg16 ... { build lockx; _cmpxchg(m16, Reg16); build unlock; }
:CMPXCHG^lockx^unlock m8,Reg8 is vexMode=0 & lockx & unlock & byte=0xf; byte=0xb0; m8 & Reg8 ... { build lockx; subflags(AL,m8); local tmp=AL-m8; resultflags(tmp);
@@ -139,28 +140,11 @@
build unlock; }
:CMPXCHG^lockx^unlock m32,Reg32 is vexMode=0 & lockx & unlock & opsize=1 & byte=0xf; byte=0xb1; m32 & Reg32 ... & check_EAX_dest ...
{
- build lockx;
- #this instruction writes to either EAX or m32
- #in 64-bit mode, a 32-bit register that is written to
- #(and only the register that is written to)
- #must be zero-extended to 64 bits
- subflags(EAX,m32);
- local tmp=EAX-m32;
- resultflags(tmp);
- if (ZF==1) goto <equal>;
- EAX = m32;
- build check_EAX_dest;
- goto <inst_end>;
-<equal>
- m32 = Reg32;
-<inst_end>
- build unlock;
+ EAX = _InterlockedCompareExchange(m32, Reg32, EAX);
}
@ifdef IA64
-:CMPXCHG^lockx^unlock m64,Reg64 is $(LONGMODE_ON) & vexMode=0 & lockx & unlock & opsize=2 & byte=0xf; byte=0xb1; m64 & Reg64 ... { build lockx; subflags(RAX,m64); local tmp=RAX-m64; resultflags(tmp);
- local diff = m64^Reg64; m64 = m64 ^ (zext(ZF) * diff);
- diff = RAX ^ m64; RAX = RAX ^ (zext(ZF==0) * diff);
- build unlock; }
+define pcodeop _InterlockedCompareExchange64;
+:CMPXCHG^lockx^unlock m64,Reg64 is $(LONGMODE_ON) & vexMode=0 & lockx & unlock & opsize=2 & byte=0xf; byte=0xb1; m64 & Reg64 ... { RAX = _InterlockedCompareExchange64(m64, Reg64, RAX); }
@endif
:CMPXCHG8B&lockx m64 is vexMode=0 & lockx & unlock & byte=0xf; byte=0xc7; ( mod != 0b11 & reg_opcode=1 ) ... & m64 {