
Ghidra API: what is the difference between READ/WRITE and READ_WRITE?

Open Ruturaj4 opened this issue 4 years ago • 28 comments

Using the Instruction API (https://ghidra.re/ghidra_docs/api/ghidra/program/model/listing/Instruction.html), I can get an instruction's operand reference types, e.g. inst.getOperandRefType(0) or inst.getOperandRefType(1). Each call returns a RefType object.

What is the difference between READ and READ_WRITE? (See https://ghidra.re/ghidra_docs/api/ghidra/program/model/symbol/RefType.html)

I am asking because the API sometimes gives me READ and sometimes READ_WRITE for similar instructions (say, MOV EAX,dword ptr [RBP + -0x4]) when I call inst.getOperandRefType(0).

Ruturaj4 avatar Dec 30 '20 16:12 Ruturaj4

READ_WRITE corresponds to a read-modify-write reference, where a single instruction reads and writes the same location. I would think the instruction indicated should most likely have a READ stack reference.

ghidra1 avatar Dec 30 '20 22:12 ghidra1

@ghidra1 Thanks for your reply.

No, the indicated instruction has a READ_WRITE reference for its first operand, and that's what confuses me. Also, I observe READ_WRITE references mostly when an instruction dereferences a location pointed to by a pointer.

For example, consider the following instruction snippet.

401119:        mov    rax,QWORD PTR [rbp-0x8]
40111d:        mov    eax,DWORD PTR [rax+0x28]
401120:        mov    DWORD PTR [rbp-0xc],eax

Here operand 0 (i.e. inst.getOperandRefType(0)) of instruction @ 40111d is marked as READ_WRITE, but operand 0 of instruction @401119 is marked as WRITE (as it is a register write). But, I am not really sure about this.

Ruturaj4 avatar Dec 30 '20 23:12 Ruturaj4

Register eax is the lower 32 bits of rax, so the instruction at 40111d is both reading and writing same register (in addition to reading memory), whereas for the instruction at 401119, rbp and memory are read and rax is written (not read at all).

ghizard avatar Jan 02 '21 20:01 ghizard

@ghizard Thanks for your reply. I thought of that, but I also observed this for an instruction with operands like eax,DWORD PTR [rbp-28], where op0 is only written and never read.

Ruturaj4 avatar Jan 03 '21 23:01 Ruturaj4

@Ruturaj4, the latter case you mentioned sounds incorrect, but without more insight it is difficult to say. You may need to examine the pcode associated with an instruction to see its low-level reads and writes. The pcode display for the listing can be enabled from the listing field layout panel, opened via the pull-down toolbar icon (small square with down arrow) at the top of the listing panel. If you could include a pcode screen capture for an instruction you believe is incorrect, that would be helpful.

Screenshot at 2021-01-04 10-17-57

ghidra1 avatar Jan 04 '21 14:01 ghidra1

I think it also depends on the origin of the register references. By default, I think analysis is pretty conservative about creating register references. In my example screenshot above, Stack Analysis was enabled, so its reference prevailed. At present, register, stack and memory references cannot coexist on the same operand.

ghidra1 avatar Jan 04 '21 15:01 ghidra1

Hi there,

I believe I am also encountering this bug. I'm investigating the following instruction:

00402b81 8b 30           MOV        ESI,dword ptr [EAX]

getOperandRefType is returning a RefType of READ_WRITE for ESI, whereas I'd expect it to return WRITE.

Here are the PCodes for the instruction:

2022-10-10-154143_858x47_scrot

I can't see any reason why ESI would be READ. Am I missing something?

This is using Ghidra 10.1.5.

mschwager avatar Oct 11 '22 13:10 mschwager

Screenshot at 2022-10-11 16-42-05

I was able to reproduce this for 64-bit x86, although my pcode differs from yours, which I assume is 32-bit. In my case it is clear where the confusion comes from: the INT_ZEXT into the same/larger register. I was unable to reproduce for the 32-bit case, where I got the correct RefTypes for each operand.

ghidra1 avatar Oct 11 '22 21:10 ghidra1

@Ruturaj4 > Here operand 0 (i.e. inst.getOperandRefType(0)) of instruction @ 40111d is marked as READ_WRITE, but operand 0 of instruction @401119 is marked as WRITE (as it is a register write). But, I am not really sure about this.

I believe my last post reproduces the issue you are seeing - I assume your case is 64-bit x86. You would have to display your pcode to observe the INT_ZEXT of EAX into RAX which improperly adds a READ to operand #0 resulting in a READ_WRITE RefType.
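To make the INT_ZEXT explanation concrete, here is a minimal standalone sketch with no Ghidra dependency. Everything in it is an assumption for illustration: ModelOp and ZextModel are hypothetical classes, the p-code list is a hand-written approximation of what 64-bit x86 shows for MOV EAX,dword ptr [RAX + 0x28] (not copied from Ghidra output), register overlap between EAX and RAX is not modeled, and orderAware is just one possible way to ignore the spurious read, not Ghidra's actual logic or its fix.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical stand-in for one p-code operation; not a Ghidra class. */
final class ModelOp {
    final String mnemonic;
    final String output;
    final List<String> inputs;

    ModelOp(String mnemonic, String output, String... inputs) {
        this.mnemonic = mnemonic;
        this.output = output;
        this.inputs = Arrays.asList(inputs);
    }
}

final class ZextModel {
    /** Approximate p-code for MOV EAX,dword ptr [RAX + 0x28] on 64-bit x86. */
    static List<ModelOp> movEaxFromMemory() {
        List<ModelOp> ops = new ArrayList<>();
        ops.add(new ModelOp("INT_ADD", "$U1", "RAX", "0x28"));
        ops.add(new ModelOp("LOAD", "$U2", "ram", "$U1"));
        ops.add(new ModelOp("COPY", "EAX", "$U2"));
        // Writing a 32-bit register zero-extends into the 64-bit parent.
        ops.add(new ModelOp("INT_ZEXT", "RAX", "EAX"));
        return ops;
    }

    /** Any use as an input counts as a read; any use as an output counts as a write. */
    static String naive(List<ModelOp> ops, String reg) {
        boolean read = false;
        boolean write = false;
        for (ModelOp op : ops) {
            if (op.inputs.contains(reg)) read = true;
            if (reg.equals(op.output)) write = true;
        }
        return label(read, write);
    }

    /** Ignores reads of a value that this same instruction has already written. */
    static String orderAware(List<ModelOp> ops, String reg) {
        boolean read = false;
        boolean write = false;
        for (ModelOp op : ops) {
            if (!write && op.inputs.contains(reg)) read = true;
            if (reg.equals(op.output)) write = true;
        }
        return label(read, write);
    }

    static String label(boolean read, boolean write) {
        if (read && write) return "READ_WRITE";
        if (write) return "WRITE";
        if (read) return "READ";
        return "NONE";
    }
}
```

In this model, naive(..., "EAX") reports READ_WRITE because the trailing INT_ZEXT takes EAX as an input, while orderAware(..., "EAX") reports WRITE because that input was produced by the instruction itself. A genuine read-modify-write such as an INT_ADD with EAX as both input and output would still come out as READ_WRITE under both.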

ghidra1 avatar Oct 11 '22 21:10 ghidra1

@ghidra1 yes mine is 32-bit x86. Here's the file information if that helps:

2022-10-11-152533_426x316_scrot

Is there anything else I can provide to help reproduce? Unfortunately I cannot provide the binary itself.

mschwager avatar Oct 11 '22 21:10 mschwager

@mschwager I have used your exact instruction bytes for 32-bit x86 and cannot reproduce. Based on your pcode display, there is also no explanation as to why you would see this. At this point I am only going to look into the 64-bit case initially identified by this ticket, since it is reproducible and the cause is understood.

ghidra1 avatar Oct 12 '22 14:10 ghidra1

@ghidra1 Hi, same question here. Three MOV instructions, the first and third do make sense. Why should the type of ECX of the second instruction be READ_WRITE?

Three MOV instructions with PCode enabled: image

The type of the operands and input and output objects of each instruction: image

xian-wen avatar Oct 13 '22 13:10 xian-wen

Assuming you are using instruction.getOperandRefType(int opIndex) and a 32-bit x86 program, I am unable to reproduce. The instruction.getOperandType(int opIndex) method is a different beast. The getOperandType method returns flag bits which must be interpreted (see OperandType class). Below are the results I get on the instruction in question for these two methods:

Operand #0 refType: WRITE  type: 0x200 (REGISTER)
Operand #1 refType: READ  type: 0x400000 (DYNAMIC)
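Since getOperandType returns a bit mask rather than a single value, decoding it is a matter of testing individual bits. The sketch below is standalone Java, not Ghidra code: OperandTypeBits is a hypothetical class, and it models only the two flag values that appear in this thread (0x200 REGISTER and 0x400000 DYNAMIC). The real OperandType class defines more flags, and inside a script OperandType.toString(type) already does the decoding.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper; models only the two flag values quoted in this thread. */
final class OperandTypeBits {
    static final int REGISTER = 0x200;
    static final int DYNAMIC = 0x400000;

    /** Returns the names of the modeled flags set in the given mask. */
    static List<String> decode(int type) {
        List<String> names = new ArrayList<>();
        if ((type & REGISTER) != 0) names.add("REGISTER");
        if ((type & DYNAMIC) != 0) names.add("DYNAMIC");
        // Report leftover bits instead of silently dropping them.
        int unknown = type & ~(REGISTER | DYNAMIC);
        if (unknown != 0) names.add(String.format("0x%x (not modeled)", unknown));
        return names;
    }
}
```

The point is that these flags describe what kind of operand it is (a register, a dynamically computed address), which is independent of the READ/WRITE/READ_WRITE answer that getOperandRefType gives.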

ghidra1 avatar Oct 13 '22 16:10 ghidra1

If someone can supply a sample Ghidra *.gzf file for the 32-bit case and a script which demonstrates the issue, I can look into it further.

ghidra1 avatar Oct 13 '22 16:10 ghidra1

@ghidra1 Hi, below is what I get:

address: 00402bf0
Operand #0: ECX refType: READ_WRITE type: 0x00000200 (REGISTER)
Operand #1: [EAX] refType: READ type: 0x00400000 (DYNAMIC)

Sorry, I am currently not sure whether I am allowed to upload the *.gzf file. If I get permission, I will upload it immediately.

xian-wen avatar Oct 13 '22 17:10 xian-wen

Can you try something like a 32-bit Notepad.exe and see what happens.

ghidra1 avatar Oct 13 '22 18:10 ghidra1

@ghidra1 Hi, I finally found one case. Could you please try npp.6.2.3.Installer.exe? Please use the default options for both import and analysis (I am running Ghidra 10.1.5 on Win 11, x86_64).

For the MOV instruction at 0x00403e20 in FUN_00403df6, I got similar results, as shown in the pictures below.

MOV instruction with PCode enabled: image

The refType and type of EAX: image

xian-wen avatar Oct 14 '22 03:10 xian-wen

@xian-wen I am still stumped by your 32-bit case. I tried the exact sample you indicated with Ghidra 10.1.5, our current patch branch, and our current master branch, and they all produce the expected ref-types for the operands. Do you have a vanilla install of Ghidra? Can you generate a hash (e.g., md5 or sha256) of your language SLA file (ghidra_10.1.5_PUBLIC/Ghidra/Processors/x86/data/languages/x86.sla)? It should match mine. This file contains the processor spec (sleigh compiler output) for x86, including the operand flags for the instructions that feed into the ref-type determination. This is the only variable I can think of. If the hashes differ from mine, could you please attach the file to this ticket (it is an XML text file)?

Expected MD5: 2ba79faa211131dd14686b656708b491
SHA256: 55ee3464fcad4e41931336b0293707f2beef57ed2fff0e659725475855690c60
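If no md5sum/sha256sum tool is at hand, the hashes can also be computed with the JDK alone. A minimal sketch, assuming the path to x86.sla is passed as the first argument; FileHash is a hypothetical class name, and only standard java.security and java.nio calls are used.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class FileHash {
    /** Returns the lowercase hex digest of a file for the given algorithm name. */
    static String hex(String algorithm, Path file)
            throws IOException, NoSuchAlgorithmException {
        MessageDigest digest = MessageDigest.getInstance(algorithm);
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buffer = new byte[8192];
            int n;
            // Stream the file so large inputs are not held in memory at once.
            while ((n = in.read(buffer)) != -1) {
                digest.update(buffer, 0, n);
            }
        }
        StringBuilder sb = new StringBuilder();
        for (byte b : digest.digest()) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        Path sla = Paths.get(args[0]);
        System.out.println("MD5:    " + hex("MD5", sla));
        System.out.println("SHA256: " + hex("SHA-256", sla));
    }
}
```

Run it against the x86.sla path given above and compare the two printed lines with the expected values.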

ghidra1 avatar Oct 14 '22 18:10 ghidra1

@ghidra1 Hi, I got the same hash for x86.sla:

# MD5
2ba79faa211131dd14686b656708b491
# SHA256
55ee3464fcad4e41931336b0293707f2beef57ed2fff0e659725475855690c60

xian-wen avatar Oct 15 '22 01:10 xian-wen

@ghidra1 This problem is really weird and hard to reproduce. After testing again and again, I found that sometimes the refType is READ_WRITE, and sometimes it is WRITE.

xian-wen avatar Oct 15 '22 02:10 xian-wen

Huh, that is weird. Sounds like some kind of race condition. Is there some caching going on with refTypes that's leading to race-y or otherwise non-deterministic behavior?

mschwager avatar Oct 16 '22 18:10 mschwager

> This problem is really weird and hard to reproduce. After testing again and again, I found that sometimes the refType is READ_WRITE, and sometimes it is WRITE.

@xian-wen What varies between attempts (e.g., new or same program, same instruction address, one or two programs open, ...)? Reproducing is the trick, since it will require debugging or tailored logging to track it down.

ghidra1 avatar Oct 18 '22 23:10 ghidra1

@ghidra1 Hi, still the npp.6.2.3.Installer.exe, please check all the instructions below. I do not think they are right.

GenerateDotGraphScript.java> Running...
GenerateDotGraphScript.java> 00407320    MOV EDI,dword ptr [ESI + 0x9bb4]
GenerateDotGraphScript.java> Operand #0: EDI refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407326    MOV EBX,dword ptr [ESI + 0x9bb8]
GenerateDotGraphScript.java> Operand #0: EBX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407330    MOV EBX,dword ptr [ESI + 0x9bb0]
GenerateDotGraphScript.java> Operand #0: EBX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407353    MOV EAX,dword ptr [ESI + 0x9bb0]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407466    MOV EAX,dword ptr [EBP + ECX*0x4 + -0x58]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 004076b2    MOV EDX,dword ptr [EBX + 0x9ba8]
GenerateDotGraphScript.java> Operand #0: EDX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 004076bb    MOV EAX,dword ptr [EBX + 0x51c]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 004076c5    MOV EAX,dword ptr [EBX + 0x9ba4]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 004076cc    MOV EDI,dword ptr [EBX + 0x518]
GenerateDotGraphScript.java> Operand #0: EDI refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 004076e4    MOV EAX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 004078af    MOV EAX,dword ptr [EBX + 0x514]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 004078d0    MOV ECX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: ECX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 004078da    MOV EAX,dword ptr [EBX + 0x9ba4]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407912    MOV ESI,dword ptr [EBX + 0x9ba8]
GenerateDotGraphScript.java> Operand #0: ESI refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407918    MOV ECX,dword ptr [EBX + 0x9ba4]
GenerateDotGraphScript.java> Operand #0: ECX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 0040792c    MOV EAX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407934    MOV EDX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: EDX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407ae5    MOV EAX,dword ptr [EBX + 0x50c]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407b1d    MOV ECX,dword ptr [EBX + 0x510]
GenerateDotGraphScript.java> Operand #0: ECX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407be1    MOV ESI,dword ptr [EBX + ECX*0x4 + 0x8]
GenerateDotGraphScript.java> Operand #0: ESI refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407ecf    MOV ECX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: ECX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407ef7    MOV ECX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: ECX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407f01    MOV EAX,dword ptr [EBX + 0x9ba4]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407f39    MOV ESI,dword ptr [EBX + 0x9ba8]
GenerateDotGraphScript.java> Operand #0: ESI refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407f3f    MOV ECX,dword ptr [EBX + 0x9ba4]
GenerateDotGraphScript.java> Operand #0: ECX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407f53    MOV EAX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407f5b    MOV EDX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: EDX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407fc9    MOV ECX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: ECX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00407fd4    MOV EAX,dword ptr [EBX + 0x9ba4]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 0040800d    MOV EDI,dword ptr [EBX + 0x9ba8]
GenerateDotGraphScript.java> Operand #0: EDI refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00408013    MOV ECX,dword ptr [EBX + 0x9ba4]
GenerateDotGraphScript.java> Operand #0: ECX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00408027    MOV EAX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 0040802f    MOV EDX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: EDX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 00408095    MOV ECX,dword ptr [EBX + 0x9ba8]
GenerateDotGraphScript.java> Operand #0: ECX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 0040809b    MOV EDX,dword ptr [EBX + 0x9ba4]
GenerateDotGraphScript.java> Operand #0: EDX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 004080af    MOV EAX,dword ptr [EBX + 0x9ba0]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> 004080c2    MOV EAX,dword ptr [EBX + 0x514]
GenerateDotGraphScript.java> Operand #0: EAX refType: READ_WRITE type: 0x00000200 (REG )
GenerateDotGraphScript.java> Finished!

xian-wen avatar Oct 21 '22 01:10 xian-wen

I am unable to reproduce your READ_WRITE result for the npp.6.2.3.Installer.exe locations specified above. Mine returns WRITE.

ghidra1 avatar Oct 21 '22 15:10 ghidra1

@ghidra1 That's weird. I just rebooted my PC, created a brand-new project, imported the npp executable, ran auto-analysis, and finally ran the script, and I got the same results.

xian-wen avatar Oct 21 '22 17:10 xian-wen

Could you please do a similar condition check as shown below for any 32-bit executable files?

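// Fragment from inside a loop over instructions and operand indices i. Presumably
// operandType is instruction.getOperandRefType(i), type is instruction.getOperandType(i),
// and operand is the text of operand i. Reports MOVs whose operand is READ_WRITE
// although its text does not appear in operand 1.
if ("READ_WRITE".equals(operandType.toString()) && "MOV".equals(mnemonic) && !instruction.getDefaultOperandRepresentation(1).contains(operand)) {
    println(address + "    " + instruction);
    println("Operand #" + i + ": " + operand + " refType: " + operandType + " type: " + toHexString(type, true, true) + " (" + OperandType.toString(type) + ")");
}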
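For anyone who wants to apply the same filter to exported text rather than inside a Ghidra script, the condition can be restated as a plain predicate. This is a standalone sketch with no Ghidra dependency; SuspiciousMov and its parameter names are hypothetical, and the inputs are assumed to be the strings a script would print (mnemonic, refType name, operand 0 text, operand 1 text).

```java
/** Hypothetical restatement of the filter above as a pure function. */
final class SuspiciousMov {
    /**
     * True for a MOV whose destination is reported as READ_WRITE even though
     * the destination's text does not occur anywhere in the source operand.
     */
    static boolean isSuspicious(String mnemonic, String refType,
                                String destOperand, String sourceOperand) {
        return "READ_WRITE".equals(refType)
                && "MOV".equals(mnemonic)
                && !sourceOperand.contains(destOperand);
    }
}
```

The substring test is deliberately crude. It skips cases like MOV EAX,dword ptr [EAX], where the register name also appears in the source operand, but it knows nothing about register overlap, so on 64-bit x86 it would still flag MOV EAX,dword ptr [RAX + 0x28].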

xian-wen avatar Oct 21 '22 17:10 xian-wen

@xian-wen Very, very weird. If I iterate over all instructions using currentProgram.getListing().getInstructions(true) and use your logic above, I am able to reproduce your results. If instead I spot check (which is what I had been doing until now) using currentProgram.getListing().getInstructionAt(currentAddress), I do not get your result and things look correct. This would imply that there is some sort of cross-contamination occurring within the language that is causing a cached state to impact another instruction. Keep in mind this is speculation and I have not isolated the specific cause. But you can now rest assured I can reproduce the issue. This issue appears to still be present in our latest code as well.

ghidra1 avatar Oct 24 '22 21:10 ghidra1

It is also worth pointing out that auto-analysis on the entire program does not leverage the methods getOperandRefType or the underlying method SleighInstructionPrototype.cacheDefaultOperandRefTypes.

ghidra1 avatar Oct 25 '22 14:10 ghidra1