codelldb
RFC/WIP: Support DAP disassemble request
Opening this up as a 'request for comments'. I had a quick go at implementing the DAP disassemble request in CodeLLDB, as I wanted something reliable to test Vimspector's disassemble view with.
Clearly this is a prototype. Would you be interested in a proper patch to support the DAP disassemble request?
CodeLLDB currently supports a custom disassembly view, and provides disassembly as "source" when debugging into objects with no source line info.
DAP now also has a disassemble request which, given a memory reference from the stack trace, produces a set number of instructions from that address.
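For reference, the request's arguments (memoryReference, offset, instructionOffset, instructionCount, resolveSymbols) can be modelled roughly as below. This is a minimal sketch, not CodeLLDB's actual types: serde wiring is omitted, and base_address is a hypothetical helper that assumes the adapter hands out memory references as 0x-prefixed hex strings (DAP itself treats them as opaque).

```rust
/// Arguments of the DAP `disassemble` request (DAP field names in snake_case).
#[allow(dead_code)]
#[derive(Debug, Clone)]
struct DisassembleArguments {
    memory_reference: String,        // e.g. a stack frame's instructionPointerReference
    offset: Option<i64>,             // byte offset applied to memory_reference
    instruction_offset: Option<i64>, // instruction offset; may be negative
    instruction_count: i64,          // exact number of instructions to return
    resolve_symbols: Option<bool>,
}

/// Hypothetical helper: resolves memory_reference + offset to a numeric
/// address, assuming memory references are 0x-prefixed hex strings.
fn base_address(args: &DisassembleArguments) -> Option<u64> {
    let hex = args.memory_reference.strip_prefix("0x")?;
    let addr = u64::from_str_radix(hex, 16).ok()?;
    // Fails (None) instead of wrapping if the offset under/overflows.
    addr.checked_add_signed(args.offset.unwrap_or(0))
}
```

Note that the byte offset and the instruction offset are applied separately: the byte offset yields the base address, and instructionOffset is then counted in instructions relative to it.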
This is simple to implement based on the existing DisassembledRange.
WIP.
- we don't return the exact number of instructions
- we don't populate a lot of the optional fields
- my-first-rust(TM)
- no tests yet
Thanks! Yes, I'd like to implement native DAP disassembly support at some point, and gave it a try a while back, in fact. However, I was not able to satisfactorily resolve the question of how to handle disassembling backwards, and then I got busy, so that stuff is on hold for now. If you'd like to think about it, here's my branch.
Thanks, I'll take a look.
> not able to satisfactorily resolve the question of how to handle disassembling backwards
I'm not sure I fully followed this. Are you referring to something like a negative instructionOffset in the disassemble request? I can see how that could be tricky, especially on Intel/CISC systems with variable-length instructions.
One idea springs to mind:
- map the current address to a source line
- disassemble from the previous source line's load_addr up to the current one, count the instructions
- repeat until we have enough, or the source's symbol (function) changes... or something
- if we get to the end and more were requested, pad with NOPs
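The proposal above could look roughly like this. It is a sketch under stated assumptions: line_starts is a hypothetical ascending list of line-table row start addresses for the current function (treated as known instruction boundaries), decode_range is a hypothetical callback that disassembles [start, end) forwards, and None stands in for the NOP padding.

```rust
/// Sketch of the "walk back one line-table row at a time" idea.
/// `line_starts`: ascending start addresses of the current function's
/// line-table rows (hypothetical input; assumed to be instruction boundaries).
/// `decode_range(start, end)`: hypothetical forward disassembler returning the
/// instruction addresses in [start, end).
/// Returns exactly `wanted` entries ending just before `pc`; `None` = NOP padding.
fn instructions_before<F>(
    pc: u64,
    wanted: usize,
    line_starts: &[u64],
    decode_range: F,
) -> Vec<Option<u64>>
where
    F: Fn(u64, u64) -> Vec<u64>,
{
    let mut collected: Vec<u64> = Vec::new();
    let mut end = pc;
    // Visit rows that start below pc, nearest first, until we have enough.
    for &start in line_starts.iter().rev().filter(|&&s| s < pc) {
        if collected.len() >= wanted {
            break;
        }
        let mut chunk = decode_range(start, end);
        chunk.extend(collected); // earlier rows go in front
        collected = chunk;
        end = start;
    }
    // Keep the last `wanted` instructions, padding the front if we ran out.
    let have = collected.len().min(wanted);
    let mut out = vec![None; wanted - have];
    out.extend(collected[collected.len() - have..].iter().map(|&a| Some(a)));
    out
}
```

Because it relies entirely on the line table, this approach cannot work when the binary has no debug info.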
This seems like it might be possible in theory, though I need to look at the API to see whether it works in practice. I think it might be possible by using the SBCompileUnit directly. WDYT?
That aside, for now I took your branch and added source line info to the disassembled instructions, and that seems to work with my (extremely limited) client implementation. I'll try to dig through the LLDB API to see if there's anything we can do about negative instruction offsets, but would just bailing out and not supporting that be an option?
> Are you referring to something like a negative instructionOffset in the disassemble request?
Yes, that.
> but would just bailing out and not supporting that be an option?
Don't think so. First and foremost, this is a VSCode extension, and VSCode's implementation of disassembly view uses negative offsets extensively.
> disassemble from the previous source line's load_addr up to the current one, count the instructions
This will likely break in release builds: the optimizer may rearrange instructions such that they are no longer in line order. Also, disassembly must be able to function without any debug info whatsoever.
I can think of two methods:
- If the binary has debug info, find which function the current PC address is in, then disassemble starting from the beginning of that function until PC is reached. Not sure what corner cases there are... one that comes to mind is that functions don't have to occupy a contiguous range. For example, profile-guided optimization may split a function into hot/cold parts and put these in different code sections.
- If current PC does not belong to any function, one can simply start disassembling at PC-x bytes, see if any invalid instructions were encountered, and whether PC ended up at the beginning of an instruction. Otherwise, try again at PC-x-1, and so on.
But it's probably possible to start in the middle of an instruction and get a bogus but valid-looking instruction stream between PC-x and PC, though the likelihood of that goes down as x grows.
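The first method (decode forwards from the start of the enclosing function) might be sketched as follows. func_start and the insn_len callback, which returns the length of the instruction at an address or None for invalid bytes, are hypothetical stand-ins for what a symbol lookup and a disassembler would provide. Like the description, it assumes the function occupies one contiguous range, so the hot/cold split case is not handled.

```rust
/// Method 1 sketch: decode forwards from the enclosing function's start up to
/// `pc`, then keep the last `n` instruction addresses before `pc`.
/// `insn_len(addr)` is a hypothetical decoder callback returning the length of
/// the instruction at `addr`, or `None` if the bytes there are not valid.
/// Returns `None` if decoding fails or `pc` is not on an instruction boundary.
fn last_n_before_pc<F>(func_start: u64, pc: u64, n: usize, insn_len: F) -> Option<Vec<u64>>
where
    F: Fn(u64) -> Option<u64>,
{
    let mut addrs = Vec::new();
    let mut addr = func_start;
    while addr < pc {
        addrs.push(addr);
        // A zero length would loop forever, so treat it as invalid too.
        addr += insn_len(addr).filter(|&len| len > 0)?;
    }
    if addr != pc {
        return None; // overshot: pc is in the middle of an instruction
    }
    let skip = addrs.len().saturating_sub(n);
    Some(addrs.split_off(skip))
}
```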
I expect that a robust implementation will require quite a bit of research and experimentation.
...I bet there is a blog post or a mailing list discussion somewhere on the internet which has all the tips and tricks, because the problem is definitely not new. However, so far I've been unsuccessful in locating it :man_shrugging:
I still have this on my TODO list, by the way. I notice that vscode-cpptools seems to support negative offsets, so it might be possible to reverse-engineer what they do and pick it up again. Just need that "free" time people keep talking about :)
OK, so this is what MIEngine does:
private async Task<DisasmInstruction[]> VerifyDisassembly(DisasmInstruction[] instructions, ulong startAddress, ulong endAddress, ulong targetAddress)
{
    // Nothing to verify if the target lies outside the disassembled range.
    if (startAddress > targetAddress || targetAddress > endAddress)
    {
        return instructions;
    }
    var originalInstructions = instructions;
    int count = 0;
    // Retry while no instruction starts exactly at targetAddress,
    // giving up after MaxInstructionSize one-byte steps back.
    while (instructions != null && (instructions.Length == 0 || Array.Find(instructions, (i) => i.Addr == targetAddress) == null) && count < _process.MaxInstructionSize)
    {
        count++;
        startAddress--; // back up one byte
        instructions = await Disassemble(_process, startAddress, endAddress); // try again
    }
    // If a retry failed outright, fall back to the first result.
    return instructions == null ? originalInstructions : instructions;
}
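A rough Rust transliteration of that loop, for comparison with the adapter's code. The disassemble callback is hypothetical (it returns the addresses of the instructions decoded in [start, end), or None on failure), and the start == 0 guard is an addition to avoid unsigned underflow.

```rust
/// Rough transliteration of MIEngine's VerifyDisassembly.
/// `disassemble(start, end)` is a hypothetical callback returning the addresses
/// of the instructions decoded in [start, end), or `None` on failure.
fn verify_disassembly<F>(
    disassemble: F,
    instructions: Vec<u64>,
    mut start: u64,
    end: u64,
    target: u64,
    max_instruction_size: u64,
) -> Vec<u64>
where
    F: Fn(u64, u64) -> Option<Vec<u64>>,
{
    // Nothing to verify if the target lies outside the range.
    if start > target || target > end {
        return instructions;
    }
    let mut current = Some(instructions.clone());
    let mut count = 0;
    loop {
        // Retry only while the last attempt succeeded but missed the target.
        let missed = matches!(&current, Some(insns) if !insns.contains(&target));
        // `start == 0` is an extra guard against underflow (not in MIEngine).
        if !missed || count >= max_instruction_size || start == 0 {
            break;
        }
        count += 1;
        start -= 1; // back up one byte and try again
        current = disassemble(start, end);
    }
    // As in MIEngine, fall back to the original list if a retry failed.
    current.unwrap_or(instructions)
}
```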
So basically:
- try to disassemble MaxSizeOfOneInstruction * instructionCount+1 bytes, starting at address - MaxSizeOfOneInstruction * -instructionOffset
- if that results in a set of addresses that does not include an instruction starting at address (presumably because it's invalid), then:
  - add 1 to the number of bytes in the range
  - decrement the start address by 1 byte
  - and repeat.
I don't love it, but I also don't hate it. What do you think?
FWIW this is what they do to calculate the "MaxSizeOfOneInstruction", which was my next question :)
public void SetTargetArch(TargetArchitecture arch)
{
    switch (arch)
    {
        case TargetArchitecture.ARM:
            MaxInstructionSize = 4;
            Is64BitArch = false;
            break;
        case TargetArchitecture.ARM64:
            MaxInstructionSize = 8;
            Is64BitArch = true;
            break;
        case TargetArchitecture.X86:
            MaxInstructionSize = 20;
            Is64BitArch = false;
            break;
        case TargetArchitecture.X64:
            MaxInstructionSize = 26;
            Is64BitArch = true;
            break;
        case TargetArchitecture.Mips:
            MaxInstructionSize = 4;
            Is64BitArch = false;
            break;
        default:
            throw new ArgumentOutOfRangeException(nameof(arch));
    }
}
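The same table as an exhaustive Rust match, which removes the need for the out-of-range default. The values are copied as-is from the MIEngine snippet. They are conservative (x86 instructions are architecturally limited to 15 bytes); in this scheme they bound both the initial look-back distance and how many one-byte retries are attempted.

```rust
/// Upper bound on a single instruction's length per architecture, with the
/// values copied from MIEngine's SetTargetArch.
#[allow(dead_code)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TargetArchitecture {
    Arm,
    Arm64,
    X86,
    X64,
    Mips,
}

fn max_instruction_size(arch: TargetArchitecture) -> u64 {
    // An exhaustive match: adding a variant becomes a compile error here,
    // instead of the C# version's runtime ArgumentOutOfRangeException.
    match arch {
        TargetArchitecture::Arm | TargetArchitecture::Mips => 4,
        TargetArchitecture::Arm64 => 8,
        TargetArchitecture::X86 => 20,
        TargetArchitecture::X64 => 26,
    }
}

fn is_64bit(arch: TargetArchitecture) -> bool {
    matches!(arch, TargetArchitecture::Arm64 | TargetArchitecture::X64)
}
```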
Well, believe it or not, it works.
I'll tidy it up a bit and push a new PR.
What happens if startAddress lands in the middle of an instruction, such that the trailing bytes just happen to encode a valid instruction?
If the start address happens to be mid-instruction and that resolves to a valid instruction then one of a few things might happen:
- After reading the first "bogus" instruction, the stream is no longer interpretable, and the likelihood of a valid instruction appearing in the stream at the exact requested base address is very low, so we would reject it and move back a byte.
- After reading the first "bogus" instruction, the "new" interpretation happens to end on the same byte location as the next valid instruction in the stream. We would then return one bogus instruction followed by N valid (correct) instructions. In all likelihood, this invalid instruction would then be chopped off the front. The reason for this is that we must return the exact number of requested instructions, and because we seek backwards by M * the maximum instruction size, we always overshoot and have to re-centre the result.
I need to craft some careful test cases around this. Sorry if the above explanation is not very clear. My WIP commit message is below and the change is here - it's still WIP and the code is terrible, but hopefully you get the idea:
Disassembly for negative instruction offsets
For a negative instruction offset, we have a challenge: what _byte_
position should we start disassembling at? For ARM this seems
fairly simple (all instructions are 4 bytes), but is complicated by
Thumb, which uses a mix of 2- and 4-byte instructions. X64, on the
other hand, has variable-length instructions whose encoding is
technically unbounded (though 15 bytes is the architectural maximum).
We therefore can't just assume that we can offset the base address by
some fixed number of bytes and get the exact number of instructions we
want. Instead, we have to attempt to find a valid address, then
re-center the resulting instruction list around the requested base
address.
The way this works is as follows:
* If the instruction offset is positive or zero, LLDB gives us a
specific call to read a set number of instructions, so we use that,
padding with invalids if we underflow.
* Otherwise, for a negative instruction offset:
1. Guess a start address as base_address - |instruction_offset| * 16
2. Disassemble from there for instruction_count * 16 bytes
3. Check to see if the resulting set of instructions contains an
instruction whose address matches our base_address. If not, move 1
byte further back and try again. Do this up to 16 times and we
should find an address which is the start of an instruction
(assuming we're actually still in a code segment...)
4. Pad or truncate the start of the instruction list so that the
base_address instruction is at the expected location in the list.
* Slice and pad the disassembled instructions so that we have exactly
instruction_count entries, as required by the protocol.
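The steps in the commit message can be sketched end to end as follows. Assumptions: decode is a hypothetical callback that disassembles [start, end) forwards and returns instruction addresses (None if the memory cannot be read); None entries in the result stand in for the padding "invalid" instructions; and, unlike step 2, the window is extended to base_address + instruction_count * max_size so it always contains base_address. Non-negative offsets get the same treatment here, whereas the commit uses LLDB's count-based read for those.

```rust
/// Step 4 + final slice/pad: returns exactly `count` entries, where entry `i`
/// is the instruction at index `base_idx + instruction_offset + i` in
/// `decoded`, or `None` (an "invalid" placeholder) if that index is out of range.
fn window(decoded: &[u64], base_idx: usize, instruction_offset: i64, count: usize) -> Vec<Option<u64>> {
    let first = base_idx as i64 + instruction_offset;
    (0..count as i64)
        .map(|i| {
            let j = first + i;
            if j >= 0 { decoded.get(j as usize).copied() } else { None }
        })
        .collect()
}

/// Sketch of the commit message's algorithm. `decode(start, end)` is a
/// hypothetical forward disassembler returning instruction addresses in
/// [start, end), or `None` if the memory cannot be read.
fn disassemble_around<F>(
    decode: F,
    base: u64,
    instruction_offset: i64,
    count: usize,
    max_size: u64,
) -> Vec<Option<u64>>
where
    F: Fn(u64, u64) -> Option<Vec<u64>>,
{
    // Step 1: guess a start far enough back for the requested instructions.
    let back = instruction_offset.min(0).unsigned_abs();
    let mut start = base.saturating_sub(back * max_size);
    // Step 2 (adjusted): decode far enough forward to always include `base`.
    let end = base.saturating_add(count as u64 * max_size);
    // Step 3: initial attempt plus up to `max_size` one-byte steps back,
    // until some decoded instruction starts exactly at `base`.
    let mut decoded = Vec::new();
    for _ in 0..=max_size {
        if let Some(d) = decode(start, end) {
            if d.contains(&base) {
                decoded = d;
                break;
            }
        }
        if start == 0 {
            break;
        }
        start -= 1;
    }
    // Step 4: re-centre so `base` sits at index -instruction_offset.
    match decoded.iter().position(|&a| a == base) {
        Some(base_idx) => window(&decoded, base_idx, instruction_offset, count),
        None => vec![None; count], // never synchronised on `base`
    }
}
```

A leading instruction decoded from a mid-instruction start that happens to resynchronise on base_address (the second case discussed earlier) is kept only if it falls inside the requested window; otherwise the final slice drops it.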
Hello, I was taking a look at these changes to enable disassemble requests.
I was wondering, what's the difference between read_instructions() (used for positive offsets) and read_memory() + get_instructions() (used for negative offsets)? Couldn't we use read_instructions() in both cases?
Sorry for the dumb question. Thanks
The LLDB API doesn't provide a way to do a negative-offset read. This is also more complex due to the variable length of instructions on x86 (hence the read-memory gymnastics).
See the explanation here: https://github.com/vadimcn/vscode-lldb/pull/627#issuecomment-1271324798
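A toy illustration of why a backwards read is ill-defined on variable-length ISAs: the same bytes split into different instruction sequences depending on where decoding starts, so "the instruction before X" can have more than one valid answer. The ISA here is entirely hypothetical (the low two bits of an instruction's first byte give its length minus one).

```rust
/// Hypothetical toy ISA: instruction length is (first byte & 3) + 1, i.e. 1-4 bytes.
fn toy_len(first_byte: u8) -> usize {
    (first_byte & 0x03) as usize + 1
}

/// Decodes forwards from `start` and returns the instruction start offsets < `end`.
fn decode_from(mem: &[u8], start: usize, end: usize) -> Vec<usize> {
    let mut out = Vec::new();
    let mut at = start;
    while at < end && at < mem.len() {
        out.push(at);
        at += toy_len(mem[at]);
    }
    out
}

/// Every candidate start within `max_back` bytes before `target` whose
/// forward decoding lands exactly on `target`.
fn starts_reaching(mem: &[u8], target: usize, max_back: usize) -> Vec<usize> {
    (1..=max_back.min(target))
        .map(|back| target - back)
        .filter(|&s| decode_from(mem, s, target + 1).contains(&target))
        .collect()
}
```

With mem = [0x03, 0x01, 0x00, 0x00, 0x00] and target 4, every start from 0 to 3 reaches offset 4, yet decoding from 0 yields a single 4-byte instruction while decoding from 1 yields instructions at offsets 1 and 3. Forward decoding from a known boundary has no such ambiguity, which is why a count-based forward read is easy and a backward one needs heuristics.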
I'm writing a custom extension and your implementation is being a good guidance!
Still, I'm having problems when VS Code asks for a large offset (e.g. -200) for my small program and disassemble_byte_range() attempts to read memory outside the current stack frame. When that happens, a few initial instructions (outside the current stack frame) are read, but then it returns early (https://github.com/vadimcn/vscode-lldb/blob/master/adapter/src/disassembly.rs#L256) without retrieving the instructions from the current stack frame.
Is there anything I'm missing?
Could be a bug. Please can you raise an issue with steps to repro using codelldb and I can take a look.
Actually, in codelldb it works fine. That's why I was wondering where that case is handled in codelldb code.
I'm trying to do something similar, but using VS Code's built-in Open Disassembly View. I implemented logic similar to what you've done, but I'm running into problems because, after applying the negative offset requested by VS Code, I end up outside the current stack frame (i.e. disassemble_byte_range() returns some instructions outside the current stack frame).
I need to add a check on the start address, but I couldn't find an easy way to retrieve the stack frame's start address from LLDB.
The code for handling the disassemble request is here: https://github.com/vadimcn/vscode-lldb/blob/master/adapter/src/debug_session.rs#L1134. I don't know anything about vscode.
Yes, that's what I was already looking at and using as a reference. Maybe in your case you receive valid offsets, so my issue (i.e. an offset that results in reading memory outside the current stack frame) doesn't show up. I'll dig into it.
I'm really struggling to understand what you're asking for. If you think the above code doesn't work in some scenario, I'm happy to look into that. I can assure you that I tested negative offsets that go outside the definition of the current "function", even outside the binary image. I'm not sure what "stack frame" per se has to do with it. Disassembly is just taking a chunk of memory and trying to interpret that byte stream as instructions. Often the memory isn't actually instructions, and you get various forms of invalid (or NOP) instructions instead. The idea of the above code is that it tries to determine a valid start address by heuristically disassembling various bytes (up to one instruction width back from the calculated start address) and looking to see if it "looks" valid. The stack is really not involved unless the code location happens to be very close to where the stack is in memory.
Your implementation works perfectly, I was having a problem on my side. Thanks for the reply!