libdrgn: improve C string reading efficiency

The current C string reading implementation is inefficient, especially for low-bandwidth remote targets, because it does a separate segment read (including a fresh page table lookup) for every character. A more efficient approach is to retain the page table iterator between character reads and only discard it when we hit the null terminator.
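A minimal, self-contained sketch of the difference, using a toy single-level page table in place of a real one. Everything here (toy_target, toy_translate, read_cstr_naive, read_cstr_cached) is hypothetical and not libdrgn code. The cached reader keeps the last translation and walks the page table again only when the string crosses into a new page; a real page table iterator could additionally reuse upper-level entries, which this toy omits.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy model of a target (hypothetical, for illustration only): a small
 * "physical" memory and a single-level page table. Each translation is
 * counted because on a remote target it costs a page table walk. */
#define TOY_PAGE_SIZE 16u
#define TOY_NUM_PAGES 4u

struct toy_target {
	uint8_t phys[TOY_PAGE_SIZE * TOY_NUM_PAGES];
	uint64_t page_table[TOY_NUM_PAGES]; /* virtual page -> physical page */
	unsigned translations;              /* page table walks performed */
};

/* Translate a virtual address with a full (counted) page table walk. */
static uint64_t toy_translate(struct toy_target *t, uint64_t vaddr)
{
	uint64_t vpage = vaddr / TOY_PAGE_SIZE;
	assert(vpage < TOY_NUM_PAGES);
	t->translations++;
	return t->page_table[vpage] * TOY_PAGE_SIZE + vaddr % TOY_PAGE_SIZE;
}

/* Current behavior: one independent read, and one walk, per character.
 * Copies at most max - 1 characters and always NUL-terminates out. */
static size_t read_cstr_naive(struct toy_target *t, uint64_t vaddr,
			      char *out, size_t max)
{
	assert(max > 0);
	for (size_t i = 0; i + 1 < max; i++) {
		out[i] = (char)t->phys[toy_translate(t, vaddr + i)];
		if (out[i] == '\0')
			return i;
	}
	out[max - 1] = '\0';
	return max - 1;
}

/* Proposed behavior: keep the translation for the current page and walk the
 * page table again only when the string crosses into a new page. */
static size_t read_cstr_cached(struct toy_target *t, uint64_t vaddr,
			       char *out, size_t max)
{
	assert(max > 0);
	uint64_t cached_vpage = UINT64_MAX; /* nothing cached yet */
	uint64_t phys_page_base = 0;
	for (size_t i = 0; i + 1 < max; i++) {
		uint64_t va = vaddr + i;
		uint64_t vpage = va / TOY_PAGE_SIZE;
		if (vpage != cached_vpage) {
			phys_page_base = toy_translate(t, vpage * TOY_PAGE_SIZE);
			cached_vpage = vpage;
		}
		out[i] = (char)t->phys[phys_page_base + va % TOY_PAGE_SIZE];
		if (out[i] == '\0')
			return i;
	}
	out[max - 1] = '\0';
	return max - 1;
}
```

For a 12-character string that straddles two toy pages, the naive reader performs 13 walks (one per byte, including the terminator), while the cached reader performs 2 (one per page touched).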

Implement this approach by allowing segments to also specify a C string reading callback. The callback for page tables will preserve the page table iterator while reading characters.
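One way the dispatch could look, as a sketch: each segment carries an optional C string callback, and the reader falls back to the existing one-byte-at-a-time loop when it is absent. All names and signatures here (toy_segment, toy_read_fn, toy_read_cstr_fn, flat_mem, flat_read, flat_read_cstr) are hypothetical assumptions, not libdrgn's actual memory reader API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical callbacks: read_fn reads exactly count bytes or fails;
 * read_cstr_fn reads a whole NUL-terminated string in one call. */
typedef bool toy_read_fn(void *buf, uint64_t address, size_t count, void *arg);
typedef bool toy_read_cstr_fn(char *buf, uint64_t address, size_t max,
			      size_t *len_ret, void *arg);

struct toy_segment {
	toy_read_fn *read_fn;
	toy_read_cstr_fn *read_cstr_fn; /* optional; may be NULL */
	void *arg;
};

/* Read a C string (at most max - 1 characters), preferring the segment's
 * specialized callback and otherwise reading one byte at a time. */
static bool toy_segment_read_cstr(const struct toy_segment *seg,
				  uint64_t address, char *buf, size_t max,
				  size_t *len_ret)
{
	assert(max > 0);
	if (seg->read_cstr_fn)
		return seg->read_cstr_fn(buf, address, max, len_ret, seg->arg);
	for (size_t i = 0; i + 1 < max; i++) {
		if (!seg->read_fn(&buf[i], address + i, 1, seg->arg))
			return false;
		if (buf[i] == '\0') {
			*len_ret = i;
			return true;
		}
	}
	buf[max - 1] = '\0';
	*len_ret = max - 1;
	return true;
}

/* Example backing store: a flat byte array that counts callback calls. */
struct flat_mem {
	const uint8_t *data;
	uint64_t base;
	size_t size;
	unsigned calls;
};

static bool flat_read(void *buf, uint64_t address, size_t count, void *arg)
{
	struct flat_mem *m = arg;
	m->calls++;
	if (address < m->base || address - m->base > m->size ||
	    count > m->size - (address - m->base))
		return false;
	memcpy(buf, m->data + (address - m->base), count);
	return true;
}

static bool flat_read_cstr(char *buf, uint64_t address, size_t max,
			   size_t *len_ret, void *arg)
{
	struct flat_mem *m = arg;
	m->calls++;
	if (address < m->base || address - m->base >= m->size)
		return false;
	size_t off = (size_t)(address - m->base);
	size_t avail = m->size - off;
	size_t n = avail < max - 1 ? avail : max - 1;
	const uint8_t *nul = memchr(m->data + off, '\0', n);
	if (!nul && n < max - 1)
		return false; /* string runs past the end of the backing store */
	size_t len = nul ? (size_t)(nul - (m->data + off)) : n;
	memcpy(buf, m->data + off, len);
	buf[len] = '\0';
	*len_ret = len;
	return true;
}
```

Reading "drgn" through a segment without the specialized callback takes five read_fn calls (four characters plus the terminator); with flat_read_cstr installed it takes one.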

Jun 28 '23 01:06 pcc

This will need some fixups if #310 lands first, as we'll need to canonicalize the address passed to drgn_memory_reader_read_cstr as well.

Jul 01 '23 02:07 pcc

I don't love the idea of adding another callback for this. I've thought about this inefficiency in the past, and the approach I was considering was to have drgn_program_read_c_string() read in chunks larger than 1 byte (e.g., up to the next 4 KiB boundary, or 64 bytes, or the minimum of the two; we could experiment with different heuristics). The complication is that the memory reader interface is currently all-or-nothing: either the whole memory read succeeds, or it fails and the whole buffer is potentially garbage. So I'd like to make the memory reader interface allow partial reads.
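A sketch of the chunked alternative, assuming a hypothetical partial-read callback that returns how many leading bytes it managed to read (bytes past that count are unspecified). The chunk size uses the "minimum of 64 bytes and the distance to the next 4 KiB boundary" heuristic. partial_read_fn, read_cstr_chunked, and the fake_read test double are illustrative names, not existing libdrgn functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define CHUNK_MAX 64u
#define PAGE_SIZE_4K 4096u

/* Hypothetical partial-read interface: returns the number of leading bytes
 * read successfully (0 if none). */
typedef size_t partial_read_fn(void *buf, uint64_t address, size_t count,
			       void *arg);

/* Next chunk: up to the next 4 KiB boundary, capped at 64 bytes. */
static size_t next_chunk_size(uint64_t address)
{
	size_t to_boundary = PAGE_SIZE_4K - (size_t)(address % PAGE_SIZE_4K);
	return to_boundary < CHUNK_MAX ? to_boundary : CHUNK_MAX;
}

/* Read a C string into out (capacity max > 0) chunk by chunk. Returns the
 * string length, or -1 if memory became unreadable before a NUL was seen.
 * Strings longer than max - 1 are truncated and NUL-terminated. */
static long read_cstr_chunked(partial_read_fn *read_fn, void *arg,
			      uint64_t address, char *out, size_t max)
{
	assert(max > 0);
	size_t len = 0;
	while (len + 1 < max) {
		size_t want = next_chunk_size(address + len);
		if (want > max - 1 - len)
			want = max - 1 - len;
		size_t got = read_fn(out + len, address + len, want, arg);
		char *nul = memchr(out + len, '\0', got);
		if (nul)
			return (long)(nul - out);
		len += got;
		if (got < want)
			return -1; /* short read and no terminator yet */
	}
	out[max - 1] = '\0';
	return (long)(max - 1);
}

/* Test double: only [base, base + size) is readable; counts traffic. */
struct fake {
	const char *mem;
	uint64_t base;
	size_t size;
	unsigned calls;
	size_t bytes;
};

static size_t fake_read(void *buf, uint64_t address, size_t count, void *arg)
{
	struct fake *f = arg;
	f->calls++;
	if (address < f->base || address >= f->base + f->size)
		return 0;
	size_t avail = f->size - (size_t)(address - f->base);
	size_t n = count < avail ? count : avail;
	memcpy(buf, f->mem + (address - f->base), n);
	f->bytes += n;
	return n;
}
```

With this heuristic, a string that starts 8 bytes before a 4 KiB boundary is fetched as an 8-byte read followed by reads of up to 64 bytes. For short strings, much of a 64-byte chunk may be transferred needlessly; that bandwidth cost is what gets traded for fewer round trips.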

This might be less optimal in a few edge cases, but I think it's good enough, and it's a bit cleaner. There are other use cases that I'd like to allow partial reads for anyway (e.g., batched reads of all of the struct pages in the system while handling #27).

First of all, do you think this approach would solve your problem? Secondly, would you be interested in implementing it? If so, I have some more specific thoughts about how to do it that I can share. No worries if not; I can get to it soon-ish.

Oct 07 '23 00:10 osandov

I also considered whether C string reads could be optimized by using or extending the existing callback. However, null-terminated strings seemed to me to be the only special case that could justify a separate callback: they are very common in the kinds of programs that drgn is used to debug, and the termination check happens at a fine (per-byte) granularity, which increases the benefit of pushing the optimization down to a lower level.

My concern with larger transfer sizes is similar to what I've seen with, e.g., #312: these debug adapters can be highly bandwidth- and latency-constrained, so we need to carefully optimize what we send over the wire to the adapter and from there to the target. At least in my experience, the strings being transferred are typically relatively short, which means it is better to minimize bandwidth in this case.

That's not to say that latency isn't important; quite the contrary, it can have a big impact on long strings. But with an approach specialized for C strings, I think there is an opportunity to further reduce latency for long strings in the debug adapter case. My idea was to put more intelligence into the debug adapter so that it can issue the JTAG/SWD commands needed to read up to the null terminator without host involvement, so that a USB round trip is only needed when crossing a page boundary. (This could be done by extending an open source debug adapter firmware such as DAPLink, together with the associated protocol standard.) A short-read-based approach would preclude this.

That all being said, I'm not fundamentally opposed to a short-read-based approach as long as it doesn't perceptibly regress performance in typical remote debugging scenarios. (If you knew anything about me, you would know that I am the last person to push for unnecessary optimizations, so I am happy to be proven wrong about my assumptions, but they are based on several months of experience working with these debug adapters.) If you would like to try implementing it, I can run benchmarks on my setup.

Oct 07 '23 02:10 pcc

OK, you've convinced me that the extra callback makes sense. In fact, it would also benefit the /proc/kcore and local core dump readers, since those could use my suggestion of reading extra characters, combined with the additional information they have about which addresses are valid to read from.
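A sketch of what such a callback might look like for a file-backed segment (as in a core dump or /proc/kcore), assuming a hypothetical file_segment descriptor. Because the segment's extent is known, it can read ahead in chunks with POSIX pread() and clamp each read to the segment end, without needing a partial-read memory reader interface. The 64-byte chunk size is an arbitrary assumption.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h> /* tmpfile(), fileno() for the usage example */
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical file-backed segment: virtual addresses
 * [vaddr, vaddr + size) are stored contiguously in fd at file_offset. */
struct file_segment {
	int fd;
	uint64_t vaddr;
	uint64_t size;
	off_t file_offset;
};

#define CSTR_CHUNK 64u

/* Read a C string (at most max - 1 characters) by reading ahead in chunks
 * clamped to the segment end. Returns the length, or -1 if the address is
 * outside the segment, the string runs past its end, or pread() fails. */
static long file_segment_read_cstr(const struct file_segment *seg,
				   uint64_t address, char *out, size_t max)
{
	assert(max > 0);
	if (address < seg->vaddr || address - seg->vaddr >= seg->size)
		return -1;
	uint64_t off = address - seg->vaddr; /* offset within the segment */
	size_t len = 0;
	while (len + 1 < max) {
		uint64_t remaining = seg->size - off - len;
		if (remaining == 0)
			return -1; /* ran off the end of the segment */
		size_t want = CSTR_CHUNK;
		if (want > remaining)
			want = (size_t)remaining;
		if (want > max - 1 - len)
			want = max - 1 - len;
		ssize_t got = pread(seg->fd, out + len, want,
				    seg->file_offset + (off_t)(off + len));
		if (got <= 0)
			return -1;
		char *nul = memchr(out + len, '\0', (size_t)got);
		if (nul)
			return (long)(nul - out);
		len += (size_t)got; /* short pread(): just keep going */
	}
	out[max - 1] = '\0';
	return (long)(max - 1);
}
```

Unlike the remote case, over-reading here is cheap: the extra bytes come from a local file, and the known segment bounds guarantee that no read strays outside valid data.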

I'll come back to this one after #310 and #312, which have some bearing on this.

Oct 09 '23 22:10 osandov