llvm-project icon indicating copy to clipboard operation
llvm-project copied to clipboard

[lldb] Add ReadCStrings API to Process

Open felipepiovezan opened this issue 2 months ago • 15 comments

This commit uses Process::ReadMemoryRanges to create an efficient method for reading multiple strings at once. This method works like the single-string version, reading 256 bytes at a time, but instead doing it for every string requested at the same time.

felipepiovezan avatar Dec 12 '25 15:12 felipepiovezan

@llvm/pr-subscribers-lldb

Author: Felipe de Azevedo Piovezan (felipepiovezan)

Changes

This commit uses Process::ReadMemoryRanges to create an efficient method for reading multiple strings at once. This method works like the single-string version, reading 256 bytes at a time, but instead doing it for every string requested at the same time.


Full diff: https://github.com/llvm/llvm-project/pull/172026.diff

7 Files Affected:

  • (modified) lldb/include/lldb/API/SBProcess.h (+3)
  • (modified) lldb/include/lldb/Target/Process.h (+3)
  • (modified) lldb/source/API/SBProcess.cpp (+31)
  • (modified) lldb/source/Target/Process.cpp (+57)
  • (added) lldb/test/API/python_api/process/read_multiple_cstrings/Makefile (+3)
  • (added) lldb/test/API/python_api/process/read_multiple_cstrings/TestReadMultipleStrings.py (+46)
  • (added) lldb/test/API/python_api/process/read_multiple_cstrings/main.c (+8)
diff --git a/lldb/include/lldb/API/SBProcess.h b/lldb/include/lldb/API/SBProcess.h
index 882b8bd837131..5f04d3330a1d1 100644
--- a/lldb/include/lldb/API/SBProcess.h
+++ b/lldb/include/lldb/API/SBProcess.h
@@ -205,6 +205,9 @@ class LLDB_API SBProcess {
   size_t ReadCStringFromMemory(addr_t addr, void *char_buf, size_t size,
                                lldb::SBError &error);
 
+  SBStringList ReadCStringsFromMemory(SBValueList string_addresses,
+                                      SBError &error);
+
   uint64_t ReadUnsignedFromMemory(addr_t addr, uint32_t byte_size,
                                   lldb::SBError &error);
 
diff --git a/lldb/include/lldb/Target/Process.h b/lldb/include/lldb/Target/Process.h
index 8e6c16cbfe0fc..4493e81ce0eae 100644
--- a/lldb/include/lldb/Target/Process.h
+++ b/lldb/include/lldb/Target/Process.h
@@ -1680,6 +1680,9 @@ class Process : public std::enable_shared_from_this<Process>,
   size_t ReadCStringFromMemory(lldb::addr_t vm_addr, std::string &out_str,
                                Status &error);
 
+  llvm::SmallVector<std::optional<std::string>>
+  ReadCStringsFromMemory(llvm::ArrayRef<lldb::addr_t> addresses);
+
   /// Reads an unsigned integer of the specified byte size from process
   /// memory.
   ///
diff --git a/lldb/source/API/SBProcess.cpp b/lldb/source/API/SBProcess.cpp
index 14aa9432eed83..1a83a3d164e53 100644
--- a/lldb/source/API/SBProcess.cpp
+++ b/lldb/source/API/SBProcess.cpp
@@ -876,6 +876,37 @@ lldb::addr_t SBProcess::FindInMemory(const void *buf, uint64_t size,
                                   range.ref(), alignment, error.ref());
 }
 
+SBStringList SBProcess::ReadCStringsFromMemory(SBValueList sb_string_addresses,
+                                               SBError &error) {
+  std::vector<lldb::addr_t> string_addresses;
+  string_addresses.reserve(sb_string_addresses.GetSize());
+
+  for (size_t idx = 0; idx < sb_string_addresses.GetSize(); idx++) {
+    SBValue sb_address = sb_string_addresses.GetValueAtIndex(idx);
+    string_addresses.push_back(sb_address.GetValueAsAddress());
+  }
+
+  ProcessSP process_sp(GetSP());
+  if (!process_sp) {
+    error = Status::FromErrorString("SBProcess is invalid");
+    return {};
+  }
+  Process::StopLocker stop_locker;
+  if (!stop_locker.TryLock(&process_sp->GetRunLock())) {
+    error = Status::FromErrorString("process is running");
+    return {};
+  }
+
+  SBStringList strings;
+  llvm::SmallVector<std::optional<std::string>> maybe_strings =
+      process_sp->ReadCStringsFromMemory(string_addresses);
+
+  for (std::optional<std::string> maybe_str : maybe_strings)
+    strings.AppendString(maybe_str ? maybe_str->c_str() : "");
+
+  return strings;
+}
+
 size_t SBProcess::ReadMemory(addr_t addr, void *dst, size_t dst_len,
                              SBError &sb_error) {
   LLDB_INSTRUMENT_VA(this, addr, dst, dst_len, sb_error);
diff --git a/lldb/source/Target/Process.cpp b/lldb/source/Target/Process.cpp
index 9c8e8fa7041ee..3890a91dc4608 100644
--- a/lldb/source/Target/Process.cpp
+++ b/lldb/source/Target/Process.cpp
@@ -2135,6 +2135,63 @@ lldb::addr_t Process::FindInMemory(const uint8_t *buf, uint64_t size,
   return matches[0].GetBaseAddress().GetLoadAddress(&target);
 }
 
+llvm::SmallVector<std::optional<std::string>>
+Process::ReadCStringsFromMemory(llvm::ArrayRef<lldb::addr_t> addresses) {
+  // Make the same read width choice as ReadCStringFromMemory.
+  constexpr auto read_width = 256;
+
+  llvm::SmallVector<std::optional<std::string>> output_strs(addresses.size(),
+                                                            "");
+  llvm::SmallVector<Range<addr_t, size_t>> ranges{
+      llvm::map_range(addresses, [=](addr_t ptr) {
+        return Range<addr_t, size_t>(ptr, read_width);
+      })};
+
+  std::vector<uint8_t> buffer(read_width * addresses.size(), 0);
+  uint64_t num_completed_strings = 0;
+
+  while (num_completed_strings != addresses.size()) {
+    llvm::SmallVector<llvm::MutableArrayRef<uint8_t>> read_results =
+        ReadMemoryRanges(ranges, buffer);
+
+    // Each iteration of this loop either increments num_completed_strings or
+    // updates the base pointer of some range, guaranteeing forward progress of
+    // the outer loop.
+    for (auto [range, read_result, output_str] :
+         llvm::zip(ranges, read_results, output_strs)) {
+      // A previously completed string.
+      if (range.GetByteSize() == 0)
+        continue;
+
+      // The read failed, set the range to 0 to avoid reading it again.
+      if (read_result.empty()) {
+        output_str = std::nullopt;
+        range.SetByteSize(0);
+        num_completed_strings++;
+        continue;
+      }
+
+      // Convert ArrayRef to StringRef so the pointers work with std::string.
+      auto read_result_str = llvm::toStringRef(read_result);
+
+      const char *null_terminator_pos = llvm::find(read_result_str, '\0');
+      output_str->append(read_result_str.begin(), null_terminator_pos);
+
+      // If the terminator was found, this string is complete.
+      if (null_terminator_pos != read_result_str.end()) {
+        range.SetByteSize(0);
+        num_completed_strings++;
+      }
+      // Otherwise increment the base pointer for the next read.
+      else {
+        range.SetRangeBase(range.GetRangeBase() + read_result.size());
+      }
+    }
+  }
+
+  return output_strs;
+}
+
 size_t Process::ReadCStringFromMemory(addr_t addr, std::string &out_str,
                                       Status &error) {
   char buf[256];
diff --git a/lldb/test/API/python_api/process/read_multiple_cstrings/Makefile b/lldb/test/API/python_api/process/read_multiple_cstrings/Makefile
new file mode 100644
index 0000000000000..10495940055b6
--- /dev/null
+++ b/lldb/test/API/python_api/process/read_multiple_cstrings/Makefile
@@ -0,0 +1,3 @@
+C_SOURCES := main.c
+
+include Makefile.rules
diff --git a/lldb/test/API/python_api/process/read_multiple_cstrings/TestReadMultipleStrings.py b/lldb/test/API/python_api/process/read_multiple_cstrings/TestReadMultipleStrings.py
new file mode 100644
index 0000000000000..75ecfb13d8b77
--- /dev/null
+++ b/lldb/test/API/python_api/process/read_multiple_cstrings/TestReadMultipleStrings.py
@@ -0,0 +1,46 @@
+"""Test reading c-strings from memory via SB API."""
+
+import lldb
+from lldbsuite.test.decorators import *
+from lldbsuite.test.lldbtest import *
+from lldbsuite.test import lldbutil
+
+
+class TestReadMultipleStrings(TestBase):
+    NO_DEBUG_INFO_TESTCASE = True
+
+    def test_read_multiple_strings(self):
+        """Test corner case behavior of SBProcess::ReadCStringFromMemory"""
+        self.build()
+
+        (target, process, thread, bkpt) = lldbutil.run_to_source_breakpoint(
+            self, "breakpoint here", lldb.SBFileSpec("main.c")
+        )
+
+        frame = thread.GetFrameAtIndex(0)
+        err = lldb.SBError()
+
+        empty_str_addr = frame.FindVariable("empty_string")
+        self.assertSuccess(err)
+        str1_addr = frame.FindVariable("str1")
+        self.assertSuccess(err)
+        banana_addr = frame.FindVariable("banana")
+        self.assertSuccess(err)
+        bad_addr = frame.FindVariable("bad_addr")
+        self.assertSuccess(err)
+
+        string_addresses = [empty_str_addr, str1_addr, banana_addr, bad_addr]
+        for addr in string_addresses:
+            self.assertNotEqual(addr.GetValueAsUnsigned(), lldb.LLDB_INVALID_ADDRESS)
+
+        addresses = lldb.SBValueList()
+        for addr in string_addresses:
+            addresses.Append(addr)
+
+        strings = process.ReadCStringsFromMemory(addresses, err)
+        self.assertSuccess(err)
+        self.assertEqual(strings.GetStringAtIndex(0), "")
+        self.assertEqual(strings.GetStringAtIndex(1), "1")
+        self.assertEqual(strings.GetStringAtIndex(2), "banana")
+        # invalid address will also return an empty string.
+        self.assertEqual(strings.GetStringAtIndex(3), "")
diff --git a/lldb/test/API/python_api/process/read_multiple_cstrings/main.c b/lldb/test/API/python_api/process/read_multiple_cstrings/main.c
new file mode 100644
index 0000000000000..d7affad6734da
--- /dev/null
+++ b/lldb/test/API/python_api/process/read_multiple_cstrings/main.c
@@ -0,0 +1,8 @@
+int main(int argc, char **argv) {
+  const char *empty_string = "";
+  const char *str1 = "1";
+  const char *banana = "banana";
+  const char *bad_addr = (char *)0x100;
+
+  return 0; // breakpoint here
+}

llvmbot avatar Dec 12 '25 15:12 llvmbot

Could you fake a process from C++ and write it as a unit test, is that enough to get a Process object that's usable?

DavidSpickett avatar Dec 12 '25 16:12 DavidSpickett

Could you fake a process from C++ and write it as a unit test, is that enough to get a Process object that's usable?

We could employ a similar technique where we override the read memory method. However, this would not be using the new MultiMemRead packet for processes that support it, so that's my motivation for making an API test.

add this to the SB API just for testing purposes

I agree with all the problems you listed above, but, while for now this is just for testing purposes, it is a legitimately useful method for any clients of the SB API.

felipepiovezan avatar Dec 14 '25 11:12 felipepiovezan

Thanks for getting back to this use of the multi mem read feature, this is very cool.

My only worry with the patch is that we return an empty string for a memory read. This is a general issue with lldb, many of our memory read API will return 0 values for a memory read error, for instance, so when you're using lldb on a questionable environment (something is wrong), and you see a 0, you have to ask yourself "is it really 0 or is that a memory read error".

jasonmolenda avatar Dec 15 '25 03:12 jasonmolenda

Looking at SBStringList, there's no way to indicate an unreadable string in the returned list, except that the C++ SB API could return a nullptr, and surprise for any existing clients who don't handle that. We need to accept a 0-length string, I'm not suggesting otherwise, but I am displeased by the state we've gotten ourselves in.

jasonmolenda avatar Dec 15 '25 05:12 jasonmolenda

Looking at SBStringList, there's no way to indicate an unreadable string in the returned list, except that the C++ SB API could return a nullptr, and surprise for any existing clients who don't handle that. We need to accept a 0-length string, I'm not suggesting otherwise, but I am displeased by the state we've gotten ourselves in.

Yeah... if we had a SBString class, we would not have this issue, as clients are generally forced to ask if a SB object is valid before accessing them.

One thing we could do is set the SBError object to an error state if any string read failed. This way clients would know they can't trust an empty string result. What do you think?

felipepiovezan avatar Dec 15 '25 07:12 felipepiovezan

I agree with all the problems you listed above, but, while for now this is just for testing purposes, it is a legitimately useful method for any clients of the SB API.

I agree it has the look of something useful but if you didn't need it for testing I don't think that alone would justify including it.

One thing we could do is set the SBError object to an error state if any string read failed. This way clients would know they can't trust an empty string result. What do you think?

Depends on how they could then react to it. Looking at https://lldb.llvm.org/python_api/lldb.SBProcess.html#lldb.SBProcess.ReadCStringFromMemory, they could call this on any string that came back empty. Though they might not know that partial reads aren't really a thing, so they may call it on all of them, which is fine but less efficient.

  • ReadCStrings from A B and C
  • If error is failure
  • ReadCString A
  • Check error
  • Repeat for B and C

Maybe we could use SBStructuredData? Though I don't see a way for a client to discover the length of a contained string value. GetStringValue takes a length but you'd have to keep guessing larger and larger numbers? I have not used this type before though so I could be missing something.

Also I don't see a way to put errors in it. We could say oh if you find this particular type then it's an error string but that's getting very fiddly. You could look at other structured data APIs for inspiration.

We need to accept a 0-length string, I'm not suggesting otherwise, but I am displeased by the state we've gotten ourselves in.

There's an argument that says that things that look like ReadCString (singular) should be exactly as rubbish as ReadCString so they they are familiar in both name and behaviour. Otherwise our attempts to make slight improvements just make it more confusing as each variant needs different handling.

The drawback is you assign the most perfect, obvious name, to the one with the awkward handling. Then we inevitably add ReadCStringEx which is actually just ReadCStringsEx(<list of one>), but at some point this is just how APIs go isn't it.

My point is that if you make too much effort it actually gets harder to use. So even though "failure error means don't trust zero length string" is a bit of a hack, maybe it's just enough of a hack :) .

DavidSpickett avatar Dec 15 '25 09:12 DavidSpickett

Maybe we could use SBStructuredData? Though I don't see a way for a client to discover the length of a contained string value. GetStringValue takes a length but you'd have to keep guessing larger and larger numbers? I have not used this type before though so I could be missing something.

That would allow for an array of ["str1", "str2", null, "", "str3"], with null indicating a read failure.

SBStructuredData::GetStringValue returns a size_t which is the complete size of the string, so it can be called with nullptr or length 0 to find the complete string size, then re-call to fetch. Or we could add a GetValueString() API that takes an SBStream like other SB API that return strings.

jasonmolenda avatar Dec 16 '25 08:12 jasonmolenda

I'd return an SBValueList as well: in the bad_addr case, we don't return an error which is wrong imo.

An SBValueList is another valid way to express a read failure. Note that users are expected to check SBValue::GetError() to see if a value was able to be fetched. SBValue::IsValid() only tells you if there is a ValueObject backing this SB object, and there is, to hold the error value/message.

jasonmolenda avatar Dec 16 '25 08:12 jasonmolenda

This patch has wandered into two unfortunate areas of SB API -- the trickiness of indicating failures of memory reads, and the lack of elegance for returning variable length string values. I don't know if I think this PR needs to handle them, it's one more log on the pile of places you can miss a memory read, Felipe up to you, IMO.

As for string inelegance, I almost wish we could have a contract where something like an SBValue could return a char* into a little const string pool associated with the object, and the lifetime of that char* was explicitly tied to the life of that SBValue object. But we hand out char*'s in many API already today for things that are forever-valid, like strings in a Module's symbol table or whatever.

jasonmolenda avatar Dec 16 '25 09:12 jasonmolenda

I've been struggling to figure out how to get a string value into an SBValueList. If anyone has an example, I'd appreciate it. My grepping / LSP skills are failing me.

felipepiovezan avatar Dec 16 '25 10:12 felipepiovezan

This patch has wandered into two unfortunate areas of SB API -- the trickiness of indicating failures of memory reads, and the lack of elegance for returning variable length string values. I don't know if I think this PR needs to handle them, it's one more log on the pile of places you can miss a memory read, Felipe up to you, IMO.

As for string inelegance, I almost wish we could have a contract where something like an SBValue could return a char* into a little const string pool associated with the object, and the lifetime of that char* was explicitly tied to the life of that SBValue object. But we hand out char*'s in many API already today for things that are forever-valid, like strings in a Module's symbol table or whatever.

I think there is something to be said about how easy it is to use the StringList API, which I believe is also a similar point as David was making. @medismailben were you not convinced by those points?

felipepiovezan avatar Dec 16 '25 10:12 felipepiovezan

@jasonmolenda @medismailben @DavidSpickett I really don't think using SBValues is the right approach here, this is fundamentally an abstraction about language objects and types; it does not model the idea of "read a string from memory". Yes, it would solve the error problem, but at the wrong abstraction level, which I think is a bigger sin.

If I understand correctly, both @jasonmolenda and @DavidSpickett are fine with the SBStringList approach, but @medismailben is not. @medismailben, do you have any alternatives that would not involve using the compiler type system here? Or can you be persuaded that SBValue is the wrong abstraction?

felipepiovezan avatar Dec 17 '25 10:12 felipepiovezan

If I understand correctly, both @jasonmolenda and @DavidSpickett are fine with the SBStringList approach

I am suggesting that you could make such an argument but because I don't know enough about the existing SB types, I am not making it for you.

Ideally we'd settle this by comparing:

  • What you have suggested already
  • The best representation using existing SB types
  • The best representation if we added some amount of new SB types

I trust @medismailben more than myself to have ideas for the latter 2.

Then looking at it for:

  • Usability / complexity
  • Fitting in, or not, with the other APIs

If this weren't something in a public API I would not care at all, or if it was under some strange name rather than the perfect name for this functionality. To be honest that's the main problem I have, that the most obvious name would also be the one with the footguns. Then again, we have plenty of obviously named APIs with footguns so it's not like we haven't compromised before, but I prefer to compromise intentionally.

Side idea: you could write a "unit test" that starts a process using the internal C++ API and maybe that will use the packets? That at least would delay the API question.

DavidSpickett avatar Dec 17 '25 11:12 DavidSpickett

Side idea: you could write a "unit test" that starts a process using the internal C++ API and maybe that will use the packets? That at least would delay the API question.

Yeah, let me have a look at that. Either that, or just punt on having a test that actually uses the new packet for this specific method since, in theory, it is already battle-tested elsewhere.

felipepiovezan avatar Dec 17 '25 11:12 felipepiovezan

I'm not a big fan of using the StringList as the return type since it's make ReadCStrings a lossy API. Having such critical API be lossy seems wrong to me. If you don't think SBValueList is the right thing to use, we should come up with something else.

I think it would be better to change Process::ReadMemoryRanges to return llvm::SmallVector<llvm::Expected<llvm::MutableArrayRef<uint8_t>>> and then we could change lldb_private::StringList to hold a std::vector<llvm::Expected<std::string>>, add a HasError method to check if it contains any error, as well as a GetErrorAtIndex() method to surface the error appropriately. That way you could keep your SBStringString api, but in the even where there was a memory read failure for one of the addresses, the user could get the error from the same SBStringList object.

I'd recommend doing this in a separate PR, that'd land before this one, so we can make use of it here.

medismailben avatar Dec 17 '25 19:12 medismailben

I think it would be better to change Process::ReadMemoryRanges to return llvm::SmallVector<llvm::Expected<llvm::MutableArrayRef<uint8_t>>> and then we could change lldb_private::StringList to hold a std::vector<llvm::Expected<std::string>>, add a HasError method to check if it contains any error, as well as a GetErrorAtIndex() method to surface the error appropriately.

This was discussed during the RFC, and the conclusion was that if a memory read did not return the number of bytes requested, a user could call the single-address variant if they really wanted more details. Even then, it was felt that what the error was was not relevant, only the fact that some error occurred, which is already expressed by the read length.

That way you could keep your SBStringString api, but in the even where there was a memory read failure for one of the addresses, the user could get the error from the same SBStringList object.

This would be akin to changing the behavior of an SB object, which I don't think we are allowed to do

felipepiovezan avatar Dec 18 '25 09:12 felipepiovezan

I will just remove the SBAPI portion of this PR. This will give us time to think about the public API.

felipepiovezan avatar Dec 18 '25 09:12 felipepiovezan

I think it would be better to change Process::ReadMemoryRanges to return llvm::SmallVector<llvm::Expected<llvm::MutableArrayRef<uint8_t>>> and then we could change lldb_private::StringList to hold a std::vector<llvm::Expected<std::string>>, add a HasError method to check if it contains any error, as well as a GetErrorAtIndex() method to surface the error appropriately.

This was discussed during the RFC, and the conclusion was that if a memory read did not return the number of bytes requested, a user could call the single-address variant if they really wanted more details. Even then, it was felt that what the error was was not relevant, only the fact that some error occurred, which is already expressed by the read length.

I saw that in the code ... we can fix debug server & lldb_private to surface a relevant error message to the user but we can't modify the API after it lands. That's why I'm so picky about this. I'd rather have that room to improve in the future than having to make a new API and make it confusing which one to use.

That way you could keep your SBStringString api, but in the even where there was a memory read failure for one of the addresses, the user could get the error from the same SBStringList object.

This would be akin to changing the behavior of an SB object, which I don't think we are allowed to do

I don't think changing an SB object behavior is forbidden as long as the API don't change. Also, I don't think my suggestion would change the behavior of SBStringList since it'd be purely additive: Let's say you have a memory read error at index 2 with a 4 elements SBStringList object, SBStringList.GetStringAtIndex(2) would still give you an empty string, which is what you're doing currently, but you could get the error from a different SBStringList.GetErrorAtIndex(2) API.

Similarly, if there is no error at that index, that API will return a default construct lldb.SBError:

>>> err = lldb.SBError()
>>> err.Success()
True

I will just remove the SBAPI portion of this PR. This will give us time to think about the public API.

Sounds good, may be @jingham or @JDevlieghere will have some ideas how to deal with this, after the holidays.

medismailben avatar Dec 18 '25 12:12 medismailben

:white_check_mark: With the latest revision this PR passed the C/C++ code formatter.

github-actions[bot] avatar Dec 18 '25 12:12 github-actions[bot]

Ok, this PR no longer changes the SBAPI

felipepiovezan avatar Dec 18 '25 12:12 felipepiovezan

@jasonmolenda is on vacation now, but I think he is ok with the PR. @DavidSpickett any last thoughts?

felipepiovezan avatar Dec 18 '25 13:12 felipepiovezan

:penguin: Linux x64 Test Results

  • 33235 tests passed
  • 507 tests skipped

:white_check_mark: The build succeeded and all tests passed.

github-actions[bot] avatar Dec 18 '25 16:12 github-actions[bot]

@DavidSpickett given all your previous comments, I am going to assume you're reasonably happy with this PR and merge it. If you have any other ideas, I will be happy to address them in follow up patches.

(Jason is also ok with the PR based on previous conversations)

felipepiovezan avatar Dec 22 '25 12:12 felipepiovezan