Colorize output when searching for symbols in `lldb`
`ripgrep` and `grep` have great colorization options for easily parsing search output.
Here's `rg`, for example:

It would be really great if `lldb` supported this for `image lookup -r -n <REGEXP>` and other similar commands.
@llvm/issue-subscribers-lldb
lldb does support colored output - there's actually dedicated syntax for it in the frame & thread format, and people have been adding color code in other cases where it makes sense. Should be pretty straightforward to add to the image lookup output as well. If someone decides to take this on, remember that there's a use-color setting that you have to obey as well.
Jim
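As a rough illustration of what obeying such a setting means, here is a minimal standalone sketch. `Colorize` and its `use_color` parameter are hypothetical names made up for this example; this is not lldb's actual API, only the general idea of emitting ANSI escape sequences when color is enabled and plain text otherwise.

```cpp
#include <string>

// Hypothetical helper, not part of lldb: wraps `text` in an ANSI SGR color
// sequence only when `use_color` is true. `code` is an SGR parameter such
// as "31" (red) or "32" (green).
std::string Colorize(const std::string &text, const std::string &code,
                     bool use_color) {
  if (!use_color)
    return text; // Color disabled: return the text untouched.
  // ESC [ <code> m switches the color on, ESC [ 0 m resets it.
  return "\033[" + code + "m" + text + "\033[0m";
}
```

Calling `Colorize("main", "31", false)` gives back `main` unchanged, so output piped to a file or a terminal without color support stays clean.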
I would like to work on this issue. I'm new to LLVM (I have gone through the Kaleidoscope tutorials and building a JIT). Any advice on how to begin?
thanks!
@llvm/issue-subscribers-good-first-issue
Well, first off make sure you can build and run the test suite successfully: https://lldb.llvm.org/resources/build.html
The conversion from the format Jim mentioned is in `lldb/include/lldb/Utility/AnsiTerminal.h`, but you'll want to find who actually calls that.
A command that uses this already is `frame`. Make a simple hello world and break in main, then:
(lldb) frame info
frame #0: 0x0000aaaaaaaaa740 test.o`main at test.cpp:9:3
You'll see that line is coloured.
That command is defined in https://github.com/llvm/llvm-project/blob/feea7ef23cb1bef92d363cc613052f8f3a878fc2/lldb/source/Commands/CommandObjectFrame.cpp#L193. See `DoExecute`.
I suggest from there that you follow the callstack down and you'll see where colour tags are inserted. I really recommend using an IDE for this as you can just "go to definition" on each caller.
Or you can debug the debugger if you want, but again, a good IDE should be enough. I replied to someone else about debugging lldb itself here https://discourse.llvm.org/t/how-to-debug-lldb-source-codes/65598/2?u=davidspickett.
Once you get how `frame info` does it, stick some calls in the symbol commands and see what you can come up with. This would be a nice improvement.
thank you. I'm working on it.
The color tags for `frame info` are set from `CoreProperties.td`. It has python code that defines a format string to add color to `frame info`.
I could not figure out how this python code is read by `lldb`. There seem to be no calls to read `CoreProperties.td` from the cpp `lldb` code.
So this is an instance of https://discourse.llvm.org/t/cant-find-all-values-of-enum-attrkind-in-attributes-h-file/65869/2?u=davidspickett
See if that helps any, this is a very common thing to trip over. Still gets me sometimes.
Once you find where the `.inc` is included, there are probably a few more layers to get through before cpp directly reads it. Some macro generation and whatnot.
The other way to do it is to say: OK, we know this string gets to `frame` somehow, so can we find where the command reads the string from just by reading the command's source, rather than following the trail from the user-provided string all the way to `frame`?
What you really want to do is hardcode or generate a similar string in `image lookup`, so the mechanism by which one can override it isn't so important for the moment.
I should note that in your case you'll be searching for the `.td` file name to find what it is used to generate. The poster there had the `.inc` but wanted the `.td`; you are going the other way around.
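To make the "macro generation" part less mysterious, here is a small self-contained sketch of the general pattern: a list of entries is written once as macro invocations (in the real build that list would live in a generated `.inc` file) and the including code decides what each entry expands to. Everything here is an assumption for illustration: `HYPOTHETICAL_PROPERTIES`, `Property`, `BuildProperties` and the default values are invented, and the real lldb macros and generated contents will differ.

```cpp
#include <string>
#include <vector>

// Stand-in for the contents of a generated .inc file. The names and
// default values below are invented for illustration only.
#define HYPOTHETICAL_PROPERTIES(X)                                             \
  X("use-color", "true")                                                       \
  X("frame-format", "frame #${frame.index}")

struct Property {
  std::string name;
  std::string default_value;
};

// The including code defines what each entry expands to, so the C++ source
// never mentions the .td file by name. That is why grepping for the .td
// name in .cpp files finds nothing.
std::vector<Property> BuildProperties() {
#define MAKE_ENTRY(NAME, DEFAULT) {NAME, DEFAULT},
  return {HYPOTHETICAL_PROPERTIES(MAKE_ENTRY)};
#undef MAKE_ENTRY
}
```

In this sketch `BuildProperties()` returns two entries, the first named `use-color`. Searching the build files for the `.td` name should show which `.inc` it produces, and searching the sources for that `.inc` name should show where it gets included.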
The `image lookup` command is defined in: https://github.com/llvm/llvm-project/blob/b13f7f9c06604110709a968a2fece4b8d5192708/lldb/source/Commands/CommandObjectTarget.cpp#L3875
Regular expression matching is done in the function `LookupSymbolInModule`: https://github.com/llvm/llvm-project/blob/b13f7f9c06604110709a968a2fece4b8d5192708/lldb/source/Commands/CommandObjectTarget.cpp#L1527
That calls the `RegularExpression` match via `Symtab::AppendSymbolIndexesMatchingRegExAndType`: https://github.com/llvm/llvm-project/blob/f793597f6d5d8eb86388263ce16365ceb10fee23/lldb/source/Utility/RegularExpression.cpp#L28
That in turn calls llvm's `Regex` `match` function: https://github.com/llvm/llvm-project/blob/f793597f6d5d8eb86388263ce16365ceb10fee23/llvm/lib/Support/Regex.cpp#L86
The regular expression operation called in `LookupSymbolInModule` returns the indexes of the input strings that match.
If it could also return the exact substring that matched in the input string, then I could split the string accordingly and colorize the matched part.
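The splitting step described above could look roughly like this once the position of the match is known. This is only a sketch: `HighlightRange` is a hypothetical helper, not an existing lldb function, and it hardcodes ANSI red rather than consulting any lldb setting.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of lldb: returns `text` with the range
// [offset, offset + length) wrapped in ANSI red. If the range does not fit
// inside `text`, the input is returned unchanged.
std::string HighlightRange(const std::string &text, std::size_t offset,
                           std::size_t length) {
  if (offset > text.size() || length > text.size() - offset)
    return text;
  std::string out;
  out += text.substr(0, offset);       // Part before the match.
  out += "\033[31m";                   // Switch to red.
  out += text.substr(offset, length);  // The matched part.
  out += "\033[0m";                    // Reset attributes.
  out += text.substr(offset + length); // Part after the match.
  return out;
}
```

For example, `HighlightRange("foo is not a bar", 0, 3)` wraps only `foo` in the color codes and leaves the rest of the string as it was.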
Currently we are passing `matches` as null in the regex execute call:
https://github.com/llvm/llvm-project/blob/48aea4a36ad39e2ad03f069def19cceb50217660/lldb/source/Symbol/Symtab.cpp#L785
I tried passing `matches`, as my understanding was that `matches` would then return the vector of strings that matched in the input string.
Ex: input string = "foo is not a bar" and regex = "foo | bar" matches would return the vector {"foo", "bar"}.
But it is not doing so. It only returns `foo`. Maybe I'm using the API incorrectly, or is there another API that provides this?
Is this a "match" vs. "search" issue? If you have a look at python's re.match and re.search you'll see what I mean. Match goes from the start of the string only, maybe we copied the names of those functions.
If not I'll give it a look myself tomorrow.
The implementation is similar to Python's `re.search`: it searches the entire string for the first match, not for all the matches.
Ex: input string = "This string foo is not a bar" and regex = "foo | bar" returns only "foo", unlike ripgrep.
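For comparison, collecting every match rather than just the first usually means iterating over the input. The sketch below uses the standard library's `std::regex` and `std::sregex_iterator`, not `llvm::Regex`, and `FindAllMatches` is a name invented for this example. Doing the same with `llvm::Regex` would presumably mean re-running the match on the remainder of the string after each hit, but that is an assumption and not something verified here.

```cpp
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper using std::regex (not llvm::Regex): returns every
// non-overlapping match of `pattern` in `text`, in order of appearance.
std::vector<std::string> FindAllMatches(const std::string &text,
                                        const std::string &pattern) {
  std::regex re(pattern);
  std::vector<std::string> found;
  // A default-constructed sregex_iterator acts as the end sentinel.
  for (std::sregex_iterator it(text.begin(), text.end(), re), end; it != end;
       ++it)
    found.push_back(it->str()); // str() is the whole match, like group 0.
  return found;
}
```

With the pattern written as `foo|bar` (no spaces around the `|`), the input "This string foo is not a bar" yields both `foo` and `bar`.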
I realised what's going on. When you ask for matches it's returning first the whole match and then any other matching groups within that match.
llvm::Regex re("(abc)(def)(ghi)");
llvm::SmallVector<llvm::StringRef> matches;
bool matched = re.match("abcdefghi", &matches, nullptr);
for (auto match : matches)
  printf("Match: %s\n", match.str().c_str());
(and note that StringRef may not be null terminated so when you print it you need to convert to something that is)
This code produces:
Match: abcdefghi
Match: abc
Match: def
Match: ghi
In an online tester that I like to use: https://regex101.com/r/P8tZS7/1
"matches" is returning you "the match" as in the whole match plus any "matching groups". Which is also what python does, but over there you would do `match.groups()` to get the same thing.
thanks! Can I first submit the patch for review and then add the test case later? I'm expecting there will be lot of changes during review.
Excited to see this progressing! Putting up the code first is fine, we can talk about testing on the review.
Please add me as a reviewer, I am https://reviews.llvm.org/p/DavidSpickett/ on Phabricator.
Thanks. I have added you as a reviewer.
For reference the patch was https://reviews.llvm.org/D136462 but remains in review.
Hi Varun! Are you still working on it? I'm planning to work on the same project, so maybe we can collaborate?