llvm-project

Colorize output when searching for symbols in `lldb`

Open alichtman opened this issue 2 years ago • 13 comments

ripgrep and grep have great colorization options for easily parsing search output.

Here's rg, for example:

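[Screenshot of colorized rg output: https://user-images.githubusercontent.com/20600565/186795335-eac0c080-45c1-42b2-b207-b3d3797a080a.png]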

It would be really great if lldb supported this for `image lookup -r -n <REGEXP>` and other similar commands.

alichtman, Aug 26 '22 01:08

@llvm/issue-subscribers-lldb

llvmbot, Aug 26 '22 01:08

lldb does support colored output: there's actually dedicated syntax for it in the frame and thread formats, and people have been adding color code in other places where it makes sense. It should be pretty straightforward to add to the image lookup output as well. If someone decides to take this on, remember that there's a use-color setting that you have to obey as well.

Jim


jimingham, Aug 26 '22 01:08

I would like to work on this issue. I'm new to LLVM (I've gone through the Kaleidoscope tutorials and building a JIT). Any advice on how to begin?

thanks!

varunkumare99, Oct 04 '22 14:10

@llvm/issue-subscribers-good-first-issue

llvmbot, Oct 04 '22 14:10

Well, first off, make sure you can build and run the test suite successfully: https://lldb.llvm.org/resources/build.html

The conversion from the format Jim mentioned is in lldb/include/lldb/Utility/AnsiTerminal.h, but you'll want to find who actually calls that.

A command that already uses this is frame. Make a simple hello world program and break in main, then:

(lldb) frame info
frame #0: 0x0000aaaaaaaaa740 test.o`main at test.cpp:9:3

You'll see that line is coloured.

That command is defined in https://github.com/llvm/llvm-project/blob/feea7ef23cb1bef92d363cc613052f8f3a878fc2/lldb/source/Commands/CommandObjectFrame.cpp#L193. See DoExecute.

From there, I suggest following the call stack down; you'll see where the colour tags are inserted. I really recommend using an IDE for this, as you can just "go to definition" on each caller.

Or you can debug the debugger if you want, but again, a good IDE should be enough. I replied to someone else about debugging lldb itself here https://discourse.llvm.org/t/how-to-debug-lldb-source-codes/65598/2?u=davidspickett.

Once you understand how frame info does it, stick some calls in the symbol commands and see what you can come up with. This would be a nice improvement.

DavidSpickett, Oct 05 '22 09:10

Thank you. I'm working on it.

varunkumare99, Oct 05 '22 12:10

The color tags for frame info are set in CoreProperties.td, which defines a format string that adds color to frame info. I could not figure out how lldb reads this file: there seem to be no calls in the C++ lldb code that read CoreProperties.td.

varunkumare99, Oct 12 '22 08:10

So this is an instance of https://discourse.llvm.org/t/cant-find-all-values-of-enum-attrkind-in-attributes-h-file/65869/2?u=davidspickett

See if that helps any, this is a very common thing to trip over. Still gets me sometimes.

Once you find where the .inc is included, there are probably a few more layers to get through before the C++ code directly reads it. Some macro generation and whatnot.

The other way to do it is to say: we know this string gets to frame somehow, so can we find where the command reads it from by reading just the command's source? That is, rather than following the trail from the user-provided string all the way to frame.

What you really want to do is hardcode (or generate) a similar string, but in image lookup, so the mechanism for overriding it isn't so important for the moment.

DavidSpickett, Oct 12 '22 09:10

I should note that in your case you'll be searching for the .td file name to find what it is used to generate. That's the other way around from the poster there, who had the .inc but wanted the .td.

DavidSpickett, Oct 12 '22 09:10

The image lookup command is defined in: https://github.com/llvm/llvm-project/blob/b13f7f9c06604110709a968a2fece4b8d5192708/lldb/source/Commands/CommandObjectTarget.cpp#L3875

Regular expression matching is done in the function LookupSymbolInModule https://github.com/llvm/llvm-project/blob/b13f7f9c06604110709a968a2fece4b8d5192708/lldb/source/Commands/CommandObjectTarget.cpp#L1527

which, via Symtab::AppendSymbolIndexesMatchingRegExAndType, calls RegularExpression's match function https://github.com/llvm/llvm-project/blob/f793597f6d5d8eb86388263ce16365ceb10fee23/lldb/source/Utility/RegularExpression.cpp#L28

which in turn calls LLVM's Regex match function https://github.com/llvm/llvm-project/blob/f793597f6d5d8eb86388263ce16365ceb10fee23/llvm/lib/Support/Regex.cpp#L86

The regular expression operation called in LookupSymbolInModule returns the indexes of the input strings that match. If it could also return the exact substring that matched in each input string, I could split the string accordingly and colorize the matched part.

Currently we pass null for matches when executing the regex: https://github.com/llvm/llvm-project/blob/48aea4a36ad39e2ad03f069def19cceb50217660/lldb/source/Symbol/Symtab.cpp#L785 I tried passing a matches vector, since my understanding was that it would return every substring of the input string that matched.

For example, with input string "foo is not a bar" and regex "foo | bar", I expected matches to return the vector {"foo", "bar"}.

But it only returns "foo". Am I using the API incorrectly, or is there another API that provides this?

varunkumare99, Oct 13 '22 11:10

Is this a "match" vs. "search" issue? If you have a look at Python's re.match and re.search you'll see what I mean: re.match only matches from the start of the string, while re.search scans the whole string. Maybe we copied the names of those functions.

If not I'll give it a look myself tomorrow.

DavidSpickett, Oct 13 '22 14:10

The implementation is similar to Python's re.search: it searches the entire string, but returns only the first match, not all of them.

For example, with input string "This string foo is not a bar" and regex "foo | bar", it returns only "foo".

This is unlike ripgrep, which highlights every match:

[Screenshot from 2022-10-13 22-20-21: ripgrep output]

varunkumare99, Oct 13 '22 17:10

I realised what's going on. When you ask for matches it's returning first the whole match and then any other matching groups within that match.

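  #include <cstdio>
  #include "llvm/ADT/SmallVector.h"
  #include "llvm/ADT/StringRef.h"
  #include "llvm/Support/Regex.h"
  int main() {
    llvm::Regex re("(abc)(def)(ghi)");
    llvm::SmallVector<llvm::StringRef> matches;
    // matches[0] is the whole match; matches[1..] are the capture groups.
    if (re.match("abcdefghi", &matches))
      for (auto match : matches)
        printf("Match: %s\n", match.str().c_str());
  }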

(and note that StringRef may not be null terminated so when you print it you need to convert to something that is)

This code produces:

Match: abcdefghi
Match: abc
Match: def
Match: ghi

In an online tester that I like to use: https://regex101.com/r/P8tZS7/1

"matches" is returning you "the match", meaning the whole match, plus any matching groups. Python does the same thing, except that there you would use match.group(0) for the whole match and match.groups() for the groups.

DavidSpickett, Oct 14 '22 08:10

Thanks! Can I submit the patch for review first and add the test case later? I'm expecting a lot of changes during review.

varunkumare99, Oct 21 '22 12:10

Excited to see this progressing! Putting up the code first is fine, we can talk about testing on the review.

Please add me as a reviewer, I am https://reviews.llvm.org/p/DavidSpickett/ on Phabricator.

DavidSpickett, Oct 21 '22 13:10

Thanks. I have added you as a reviewer.

varunkumare99, Oct 21 '22 16:10

For reference, the patch was https://reviews.llvm.org/D136462, but it remains in review.

DavidSpickett, Sep 15 '23 10:09


Hi Varun! Are you still working on it? I'm planning to work on the same issue; maybe we can collaborate?

taalhaataahir0102, Sep 18 '23 06:09