lnav icon indicating copy to clipboard operation
lnav copied to clipboard

Histogram isn't showing all current results

Open Tejeev opened this issue 3 years ago • 7 comments

lnav version lnav 0.10.0-98-gb3b8ed7 2146 latest/edge timothyshanestack -

Describe the bug Histogram isn't showing all the log lines in the current view image image

To Reproduce Steps to reproduce the behavior: This seems to happen more often when I have multiple files loaded. I apply various filters and then go to view the histogram and it's got almost nothing

Tejeev avatar Sep 30 '21 14:09 Tejeev

Looks like it's only reporting the warnings? Maybe something changed in the hotkeys or I just accidentally changed some config? 2021-09-23 05:27:57.180058 W | etcdserver/api/etcdhttp: /health error; no leader (status code 503)

Tejeev avatar Sep 30 '21 14:09 Tejeev

Actually, I think something is wrong with the filters. These return numbers are all hokey: image I took out the regex and the returns for the filter went up to 40k and the search results had ~800. Not sure how I can have less returns of part of the filter string.

Tejeev avatar Sep 30 '21 14:09 Tejeev

Are these logs live (new lines being added while being viewed)?

Actually, I think something is wrong with the filters. These return numbers are all hokey:

An OUT filter will remove lines that are matched by an IN filter. Can you do the search with the filters disabled and see how many search hits there are?

tstack avatar Sep 30 '21 15:09 tstack

Actually, I think something is wrong with the filters. These return numbers are all hokey:

Ah, the count in the filter UI is the number of lines and not the number of times the text was matched. So, if the text was found by a search, that would result in a single hit, but the filter UI would report the number of lines in the message where the text was found. Hmm, I need to either change the reporting to match the number of messages or change "hits" to "lines" in the UI.

tstack avatar Sep 30 '21 16:09 tstack

These logs are not live. It's not the out but the IN that bothers me. Turning off the OUT doesn't change the number of lines in the view as none of the IN's in that example also matched the OUTs (they are mutually exclusive and the OUT was only there as a leftover from a previous stage of my log diving) I captured another image showing many lines that did not match the IN in the logs, but I removed the image because it had some customer-specific information that while not risky (unique container IDs), I'm not sure the customer would appreciate being public even if it's not known who's logs they are. Just to be on the safe side, I've instead pasted below screen caps of the filters in place with the numbers of hits on a search for "took too long" followed by some of the log lines that showed up with those filters (thanks btw, mark and copy are working great for me now when diving customer logs).

Regarding the hit count vs the search hit count, what concerns me is that the latter has fewer hits than the first where I would expect them to be equal or the latter be equal or greater if there are more hits per line (with the understanding that the filter UI reports lines with hits and the search may report either number of matches total or lines with hits). I think I may have figured it out though. I believe this log may not be parsing the JSON format messages (I think that's what they are) as separate lines but instead, as continuations of lines like 2021-09-23 05:28:03.322700 I | etcdserver/api/etcdhttp: /health OK (status code 200) even though they do appear on lines of their own.
It's kind of complicated by the fact that:

  • The histogram appears to confirm this with 10 lines (specifical ones in the format mentioned above) when I filter for \d\d\.\d+s but the log view shows man many other lines
  • But when I filter by took too long the histogram seems to count all those other lines, or at least counts a lot of lines in the JSON format, even though the took too long and \d\d\.\d+s strings mostly occur together.

image

2021-09-23 05:27:57.180058 W | etcdserver/api/etcdhttp: /health error; no leader (status code 503)
{"level":"warn","ts":"2021-09-23T05:28:06.446Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"REDACTED","from":"REDACTED","remote-peer-id":"REDACTED","remote-peer-active":false}

image

2021-09-23 05:28:03.322700 I | etcdserver/api/etcdhttp: /health OK (status code 200)
{"level":"warn","ts":"2021-09-23T05:28:04.275Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"2.00004676s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/health\" ","response":"","error":"context deadline exceeded"}

Tejeev avatar Oct 01 '21 14:10 Tejeev

I believe this log may not be parsing the JSON format messages (I think that's what they are) as separate lines but instead, as continuations of lines like 2021-09-23 05:28:03.322700 I | etcdserver/api/etcdhttp: /health OK (status code 200) even though they do appear on lines of their own.

Yes, lnav can only handle a single log message format for a given file. So, it's probably recognizing the following line and associates that format with this whole file:

2021-09-23 05:27:57.180058 W | etcdserver/api/etcdhttp: /health error; no leader (status code 503)

Then, when lnav sees this line:

{"level":"warn","ts":"2021-09-23T05:28:06.446Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"REDACTED","from":"REDACTED","remote-peer-id":"REDACTED","remote-peer-active":false}

It will treat the JSON line as a continuation of the previous line since it doesn't match any of the log message patterns for the previously associated format. You can find out how lnav is parsing a line by pressing the p key to open an overlay that describes the results. If the line is a continuation, the overlay won't appear. Here's an example of what the overlay looks like:

Screen Region 2021-10-01 at 10 40 55

Is it difficult to get the log messages segregated so there is only one type of line in a file? Or, are you stuck with the way things are now. It's a bit of a non-trivial change in lnav to have it work with multiple formats per file.

tstack avatar Oct 01 '21 17:10 tstack

with these particular logs I think I can identify the string that lnav is picking and just go through and remove them from the files, but I can foresee it becoming an issue. I think this is the logging as received from contained logging, so it may show up more often in the future as well. Let's leave this issue open but not worry about it for now. If this becomes more of an issue I'll update the issue and maybe some other solution will come about

Tejeev avatar Oct 01 '21 23:10 Tejeev