lnav
Histogram isn't showing all current results
lnav version: lnav 0.10.0-98-gb3b8ed7 (2146, latest/edge)
Reported by: timothyshanestack
Describe the bug
Histogram isn't showing all the log lines in the current view
To Reproduce
Steps to reproduce the behavior: This seems to happen more often when I have multiple files loaded. I apply various filters and then go to view the histogram and it's got almost nothing.
Looks like it's only reporting the warnings? Maybe something changed in the hotkeys or I just accidentally changed some config?
2021-09-23 05:27:57.180058 W | etcdserver/api/etcdhttp: /health error; no leader (status code 503)
Actually, I think something is wrong with the filters. These return numbers are all hokey:
I took out the regex and the hit count for the filter went up to 40k, while the search results had ~800. Not sure how I can get fewer hits when searching for just part of the filter string.
Are these logs live (new lines being added while being viewed)?
Actually, I think something is wrong with the filters. These return numbers are all hokey:
An OUT filter will remove lines that are matched by an IN filter. Can you do the search with the filters disabled and see how many search hits there are?
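To make the precedence concrete, here's a rough sketch of that behavior (not lnav's actual code; the filter strings below are just placeholders pulled from the sample lines in this thread):

```python
import re

# Hypothetical IN/OUT filters; the strings come from the etcd samples in this thread.
filters_in = [re.compile(r"took too long")]
filters_out = [re.compile(r"read-only range")]

def is_visible(line: str) -> bool:
    """A line is shown only if an IN filter matches it and no OUT filter does."""
    if any(f.search(line) for f in filters_out):
        return False  # the OUT filter removes the line even if an IN also matched
    return any(f.search(line) for f in filters_in)

lines = [
    '{"msg":"apply request took too long","took":"2.00004676s"}',
    '{"msg":"apply request took too long","prefix":"read-only range "}',
]
# [True, False]: the second line matches both filters, and the OUT filter wins.
print([is_visible(line) for line in lines])
```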
Actually, I think something is wrong with the filters. These return numbers are all hokey:
Ah, the count in the filter UI is the number of lines and not the number of times the text was matched. So, if the text was found by a search, that would result in a single hit, but the filter UI would report the number of lines in the message where the text was found. Hmm, I need to either change the reporting to match the number of messages or change "hits" to "lines" in the UI.
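For example (a made-up two-line message, just to illustrate the two ways of counting):

```python
import re

pattern = re.compile(r"took too long")

# One hypothetical multi-line log *message*; the text appears on two of its lines.
message = (
    "2021-09-23 05:28:04.275000 W | etcdserver: apply request took too long\n"
    '{"msg":"apply request took too long","took":"2.00004676s"}'
)

search_hits = 1 if pattern.search(message) else 0                          # one hit per message
filter_lines = sum(bool(pattern.search(l)) for l in message.splitlines())  # one per matching line
print(search_hits, filter_lines)  # prints: 1 2
```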
These logs are not live. It's not the OUT but the IN that bothers me. Turning off the OUT doesn't change the number of lines in the view, as none of the INs in that example also matched the OUTs (they are mutually exclusive, and the OUT was only there as a leftover from a previous stage of my log diving).
I captured another image showing many lines that did not match the IN in the logs, but I removed the image because it had some customer-specific information that, while not risky (unique container IDs), I'm not sure the customer would appreciate being public, even if it's not known whose logs they are.
Just to be on the safe side, I've instead pasted below screen caps of the filters in place, with the number of hits on a search for "took too long", followed by some of the log lines that showed up with those filters (thanks, by the way; mark and copy are working great for me now when diving into customer logs).
Regarding the filter hit count vs the search hit count, what concerns me is that the latter has fewer hits than the former, where I would expect them to be equal, or the latter to be equal or greater if there are multiple hits per line (with the understanding that the filter UI reports lines with hits and the search may report either the total number of matches or the number of lines with hits).
I think I may have figured it out, though. I believe lnav may not be parsing this log's JSON-format messages (I think that's what they are) as separate lines but instead as continuations of lines like 2021-09-23 05:28:03.322700 I | etcdserver/api/etcdhttp: /health OK (status code 200), even though they do appear on lines of their own.
It's kind of complicated by the fact that:
- The histogram appears to confirm this, with 10 lines (specifically ones in the format mentioned above) when I filter for "\d\d\.\d+s", but the log view shows many other lines.
- But when I filter by "took too long", the histogram seems to count all those other lines, or at least counts a lot of lines in the JSON format, even though the "took too long" and "\d\d\.\d+s" strings mostly occur together.
2021-09-23 05:27:57.180058 W | etcdserver/api/etcdhttp: /health error; no leader (status code 503)
{"level":"warn","ts":"2021-09-23T05:28:06.446Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"REDACTED","from":"REDACTED","remote-peer-id":"REDACTED","remote-peer-active":false}
2021-09-23 05:28:03.322700 I | etcdserver/api/etcdhttp: /health OK (status code 200)
{"level":"warn","ts":"2021-09-23T05:28:04.275Z","caller":"etcdserver/util.go:163","msg":"apply request took too long","took":"2.00004676s","expected-duration":"100ms","prefix":"read-only range ","request":"key:\"/registry/health\" ","response":"","error":"context deadline exceeded"}
I believe lnav may not be parsing this log's JSON-format messages (I think that's what they are) as separate lines but instead as continuations of lines like 2021-09-23 05:28:03.322700 I | etcdserver/api/etcdhttp: /health OK (status code 200), even though they do appear on lines of their own.
Yes, lnav can only handle a single log message format for a given file. So, it's probably recognizing the following line and associating that format with the whole file:
2021-09-23 05:27:57.180058 W | etcdserver/api/etcdhttp: /health error; no leader (status code 503)
Then, when lnav sees this line:
{"level":"warn","ts":"2021-09-23T05:28:06.446Z","caller":"rafthttp/peer.go:267","msg":"dropped internal Raft message since sending buffer is full (overloaded network)","message-type":"MsgHeartbeat","local-member-id":"REDACTED","from":"REDACTED","remote-peer-id":"REDACTED","remote-peer-active":false}
It will treat the JSON line as a continuation of the previous line, since it doesn't match any of the log message patterns for the previously associated format. You can find out how lnav is parsing a line by pressing the p key to open an overlay that describes the results. If the line is a continuation, the overlay won't appear.
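As a rough sketch of the effect (this is not lnav's real parser, and the timestamp pattern below is just an assumption based on the sample lines): once the timestamped plaintext format wins, any line that doesn't start with that timestamp gets folded into the previous message, so there are fewer messages for the histogram to count than there are lines in the view.

```python
import re

# Assumed pattern for the plaintext etcd lines shown above.
TIMESTAMP = re.compile(r"^\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d+ ")

def group_messages(lines):
    """Group raw lines into messages: a non-matching line continues the previous message."""
    messages = []
    for line in lines:
        if TIMESTAMP.match(line) or not messages:
            messages.append([line])      # starts a new message
        else:
            messages[-1].append(line)    # treated as a continuation
    return messages

lines = [
    "2021-09-23 05:27:57.180058 W | etcdserver/api/etcdhttp: /health error; no leader (status code 503)",
    '{"level":"warn","ts":"2021-09-23T05:28:04.275Z","msg":"apply request took too long","took":"2.00004676s"}',
    "2021-09-23 05:28:03.322700 I | etcdserver/api/etcdhttp: /health OK (status code 200)",
]
print(len(lines), len(group_messages(lines)))  # 3 lines, but only 2 messages
```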
Is it difficult to get the log messages segregated so there is only one type of line in a file? Or are you stuck with the way things are now? It's a bit of a non-trivial change in lnav to have it work with multiple formats per file.
With these particular logs, I think I can identify the string that lnav is picking up on and just go through and remove those lines from the files, but I can foresee it becoming an issue. I think this is the logging as received from container logging, so it may show up more often in the future as well. Let's leave this issue open but not worry about it for now. If this becomes more of an issue, I'll update the issue and maybe some other solution will come about.
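As a rough illustration of that workaround (the file names and the "JSON lines start with {" heuristic are assumptions about these particular logs, not anything lnav requires), a small script could split a mixed file so each output contains only one format:

```python
# Split a mixed etcd log into one file per format so that lnav only ever
# sees a single log format per file. Hypothetical file names throughout.
def split_mixed_log(path: str) -> None:
    with open(path) as src, \
         open(path + ".plain", "w") as plain, \
         open(path + ".json", "w") as jsonl:
        for line in src:
            # Heuristic: the JSON-format messages start with an opening brace.
            (jsonl if line.lstrip().startswith("{") else plain).write(line)

split_mixed_log("etcd.log")  # produces etcd.log.plain and etcd.log.json
```

Both outputs could then be loaded side by side, with each file matching a single format.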