logparser fails to parse an apache-combined log where userid contains a space

Open bkc opened this issue 13 years ago • 6 comments

Hi,

Great package. I have hit one small problem:

using logtools==0.8

got this exception:

  File "/home/bkc/Python_Environments/weblog/lib/python2.7/site-packages/logtools/_parse.py", line 100, in logparse
    yield key_func(line)
  File "/home/bkc/Python_Environments/weblog/lib/python2.7/site-packages/logtools/parsers.py", line 47, in multiindex_getter
    data = parser(line.strip())
  File "/home/bkc/Python_Environments/weblog/lib/python2.7/site-packages/logtools/parsers.py", line 69, in __call__
    return self.parse(line)
  File "/home/bkc/Python_Environments/weblog/lib/python2.7/site-packages/logtools/parsers.py", line 183, in parse
    raise ValueError("Could not parse log line: '%s'" % logline)
  ValueError: Could not parse log line: '122.118.199.146 - \"car5941 \" [09/Apr/2012:14:35:48 -0400] "GET /favicon.ico HTTP/1.1" 200 125 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19"'

Editing the input file and removing the space character, e.g. "car5941 " to "car5941" fixes the problem.
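One plausible reading of the failure, sketched in Python: if the userid field is matched as a run of non-whitespace characters, a space inside the quoted value splits it and the timestamp no longer lines up. The `PREFIX` pattern and `parse_prefix` helper below are assumptions made for illustration; they are not logtools' actual regex or API.

```python
import re

# Hypothetical pattern for the leading '%h %l %u %t' fields.
# Assumption for illustration only; logtools builds its own regex from the format string.
PREFIX = re.compile(
    r'^(?P<host>\S+) (?P<logname>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\]'
)

def parse_prefix(line):
    """Return the leading fields as a dict, or None if the line does not match."""
    m = PREFIX.match(line)
    return m.groupdict() if m else None

# Same line as in the report, with and without the trailing space in the userid.
good = r'122.118.199.146 - \"car5941\" [09/Apr/2012:14:35:48 -0400] "GET /favicon.ico HTTP/1.1" 200 125'
bad = r'122.118.199.146 - \"car5941 \" [09/Apr/2012:14:35:48 -0400] "GET /favicon.ico HTTP/1.1" 200 125'

print(parse_prefix(good))  # dict with host, logname, user, time
print(parse_prefix(bad))   # None: \S+ cannot span the space inside the userid
```

With this pattern the edited line parses and the original one does not, which matches the behaviour described above.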

bkc avatar Apr 09 '12 19:04 bkc

hey man, thanks for feedback - if u can attach example text file with this line i'll try to debug through sometime in the next few days

Adam

Reply to this email directly or view it on GitHub: https://github.com/adamhadani/logtools/issues/4

adamhadani avatar Apr 09 '12 19:04 adamhadani

I can't find a way to attach a file to the git issue.

I have attached a file to this note.

Brad Clements, [email protected] (315)268-1000 Jabber/XMPP: [email protected]

bkc avatar Apr 09 '12 19:04 bkc

Brad, I take it you tried using the logparser utility / api with this line. Can you supply the command-line you tried, or otherwise the parser / format string you're using for this?

adamhadani avatar Apr 09 '12 22:04 adamhadani

Hi,

Sorry I did not include this information before:

the command line is:

(weblog)[bkc@server5 scripts]$ cat /tmp/test.log | logparse --parser AccessLog --format '%h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"' -f1,2

2012-04-09 19:01:16,903 - root - ERROR - Could not match fields for parsed line: 69.18.99.146 - "car5941 " [09/Apr/2012:14:35:48 -0400] "GET /favicon.ico HTTP/1.1" 200 125 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19"

Traceback (most recent call last):
  File "/home/bkc/Python_Environments/weblog/bin/logparse", line 9, in <module>
    load_entry_point('logtools==0.8', 'console_scripts', 'logparse')()
  File "/home/bkc/Python_Environments/weblog/lib/python2.7/site-packages/logtools/_parse.py", line 112, in logparse_main
    for row in logparse(options, args, fh=sys.stdin):
  File "/home/bkc/Python_Environments/weblog/lib/python2.7/site-packages/logtools/_parse.py", line 100, in logparse
    yield key_func(line)
  File "/home/bkc/Python_Environments/weblog/lib/python2.7/site-packages/logtools/parsers.py", line 47, in multiindex_getter
    data = parser(line.strip())
  File "/home/bkc/Python_Environments/weblog/lib/python2.7/site-packages/logtools/parsers.py", line 69, in __call__
    return self.parse(line)
  File "/home/bkc/Python_Environments/weblog/lib/python2.7/site-packages/logtools/parsers.py", line 183, in parse
    raise ValueError("Could not parse log line: '%s'" % logline)
  ValueError: Could not parse log line: '69.18.99.146 - "car5941 " [09/Apr/2012:14:35:48 -0400] "GET /favicon.ico HTTP/1.1" 200 125 "-" "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.151 Safari/535.19"'

bkc avatar Apr 09 '12 23:04 bkc

Hey Brad, sorry for taking so long to respond. I played around with the script and it looks like removing the backslashes on the quotes and specifying the quotes in the format (e.g. "%u" instead of %u) works. Question is - are you specifically trying to use quoted usernames that might have spaces in them that u'd like to differentiate?

here is what worked for me (after, as said, changing the original logs from \" to " where appropriate):

  cat data/buglog.txt | logparse --parser AccessLog --format '%h %l "%u" %t "%r" %>s %b "%{Referer}i" "%{User-agent}i"' -f1,2,3
  122.118.199.146 - car5941
  122.118.199.146 - car5941
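For anyone who wants to apply the same workaround before handing lines to the parser, here is a rough Python sketch of the two steps: drop the backslashes in front of the quotes, then treat the userid as a quoted field. `QUOTED_USER` and `parse_quoted_user` are hypothetical names; this mirrors the `'%h %l "%u" %t'` prefix of the format above and is not how logtools itself is implemented.

```python
import re

# Hypothetical pattern mirroring the '%h %l "%u" %t' prefix of the format string.
# Assumption for illustration; not logtools' internal regex.
QUOTED_USER = re.compile(
    r'^(?P<host>\S+) (?P<logname>\S+) "(?P<user>[^"]*)" \[(?P<time>[^\]]+)\]'
)

def parse_quoted_user(line):
    """Unescape \\" to " and parse the leading fields, allowing spaces in the userid."""
    # Note: this replaces every \" in the line, not only those around the userid.
    cleaned = line.replace('\\"', '"')
    m = QUOTED_USER.match(cleaned)
    return m.groupdict() if m else None

sample = r'122.118.199.146 - \"car5941 \" [09/Apr/2012:14:35:48 -0400] "GET /favicon.ico HTTP/1.1" 200 125'
fields = parse_quoted_user(sample)
print(fields)
```

The userid comes back with its trailing space intact; calling `.strip()` on it is a separate decision, depending on whether such usernames should be distinguished.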

adamhadani avatar May 01 '12 20:05 adamhadani

Hi,

Thanks for the response.

The field we are logging doesn't contain quote marks in the values to be logged, so I have no idea why apache is adding quotes around that field when recording them in the log. It's also perplexing that it then adds backslashes to the quotes it's added to the field.

We do not expect the %u column to contain spaces. I suspect that one user accidentally typed in a trailing space when they logged in. The authentication routine must have stripped that space when checking the password, but the cookie was set to the value with the trailing space.

does that help?

bkc avatar May 01 '12 21:05 bkc