plaso
WinIIS not parsing log entries
IIS parser failing to parse log entries
The IIS parser is failing to parse millions of log entries in a 50 GB data set.
Command line and arguments:
Execution chain through dftimewolf for log folder processing:
/usr/bin/log2timeline.py -q --status_view none --partition all --logfile /x/x/plaso.log --storage-file /tmp/x/x.plaso iislogs/
Source data:
Sample entry that failed to parse:
2020-2-07 16:22:03 x.x.x.x POST /EstimateMyLine estimate=51024&Tab=2200&revision=0 443 scooby.doo x.x.x.x Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+(KHTML,+like+Gecko)+Chrome/83.0.4103.116+Safari/537.36+Edg/83.0.478.56 https://www.mysite.com/Retail/51024?Revision=0 200 0 0 22
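For context: IIS W3C extended logs are space-separated (spaces inside values are encoded as "+"), and the column order is declared by a "#Fields:" directive earlier in the file. The minimal sketch below maps the sample entry onto a hypothetical #Fields header. The header is an assumption chosen to include a referer column, since the reporter's real header was not posted, and the helpers are illustrative only, not plaso's winiis parser. With that header the entry splits cleanly into 15 values.

```python
# Hypothetical #Fields header: an assumption, not taken from the reporter's logs.
FIELDS_HEADER = (
    "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port "
    "cs-username c-ip cs(User-Agent) cs(Referer) sc-status sc-substatus "
    "sc-win32-status time-taken"
)

# The anonymized sample entry from the report, unchanged.
SAMPLE_LINE = (
    "2020-2-07 16:22:03 x.x.x.x POST /EstimateMyLine "
    "estimate=51024&Tab=2200&revision=0 443 scooby.doo x.x.x.x "
    "Mozilla/5.0+(Windows+NT+10.0;+Win64;+x64)+AppleWebKit/537.36+"
    "(KHTML,+like+Gecko)+Chrome/83.0.4103.116+Safari/537.36+Edg/83.0.478.56 "
    "https://www.mysite.com/Retail/51024?Revision=0 200 0 0 22"
)


def parse_fields_header(header: str) -> list[str]:
    """Return the field names declared by a W3C '#Fields:' directive."""
    prefix = "#Fields:"
    if not header.startswith(prefix):
        raise ValueError("not a #Fields directive")
    return header[len(prefix):].split()


def parse_line(fields: list[str], line: str) -> dict[str, str]:
    """Map one space-separated log line onto the declared field names."""
    values = line.split()
    if len(values) != len(fields):
        raise ValueError(f"expected {len(fields)} values, got {len(values)}")
    return dict(zip(fields, values))


if __name__ == "__main__":
    record = parse_line(parse_fields_header(FIELDS_HEADER), SAMPLE_LINE)
    for name, value in record.items():
        print(f"{name:16} {value}")
```

A mismatch between the number of declared fields and the number of values on a line is one concrete way an entry can become unparseable, which is why the actual #Fields line matters.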
Plaso version:
plaso - log2timeline version 20211024
Operating system Plaso is running on:
19~20.04.1-Ubuntu SMP
Debug output/tracebacks:
2022-05-03 17:38:33,271 [DEBUG] (MainProcess) PID:1036000
This is the extraction warning for the sample entry above.
*************************** Extraction warning: 303 ****************************
Message : unable to parse log line: "#############" at offset: 3769471
Parser chain : winiis
Path specification : type: OS, location: /x/x/x.log
The offset seems large, since the line does not have that many characters. The line also contains no special characters. The full source data contains 180 million entries, but only 18 million were parsed.
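One possible reading of the large number, assuming the reported offset is a byte position from the start of /x/x/x.log rather than a position within the line: offset 3769471 would then point at a line roughly 3.7 MB into the file. The hypothetical helper below (`line_at_offset`, not part of plaso) finds the line containing a given byte offset, so the warning can be matched against the raw file.

```python
import sys


def line_at_offset(path: str, offset: int) -> tuple[int, bytes]:
    """Return (1-based line number, raw line bytes) containing byte `offset`.

    Assumes `offset` is counted in bytes from the start of the file.
    """
    position = 0
    with open(path, "rb") as handle:
        # Binary iteration keeps the b"\n" terminator, so len(raw) is exact.
        for number, raw in enumerate(handle, start=1):
            if position <= offset < position + len(raw):
                return number, raw
            position += len(raw)
    raise ValueError(f"offset {offset} is beyond end of file ({position} bytes)")


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python line_at_offset.py /x/x/x.log 3769471
    found_number, found_raw = line_at_offset(sys.argv[1], int(sys.argv[2]))
    print(f"line {found_number}: {found_raw!r}")
```

Reading in binary mode avoids any newline or encoding translation, so the byte count matches what a parser reading the raw file would see.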
Thanks for the report @dootyfree - can you confirm the month in the date has no leading 0? So the month is "2" not "02"? I want to make sure this isn't an artifact of the anonymization of the log. I also note the day does have a leading 0.
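The question matters because a grammar that expects a two-digit month would reject "2020-2-07" while accepting "2020-02-07". A minimal sketch for checking the original, unanonymized logs is to test whether each entry's first token is a zero-padded YYYY-MM-DD date. The strict pattern is an assumption about what a parser might require; it is not a statement about how plaso's winiis grammar is written.

```python
import re

# Zero-padded date, YYYY-MM-DD (assumed strict form; not plaso's grammar).
STRICT_DATE = re.compile(r"\d{4}-\d{2}-\d{2}")


def date_token_is_padded(line: str) -> bool:
    """Return True if the line's first token is a zero-padded YYYY-MM-DD date."""
    first_token = line.split(" ", 1)[0]
    return STRICT_DATE.fullmatch(first_token) is not None


if __name__ == "__main__":
    print(date_token_is_padded("2020-2-07 16:22:03 x.x.x.x POST /EstimateMyLine"))   # False
    print(date_token_is_padded("2020-02-07 16:22:03 x.x.x.x POST /EstimateMyLine"))  # True
```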
Hello Onager,
There is a leading 0 for the month in the date; the missing 0 is possibly an anonymization error.
Best Regards,
Thanks @dootyfree - I took a bit more of a look and I can't reproduce the issue. I have a couple of ideas though:
- Are there any extraction warnings saying "missing definition for field..."? Or any other warnings/errors for the file, perhaps something saying "unknown structure"?
- The line you provided includes what looks like a referer field, which is non-default. The parser should still handle this, though. Could you provide the "#Fields: ..." line from one of the logs where you're getting errors?
- Do all the files in the data set have the same format, or is there some pattern? You mentioned only around 10% of the logs were parsed; is there anything the successful ones have in common that the failures don't? Are they all the same version of IIS, with the same Fields definition, etc.?
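To answer the last question across a large folder, one option is to group files by the #Fields directives they contain and compare the groups against the files that produced warnings. This is a minimal sketch with a hypothetical helper (`fields_signatures`, not part of plaso or dftimewolf); it assumes the logs are text files matching "*.log" somewhere under the folder passed in (e.g. iislogs/). Because a W3C log file can contain more than one #Fields line, each file's signature is the ordered tuple of distinct directives it contains.

```python
import sys
from collections import defaultdict
from pathlib import Path


def fields_signatures(log_dir: str, pattern: str = "*.log") -> dict[tuple[str, ...], list[str]]:
    """Group log files under `log_dir` by the distinct '#Fields:' lines they contain."""
    groups: dict[tuple[str, ...], list[str]] = defaultdict(list)
    for path in sorted(Path(log_dir).rglob(pattern)):
        seen: list[str] = []
        with open(path, encoding="utf-8", errors="replace") as handle:
            for line in handle:
                if line.startswith("#Fields:"):
                    directive = line.strip()
                    if directive not in seen:
                        seen.append(directive)
        groups[tuple(seen)].append(str(path))
    return dict(groups)


if __name__ == "__main__" and len(sys.argv) == 2:
    # Usage: python fields_signatures.py iislogs/
    for signature, paths in fields_signatures(sys.argv[1]).items():
        print(f"{len(paths)} file(s) with header(s):")
        for directive in signature or ("<no #Fields line>",):
            print(f"    {directive}")
```

If the failing files cluster under one signature, that header is the one worth posting on the issue.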
Changes were merged; closing the issue.