matomo-log-analytics

Use custom log file format: example (to be added as unit test)

Open tfrdidi opened this issue 9 years ago • 7 comments

I have log files with the following format:

date time s-sitename s-computername s-ip cs-method cs-uri-stem cs-uri-query s-port cs-username c-ip cs-version cs(User-Agent) cs(Cookie) cs(Referer) cs-host sc-status sc-substatus sc-win32-status sc-bytes cs-bytes time-taken

They are standard IIS logs, but I have to import them without certain fields. Therefore I used:

--log-format-regex="(?P<date>.*?) \S+ \S+ \S+ \S+ \S+ (?P<path>/\S*) (?P<query_string>\S*) \S+ \S+ (?P<ip>[\w*.:-]*) \S+ (?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) \S+ \S+ (?P<length>\S+) \S+ (?P<generation_time_milli>[.\d]+)"

But it did not match a single line.

Here is one example log line:

2015-01-01 21:51:58 W3SVC4 S01 9.9.9.9 GET /Content/index.aspx - 80 testuser 9.9.9.9 HTTP/1.1 Mozilla/5.0+(compatible;+MSIE+10.0;+Windows+NT+6.1;+WOW64;+Trident/6.0) - http://testsite.de/ testsite.de 200 0 0 30647 5673 2851

Any idea what I am doing wrong?

tfrdidi avatar Jul 20 '16 13:07 tfrdidi

Try using more specific patterns. The .* used in your date pattern is unbounded, so it can match anything. Maybe try something more specific, like (?P<date>[0-9-]+ [0-9:]+)

sgiehl avatar Jul 20 '16 13:07 sgiehl

Thanks for your fast reply. I tried it with

--log-format-regex="(?P<date>\d+[-\d+]+ [\d+:]+) \S+ \S+ \S+ \S+ \S+ (?P<path>/\S*) (?P<query_string>\S*) \S+ \S+ (?P<ip>[\w*.:-]*) \S+ (?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) (?P<status>\d+) \S+ \S+ (?P<length>\S+) \S+ \S+"

and several variations, but the result did not change. Do you have any idea how to get more information about what is going on under the hood? --debug is not very helpful.

tfrdidi avatar Jul 20 '16 16:07 tfrdidi

You can try using the regex to search the log file on the command line. If there are results, it should work. If not, you need to adjust the regex until it matches.

sgiehl avatar Jul 20 '16 16:07 sgiehl

How can I search from the command line with this regex, which is specific to this Python script?

tfrdidi avatar Jul 20 '16 16:07 tfrdidi
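
One way to check such a pattern outside the importer is to run it through Python's re module directly. The sketch below assumes the import script applies the pattern from the start of each line, as re.match does. The helper first_failing_piece is hypothetical and not part of the import script: it splits the pattern on spaces and reports the first space-separated piece at which the growing prefix stops matching. It is applied here to the second attempt above and the example log line from the original post.

```python
import re

# Example log line from the original post.
LOG_LINE = (
    "2015-01-01 21:51:58 W3SVC4 S01 9.9.9.9 GET /Content/index.aspx - 80 "
    "testuser 9.9.9.9 HTTP/1.1 "
    "Mozilla/5.0+(compatible;+MSIE+10.0;+Windows+NT+6.1;+WOW64;+Trident/6.0) "
    "- http://testsite.de/ testsite.de 200 0 0 30647 5673 2851"
)

# Second attempt from the thread, copied verbatim as a raw string.
SECOND_ATTEMPT = (
    r'(?P<date>\d+[-\d+]+ [\d+:]+) \S+ \S+ \S+ \S+ \S+ (?P<path>/\S*) '
    r'(?P<query_string>\S*) \S+ \S+ (?P<ip>[\w*.:-]*) \S+ '
    r'(?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) '
    r'(?P<status>\d+) \S+ \S+ (?P<length>\S+) \S+ \S+'
)


def first_failing_piece(pattern, line):
    """Hypothetical helper: return the first space-separated piece of
    `pattern` whose prefix no longer matches `line`, or None if every
    prefix matches."""
    pieces = pattern.split(" ")
    for i in range(1, len(pieces) + 1):
        candidate = " ".join(pieces[:i])
        try:
            compiled = re.compile(candidate)
        except re.error:
            # The cut fell inside a group (e.g. a space within the date
            # group), so this prefix is not a valid regex; try a longer one.
            continue
        if compiled.match(line) is None:
            return pieces[i - 1]
    return None


print(re.match(SECOND_ATTEMPT, LOG_LINE))             # None: no match
print(first_failing_piece(SECOND_ATTEMPT, LOG_LINE))  # (?P<path>/\S*)
```

Here the date group already consumes both the date and the time, so the five \S+ that follow run one field too far and the path group lands on the "-" of cs-uri-query. This only diagnoses the regex itself; it says nothing about shell quoting or how the importer parses the captured date.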

Thanks for the help! I have solved it using the following regex:

--log-format-regex="(?P<date>\S+ \S+) \S+ \S+ \S+ \S+ (?P<path>\S+) (?P<query_string>\S*) \S+ \S+ (?P<ip>\S+) \S+ (?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) (?P<host>\S+) (?P<status>\S+) \S+ \S+ (?P<length>\S+) \S+ (?P<generation_time_milli>[.\d]+)"

This ticket could be closed now ;-)

tfrdidi avatar Jan 26 '17 09:01 tfrdidi

Thanks for posting. We'll leave this ticket open as it would be nice to:

  1. Add a unit test with your example log + command
  2. Maybe add a link on the doc to the test (or also repeat this example in the doc).

It will certainly help many people trying to write custom log imports.

mattab avatar Jun 20 '17 01:06 mattab
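
A minimal sketch of what such a test could look like, using only Python's standard unittest and re modules. It does not use the project's own test harness, and the class and constant names are made up for illustration. It checks only that the working regex posted above matches the example log line and captures the expected fields, assuming the pattern is applied from the start of the line.

```python
import re
import unittest

# Example log line from the original post.
LOG_LINE = (
    "2015-01-01 21:51:58 W3SVC4 S01 9.9.9.9 GET /Content/index.aspx - 80 "
    "testuser 9.9.9.9 HTTP/1.1 "
    "Mozilla/5.0+(compatible;+MSIE+10.0;+Windows+NT+6.1;+WOW64;+Trident/6.0) "
    "- http://testsite.de/ testsite.de 200 0 0 30647 5673 2851"
)

# Working regex from the thread, copied verbatim as a raw string.
LOG_FORMAT_REGEX = (
    r'(?P<date>\S+ \S+) \S+ \S+ \S+ \S+ (?P<path>\S+) (?P<query_string>\S*) '
    r'\S+ \S+ (?P<ip>\S+) \S+ (?P<user_agent>".*?"|\S*) \S+ (?P<referrer>\S+) '
    r'(?P<host>\S+) (?P<status>\S+) \S+ \S+ (?P<length>\S+) \S+ '
    r'(?P<generation_time_milli>[.\d]+)'
)


class CustomIisRegexTest(unittest.TestCase):
    """Hypothetical test case; the name is illustrative only."""

    def test_example_line_is_parsed(self):
        match = re.match(LOG_FORMAT_REGEX, LOG_LINE)
        self.assertIsNotNone(match)
        fields = match.groupdict()
        # The date group spans both the date and the time fields.
        self.assertEqual(fields["date"], "2015-01-01 21:51:58")
        self.assertEqual(fields["path"], "/Content/index.aspx")
        self.assertEqual(fields["query_string"], "-")
        self.assertEqual(fields["ip"], "9.9.9.9")
        self.assertEqual(fields["referrer"], "http://testsite.de/")
        self.assertEqual(fields["host"], "testsite.de")
        self.assertEqual(fields["status"], "200")
        self.assertEqual(fields["length"], "30647")
        self.assertEqual(fields["generation_time_milli"], "2851")
```

Saved to a file, this can be run with python -m unittest followed by the file name. A test inside the project would presumably feed the same line and regex through the importer itself rather than calling re.match directly.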

Any updates on this? I would like to write my own custom log import but there is no proper documentation for this.

gpanagiotidis avatar Jan 16 '19 12:01 gpanagiotidis