neither --hide-referrer nor --ignore-referrer work

Open yarozar opened this issue 1 year ago • 5 comments

Neither --hide-referrer nor --ignore-referrer works for me. There are no changes in the report whether or not I provide these params: the "Referrer URLs" and "Referring Sites" panels stay the same.

Tried all the possible combinations. 🤷‍♂️

Can you confirm there is no degradation in recent versions of goaccess?

Thanks!

yarozar avatar Sep 22 '23 12:09 yarozar

--ignore-referer should work as expected in v1.7.2

[screenshot: 234090502-3c3911ed-70ca-4cbd-aa82-3989130246b6]

allinurl avatar Sep 23 '23 00:09 allinurl

👍 Did not realize you need to provide the full path; I was using it like 'google.com*' without success. Thanks for the clarification!

Also noticed that the wildcard * is treated like regexp .+ (instead of regexp .*): '*google.com*' does not ignore 'http://google.com'; it matches only if there is at least one more character at the end, for example 'http://google.com/'. I would expect '*' to cover the 'no character' case as well, but there is an easy workaround: define several rules instead of one.
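To make the expectation concrete, here is a minimal sketch of a matcher in which '*' matches zero or more characters (the regexp .* reading). The name glob_match is hypothetical and this is not goaccess's own code; it only illustrates the semantics described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of goaccess.
 * Returns 1 if str matches pat, 0 otherwise.
 * '*' matches zero or more characters, '?' matches exactly one. */
int
glob_match (const char *pat, const char *str) {
  const char *star = NULL;    /* most recent '*' seen in pat */
  const char *resume = NULL;  /* position in str where that '*' stopped */

  assert (pat != NULL && str != NULL);

  while (*str) {
    if (*pat == '*') {
      star = pat++;           /* first try: the star matches nothing */
      resume = str;
    } else if (*pat == '?' || *pat == *str) {
      pat++;
      str++;
    } else if (star) {
      pat = star + 1;         /* backtrack: the star absorbs one more char */
      str = ++resume;
    } else {
      return 0;
    }
  }

  while (*pat == '*')         /* trailing stars may match the empty string */
    pat++;
  return *pat == '\0';
}
```

With these semantics, glob_match("*google.com*", "http://google.com") and glob_match("*google.com*", "http://google.com/") both return 1, so a single rule would cover both referrers.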

Thanks for help, fantastic tool!

yarozar avatar Sep 23 '23 07:09 yarozar

I'm trying to ignore self referrals in sub.domain.com logs and I've tried with --ignore-referrers=*sub.domain.com* and --ignore-referrers=sub.domain.com* and neither work. Only --ignore-referrers=*ub.domain.co* works. (I'm using goaccess 1.7).

niol avatar Feb 14 '24 08:02 niol

@niol, could you drop a few lines from your access log that you're having trouble capturing? I can take a look. Thanks!

allinurl avatar Feb 14 '24 23:02 allinurl

niol@volyova:~$ cat access.log
1.2.3.4 - - [15/Feb/2024:00:01:40 +0100] "GET /~niol/repositories.git/lazygal/commit/lazygal.1.xml?h=0.7.3&id=aed6c7e644dc4c47dc49fea08877f610d56fa9ae HTTP/1.1" 200 7682 "https://sub.domain.com/~niol/repositories.git/lazygal/commit/lazygal.1.xml?h=0.7.3&id=aed6c7e644dc4c47dc49fea08877f610d56fa9ae" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
1.2.3.4 - - [15/Feb/2024:00:02:29 +0100] "GET /~niol/repositories.git/deejayd/diff/debian/changelog?h=DEEJAYD-DEBIAN_0.10.0-4 HTTP/1.1" 200 7535 "https://sub.domain.com/~niol/repositories.git/deejayd/diff/debian/changelog?h=DEEJAYD-DEBIAN_0.10.0-4" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
1.2.3.4 - - [15/Feb/2024:00:03:15 +0100] "GET /~niol/repositories.git/lazygal/log/ChangeLog?id=abf36d82f62323d7a3ec798a950da6945dc0d39a HTTP/1.1" 200 8201 "https://sub.domain.com/~niol/repositories.git/lazygal/log/ChangeLog?id=abf36d82f62323d7a3ec798a950da6945dc0d39a" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
niol@volyova:~$ goaccess --ignore-referrer=sub.domain.com* -o json access.log | jq '.referrers.data[] | .data'
 [PARSING access.log] {0} @ {0/s}
"https://sub.domain.com/~niol/repositories.git/lazygal/commit/lazygal.1.xml?h=0.7.3&id=aed6c7e644dc4c47dc49fea08877f610d56fa9ae"
"https://sub.domain.com/~niol/repositories.git/deejayd/diff/debian/changelog?h=DEEJAYD-DEBIAN_0.10.0-4"
"https://sub.domain.com/~niol/repositories.git/lazygal/log/ChangeLog?id=abf36d82f62323d7a3ec798a950da6945dc0d39a"
niol@volyova:~$ goaccess --ignore-referrer=*sub.domain.com* -o json access.log | jq '.referrers.data[] | .data'
 [PARSING access.log] {0} @ {0/s}
"https://sub.domain.com/~niol/repositories.git/lazygal/commit/lazygal.1.xml?h=0.7.3&id=aed6c7e644dc4c47dc49fea08877f610d56fa9ae"
"https://sub.domain.com/~niol/repositories.git/deejayd/diff/debian/changelog?h=DEEJAYD-DEBIAN_0.10.0-4"
"https://sub.domain.com/~niol/repositories.git/lazygal/log/ChangeLog?id=abf36d82f62323d7a3ec798a950da6945dc0d39a"
niol@volyova:~$ 

but this is not consistent with

#include <stdio.h>
#include <string.h>


/* String matching where one string contains wildcard characters.
 *
 * If a match is found, 1 is returned.
 * If no match is found, 0 is returned. */
static int
wc_match (const char *wc, char *str) {
  while (*wc && *str) {
    if (*wc == '*') {
      while (*wc && *wc == '*')
        wc++;
      if (!*wc)
        return 1;

      while (*str && *str != *wc)
        str++;
    } else if (*wc == '?' || *wc == *str) {
      wc++;
      str++;
    } else {
      break;
    }
  }
  if (!*wc && !*str)
    return 1;
  return 0;
}


void match(const char* pattern, const char* str) {
    printf("%s match %s ? %s\n", pattern, str,
                                 wc_match(pattern, (char*) str)? "yes":"no");
}


int main(int argc, char** argv){
    match("*sub.domain.com*", "http://sub.domain.com/specific/page");
}

which outputs:

*sub.domain.com* match http://sub.domain.com/specific/page ? yes

niol avatar Feb 15 '24 11:02 niol

Same issue here. --ignore-referrer=*iki.evilazrael.d* works, but not --ignore-referrer=wiki.evilazrael.de . Some log entries:

109.91.147.65 - - [04/Apr/2024:22:05:20 +0200] "GET /en/artillery3d-sidewinder-x4-plus HTTP/1.1" 304 - "https://wiki.evilazrael.de/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
109.91.147.65 - - [04/Apr/2024:22:05:20 +0200] "POST /graphql HTTP/1.1" 200 728 "https://wiki.evilazrael.de/en/artillery3d-sidewinder-x4-plus" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
109.91.147.65 - - [04/Apr/2024:22:54:35 +0200] "GET /_assets/js/app.js?1706490487 HTTP/1.1" 200 578258 "https://wiki.evilazrael.de/" "Mozilla/5.0 (Linux; Android 13; lenovo x606fa) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.149 Safari/537.36 OPR/81.3.4292.78688"
109.91.147.65 - - [04/Apr/2024:22:54:36 +0200] "GET /_assets/css/mdi.ad9d067665721699a5d0.css HTTP/1.1" 200 43339 "https://wiki.evilazrael.de/" "Mozilla/5.0 (Linux; Android 13; lenovo x606fa) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.149 Safari/537.36 OPR/81.3.4292.78688"
109.91.147.65 - - [04/Apr/2024:22:54:36 +0200] "GET /_assets/js/vendor.js?1706490487 HTTP/1.1" 200 822355 "https://wiki.evilazrael.de/" "Mozilla/5.0 (Linux; Android 13; lenovo x606fa) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.149 Safari/537.36 OPR/81.3.4292.78688"
109.91.147.65 - - [04/Apr/2024:22:54:36 +0200] "GET /_assets/js/editor.js?1706490487 HTTP/1.1" 200 134129 "https://wiki.evilazrael.de/" "Mozilla/5.0 (Linux; Android 13; lenovo x606fa) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.149 Safari/537.36 OPR/81.3.4292.78688"
109.91.147.65 - - [04/Apr/2024:22:54:36 +0200] "POST /graphql HTTP/1.1" 200 3839 "https://wiki.evilazrael.de/" "Mozilla/5.0 (Linux; Android 13; lenovo x606fa) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.149 Safari/537.36 OPR/81.3.4292.78688"

2nd edit: Running latest image from docker: allinurl/goaccess:latest@sha256:54d45bbf7b97f70735a0dc46d697942277488167040446a902e4838ee8a83770

eazrael avatar Apr 05 '24 21:04 eazrael

@eazrael, make sure to include the "https://" prefix, for example, "--ignore-referrer=https://wiki.evilazrael.de". Let me know if that helps.

allinurl avatar Apr 05 '24 21:04 allinurl

Negative. All entries are there. --ignore-referrer=https://wiki.evilazrael.d* works, a slight improvement

eazrael avatar Apr 05 '24 22:04 eazrael

Even the more restrictive filter '--ignore-referrer=https://wiki.evilazrael.de/*' '--ignore-referrer=http://wiki.evilazrael.de/*' '--ignore-referrer=https://wiki.evilazrael.de' '--ignore-referrer=http://wiki.evilazrael.de' does not catch all the entries "https://wiki.evilazrael.d/*" catches. I can send you or publish the list of referrer URLs if that helps. There are only 84.

Edit: GitHub ate the spaces between the parameters.

eazrael avatar Apr 05 '24 22:04 eazrael

@eazrael, yeah, feel free to go ahead and post the rest of them. Not sure I'm seeing the problem here to be honest. As for your original post:

Same issue here. --ignore-referrer=*iki.evilazrael.d* works, but not --ignore-referrer=wiki.evilazrael.de . Some log entries:

# goaccess access.log --log-format=COMBINED --ignore-referrer=https://wiki.evilazrael.de

The trick is that you need to include the trailing slash to match the entries you posted. So the full option should be --ignore-referrer=https://wiki.evilazrael.de/.

# goaccess access.log --log-format=COMBINED --ignore-referrer=https://wiki.evilazrael.de/

will display:

[screenshot: 2024-04-05-202803_406x154_scrot]

allinurl avatar Apr 06 '24 01:04 allinurl

I think the problem is the man page. It says that wildcards are supported, not that you need to use them. I think the most important use case for --ignore-referrer is to ignore all intra-site referrers, and in many or most cases this can be achieved simply by filtering on the host name.

I just created a test program similar to niol's. My expectation is that this function behaves as in globbing, where an asterisk matches zero or more characters, but in the current implementation it matches one or more characters.

 ./pattern_tester "*//wiki.evilazrael.de/*" < /tmp/urls.txt
no  'http://wiki.evilazrael.de/' matches '*//wiki.evilazrael.de/*'
yes 'http://wiki.evilazrael.de/misc/favicon.ico' matches '*//wiki.evilazrael.de/*'
no  'https://wiki.evilazrael.de' matches '*//wiki.evilazrael.de/*'
no  'https://wiki.evilazrael.de/' matches '*//wiki.evilazrael.de/*'
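For comparison, the globbing behaviour I have in mind is what POSIX fnmatch() provides. A small sketch, assuming a POSIX system; the wrapper name posix_glob_match is made up for illustration and has nothing to do with goaccess internals:

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Hypothetical wrapper around POSIX fnmatch().
 * Returns 1 if url matches the shell-style pattern, 0 otherwise.
 * Flags are 0, so '*' matches zero or more characters, including '/'. */
int
posix_glob_match (const char *pattern, const char *url) {
  assert (pattern != NULL && url != NULL);
  return fnmatch (pattern, url, 0) == 0;
}
```

With the pattern "*//wiki.evilazrael.de/*", this matches 'http://wiki.evilazrael.de/', 'http://wiki.evilazrael.de/misc/favicon.ico' and 'https://wiki.evilazrael.de/'. Only 'https://wiki.evilazrael.de' is rejected, because the pattern requires a '/' after the host.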

eazrael avatar Apr 06 '24 12:04 eazrael

@eazrael, I've implemented the changes you suggested. Now, you can specify the host directly, for example: --ignore-referrer=wiki.evilazrael.de. Keep in mind that wildcard characters are still supported as before.

Please feel free to build from development and let me know if it resolves the issue on your side. I'll include this in the upcoming release.
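For readers wondering what matching on the host alone could look like, here is a rough sketch. It is not the code from the development branch; the function name referrer_host_equals and the parsing rules are assumptions made only for illustration.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not goaccess's implementation.
 * Returns 1 if the host part of referrer equals host, 0 otherwise.
 * The host is taken as the text after "://" (if present) up to the
 * first '/', ':', '?' or '#'. */
int
referrer_host_equals (const char *referrer, const char *host) {
  const char *sep, *start;
  size_t len;

  assert (referrer != NULL && host != NULL);

  sep = strstr (referrer, "://");
  start = sep ? sep + 3 : referrer;   /* skip the scheme when there is one */
  len = strcspn (start, "/:?#");      /* length of the host portion */

  return len == strlen (host) && strncmp (start, host, len) == 0;
}
```

Under these assumptions, "https://wiki.evilazrael.de", "https://wiki.evilazrael.de/" and "https://wiki.evilazrael.de/en/artillery3d-sidewinder-x4-plus" all compare equal to the host wiki.evilazrael.de, with or without the trailing slash. The comparison is case-sensitive and does not handle userinfo ("user@host"), which a real implementation would likely need to consider.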

allinurl avatar Apr 07 '24 19:04 allinurl

I’m having a hard time with this as well. I can’t even make sense of when the .conf file is read. I make changes and they seem to be ineffective... sometimes. Using v1.9.1.

ash34 avatar Apr 13 '24 06:04 ash34

@ash34, have you updated to version 1.9.2? The issue should be resolved in that version. If you have updated and the problem still persists, please share some sample lines from your log along with the command you are using, so I can reproduce it on my end.

allinurl avatar Apr 13 '24 19:04 allinurl

Thanks a lot, with this change, it behaves as expected.

niol avatar Apr 15 '24 06:04 niol

I am using 1.9.2. No matter what I put in the conf file I cannot get these to work:

ignore-referrer dashboard.weglot.com
ignore-referrer xlr8racing.net
ignore-referrer ghost-rider

The strange thing is I was previously using version 1.7 and I could get this to work:

ignore-referrer *xlr8racing.ne*

At the time no other combination would work, only that one. Now it does not work, and neither does anything else in 1.9.2.

ash34 avatar Apr 16 '24 09:04 ash34

Ignore my previous comment. I had persistence enabled. Removing persistence proves that 1.9.2 works.

ash34 avatar Apr 16 '24 16:04 ash34

@ash34 thanks for the update!

allinurl avatar Apr 16 '24 19:04 allinurl