goaccess
neither --hide-referrer nor --ignore-referrer work
Neither --hide-referrer nor --ignore-referrer works for me. The report does not change whether or not I provide these params: the "Referrer URLs" and "Referring Sites" panels stay the same.
Tried all possible combinations. 🤷
Can you confirm there is no regression in recent versions of goaccess?
Thanks!
--ignore-referrer should work as expected in v1.7.2
👍 I did not realize you need to provide the full path; I was using it like 'google.com*' without success. Thanks for the clarification!
Also noticed that the wildcard * is treated like the regexp .+ (instead of the regexp .*): '*google.com*' does not ignore 'http://google.com'; it matches only if there is at least one more character at the end, for example 'http://google.com/'. I would expect '*' to cover the 'no character' case as well, but this one has an easy workaround: define several rules instead of one.
Thanks for help, fantastic tool!
I'm trying to ignore self-referrals in sub.domain.com logs. I've tried --ignore-referrer=*sub.domain.com* and --ignore-referrer=sub.domain.com*, and neither works. Only --ignore-referrer=*ub.domain.co* works. (I'm using goaccess 1.7.)
@niol, could you drop a few lines from your access log that you're having trouble capturing? I can take a look. Thanks!
niol@volyova:~$ cat access.log
1.2.3.4 - - [15/Feb/2024:00:01:40 +0100] "GET /~niol/repositories.git/lazygal/commit/lazygal.1.xml?h=0.7.3&id=aed6c7e644dc4c47dc49fea08877f610d56fa9ae HTTP/1.1" 200 7682 "https://sub.domain.com/~niol/repositories.git/lazygal/commit/lazygal.1.xml?h=0.7.3&id=aed6c7e644dc4c47dc49fea08877f610d56fa9ae" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
1.2.3.4 - - [15/Feb/2024:00:02:29 +0100] "GET /~niol/repositories.git/deejayd/diff/debian/changelog?h=DEEJAYD-DEBIAN_0.10.0-4 HTTP/1.1" 200 7535 "https://sub.domain.com/~niol/repositories.git/deejayd/diff/debian/changelog?h=DEEJAYD-DEBIAN_0.10.0-4" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
1.2.3.4 - - [15/Feb/2024:00:03:15 +0100] "GET /~niol/repositories.git/lazygal/log/ChangeLog?id=abf36d82f62323d7a3ec798a950da6945dc0d39a HTTP/1.1" 200 8201 "https://sub.domain.com/~niol/repositories.git/lazygal/log/ChangeLog?id=abf36d82f62323d7a3ec798a950da6945dc0d39a" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/114.0.0.0 Safari/537.36"
niol@volyova:~$ goaccess --ignore-referrer=sub.domain.com* -o json access.log | jq '.referrers.data[] | .data'
[PARSING access.log] {0} @ {0/s}
"https://sub.domain.com/~niol/repositories.git/lazygal/commit/lazygal.1.xml?h=0.7.3&id=aed6c7e644dc4c47dc49fea08877f610d56fa9ae"
"https://sub.domain.com/~niol/repositories.git/deejayd/diff/debian/changelog?h=DEEJAYD-DEBIAN_0.10.0-4"
"https://sub.domain.com/~niol/repositories.git/lazygal/log/ChangeLog?id=abf36d82f62323d7a3ec798a950da6945dc0d39a"
niol@volyova:~$ goaccess --ignore-referrer=*sub.domain.com* -o json access.log | jq '.referrers.data[] | .data'
[PARSING access.log] {0} @ {0/s}
"https://sub.domain.com/~niol/repositories.git/lazygal/commit/lazygal.1.xml?h=0.7.3&id=aed6c7e644dc4c47dc49fea08877f610d56fa9ae"
"https://sub.domain.com/~niol/repositories.git/deejayd/diff/debian/changelog?h=DEEJAYD-DEBIAN_0.10.0-4"
"https://sub.domain.com/~niol/repositories.git/lazygal/log/ChangeLog?id=abf36d82f62323d7a3ec798a950da6945dc0d39a"
niol@volyova:~$
But this is not consistent with the following test program:
#include <stdio.h>
#include <string.h>

/* String matching where one string contains wildcard characters.
 *
 * If a match is found, 1 is returned.
 * If no match is found, 0 is returned. */
static int
wc_match (const char *wc, char *str) {
  while (*wc && *str) {
    if (*wc == '*') {
      /* collapse consecutive wildcards */
      while (*wc && *wc == '*')
        wc++;
      /* a trailing '*' matches whatever is left of str */
      if (!*wc)
        return 1;
      /* advance str to the first occurrence of the next pattern character */
      while (*str && *str != *wc)
        str++;
    } else if (*wc == '?' || *wc == *str) {
      wc++;
      str++;
    } else {
      break;
    }
  }
  /* match only if both pattern and string are fully consumed */
  if (!*wc && !*str)
    return 1;
  return 0;
}

void match(const char* pattern, const char* str) {
  printf("%s match %s ? %s\n", pattern, str,
         wc_match(pattern, (char*) str) ? "yes" : "no");
}

int main(int argc, char** argv) {
  match("*sub.domain.com*", "http://sub.domain.com/specific/page");
  return 0;
}
which outputs:
*sub.domain.com* match http://sub.domain.com/specific/page ? yes
Same issue here. --ignore-referrer=*iki.evilazrael.d* works, but --ignore-referrer=wiki.evilazrael.de does not. Some log entries:
109.91.147.65 - - [04/Apr/2024:22:05:20 +0200] "GET /en/artillery3d-sidewinder-x4-plus HTTP/1.1" 304 - "https://wiki.evilazrael.de/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
109.91.147.65 - - [04/Apr/2024:22:05:20 +0200] "POST /graphql HTTP/1.1" 200 728 "https://wiki.evilazrael.de/en/artillery3d-sidewinder-x4-plus" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36"
109.91.147.65 - - [04/Apr/2024:22:54:35 +0200] "GET /_assets/js/app.js?1706490487 HTTP/1.1" 200 578258 "https://wiki.evilazrael.de/" "Mozilla/5.0 (Linux; Android 13; lenovo x606fa) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.149 Safari/537.36 OPR/81.3.4292.78688"
109.91.147.65 - - [04/Apr/2024:22:54:36 +0200] "GET /_assets/css/mdi.ad9d067665721699a5d0.css HTTP/1.1" 200 43339 "https://wiki.evilazrael.de/" "Mozilla/5.0 (Linux; Android 13; lenovo x606fa) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.149 Safari/537.36 OPR/81.3.4292.78688"
109.91.147.65 - - [04/Apr/2024:22:54:36 +0200] "GET /_assets/js/vendor.js?1706490487 HTTP/1.1" 200 822355 "https://wiki.evilazrael.de/" "Mozilla/5.0 (Linux; Android 13; lenovo x606fa) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.149 Safari/537.36 OPR/81.3.4292.78688"
109.91.147.65 - - [04/Apr/2024:22:54:36 +0200] "GET /_assets/js/editor.js?1706490487 HTTP/1.1" 200 134129 "https://wiki.evilazrael.de/" "Mozilla/5.0 (Linux; Android 13; lenovo x606fa) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.149 Safari/537.36 OPR/81.3.4292.78688"
109.91.147.65 - - [04/Apr/2024:22:54:36 +0200] "POST /graphql HTTP/1.1" 200 3839 "https://wiki.evilazrael.de/" "Mozilla/5.0 (Linux; Android 13; lenovo x606fa) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.6261.149 Safari/537.36 OPR/81.3.4292.78688"
2nd edit: Running latest image from docker: allinurl/goaccess:latest@sha256:54d45bbf7b97f70735a0dc46d697942277488167040446a902e4838ee8a83770
@eazrael, make sure to include the "https://" prefix, for example, "--ignore-referrer=https://wiki.evilazrael.de". Let me know if that helps.
Negative, all entries are still there. --ignore-referrer=https://wiki.evilazrael.d* works, which is a slight improvement.
Even the more restrictive set of filters '--ignore-referrer=https://wiki.evilazrael.de/*' '--ignore-referrer=http://wiki.evilazrael.de/*' '--ignore-referrer=https://wiki.evilazrael.de' '--ignore-referrer=http://wiki.evilazrael.de' does not catch all the entries that "https://wiki.evilazrael.d/*" catches. I can send you or publish the list of referrer URLs if that helps; there are only 84.
Edit: GitHub ate the spaces between the parameters.
@eazrael, yeah, feel free to go ahead and post the rest of them. Not sure I'm seeing the problem here to be honest. As for your original post:
Same issue here. The --ignore-referrer=*iki.evilazrael.d* thing works, but --ignore-referrer=wiki.evilazrael.de doesn't. Here are some log entries:
# goaccess access.log --log-format=COMBINED --ignore-referrer=https://wiki.evilazrael.de
The trick is that you need to include the slash at the end to match the entries you posted, so the option should be --ignore-referrer=https://wiki.evilazrael.de/.
# goaccess access.log --log-format=COMBINED --ignore-referrer=https://wiki.evilazrael.de/
will display:
I think the problem is the man page. It says that wildcards are supported, not that you need to use them. I think the most important use case for --ignore-referrer is ignoring all intra-site referrers, which in many or most cases can be achieved simply by filtering on the host name.
I just created a test program similar to niol's. My expectation is that this function behaves like globbing, where an asterisk matches zero or more characters, but in the current implementation it matches one or more characters.
./pattern_tester "*//wiki.evilazrael.de/*" < /tmp/urls.txt
no 'http://wiki.evilazrael.de/' matches '*//wiki.evilazrael.de/*'
yes 'http://wiki.evilazrael.de/misc/favicon.ico' matches '*//wiki.evilazrael.de/*'
no 'https://wiki.evilazrael.de' matches '*//wiki.evilazrael.de/*'
no 'https://wiki.evilazrael.de/' matches '*//wiki.evilazrael.de/*'
@eazrael, I've implemented the changes you suggested. Now you can specify the host directly, for example: --ignore-referrer=wiki.evilazrael.de. Keep in mind that wildcard characters are still supported as before.
Please feel free to build from development and let me know if it resolves the issue on your side. I'll include this in the upcoming release.
I'm having a hard time with this as well. I can't even make sense of when the .conf file is read: I make changes and they seem to be ineffective... sometimes. Using v1.9.1.
@ash34, have you updated to version 1.9.2? The issue should be resolved in that version. If you have updated and the problem still persists, please share some sample lines from your log along with the command you are using, so I can reproduce it on my end.
Thanks a lot, with this change, it behaves as expected.
I am using 1.9.2. No matter what I put in the conf file I cannot get these to work:
ignore-referrer dashboard.weglot.com
ignore-referrer xlr8racing.net
ignore-referrer ghost-rider
The strange thing is I was previously using version 1.7 and I could get this to work:
ignore-referrer *xlr8racing.ne*
At the time no other combination would work, only that one. Now it does not work, and neither does anything else in 1.9.2.
Ignore my previous comment. I had persistence enabled; removing persistence shows that 1.9.2 works.
@ash34 thanks for the update!