AdNauseam
Ads hidden but not collected
These need to be addressed (note: this is a good exercise for new developers wishing to understand how ad-parsing works):
- [x] nytimes.com
- [x] thepiratebay.org
A checklist of ad collection on Chinese sites
- [x] 163.com
- [x] sohu.com
- [x] sina.com
- [ ] zol.com.cn
- [ ] weibo.com
- [ ] qq.com (background ads)
- [ ] baidu.com (text ads)
Popular HK site:
- [ ] wmoov.com
thepiratebay.org finished with #363
Ads on nytimes.com are not captured because the script generating them is blocked by EasyPrivacy. Partially solved by adding a dynamicFilteringString to allow the request: https://github.com/dhowe/AdNauseam/issues/399
Now AdNauseam can collect ad images from iframes with an external src (in this case, iframe[src*="googlesyndication.com"]).
However, for the iframes whose src looks like this:
I can only see the process message for the iframe, but not any content inside it.
Could this be a problem with injecting the content script into those iframes?
is this still relevant?
yes, this issue still exists.
can you test whether our content scripts are being (dynamically) injected?
I can see a few inject messages from background.html but I don't know which iframe it is referring to. How can I know the exact element from the frameId?
I'm not exactly sure, as you've guessed those last 2 numbers are tabId / frameId, the URL is for the page itself -- you can also get more info by printing/debugging the pageStore:
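Apart from the pageStore, one possible way to relate a frameId to a frame is the list that `chrome.webNavigation.getAllFrames()` passes to its callback, where each entry carries `frameId`, `parentFrameId` and `url`. The sketch below is only an illustration, not AdNauseam code: `describeFrame` and the sample data are hypothetical, and the call needs the `webNavigation` permission. It does not return a DOM element, but the URL and parent chain usually narrow down which iframe is meant.

```javascript
// Sketch: locate a frame and its ancestor chain from the array that
// chrome.webNavigation.getAllFrames() passes to its callback.
// describeFrame is a hypothetical helper, not part of AdNauseam.
function describeFrame(frames, frameId) {
  const byId = new Map(frames.map((f) => [f.frameId, f]));
  const chain = [];
  let current = byId.get(frameId);
  // Walk up via parentFrameId; the top-level frame has parentFrameId -1,
  // which is never a key in the map, so the loop ends there.
  while (current) {
    chain.push({ frameId: current.frameId, url: current.url });
    current = byId.get(current.parentFrameId);
  }
  return chain; // innermost frame first, top-level page last
}

// In a background page this could be driven by (not run here):
// chrome.webNavigation.getAllFrames({ tabId }, (frames) => {
//   console.log(describeFrame(frames, frameId));
// });

// Hypothetical sample data, for illustration only.
const sampleFrames = [
  { frameId: 0, parentFrameId: -1, url: 'http://example.com/' },
  { frameId: 12, parentFrameId: 0, url: 'http://ads.example.net/frame.html' },
  { frameId: 15, parentFrameId: 12, url: 'about:blank' },
];
console.log(describeFrame(sampleFrames, 15));
```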
Then the content scripts are only (dynamically) injected into those iframes with an external src, not into those with src="javascript:'<html><body></body></html>'".
Is that true? It should be any iframes that don't exist originally on the page, but are created (regardless of source)
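To make the distinction being discussed concrete, here is a rough sketch that buckets an iframe's src by how the frame gets its document. The function name and the category names are assumptions made for illustration; this is not how AdNauseam decides where to inject.

```javascript
// Sketch: classify an iframe src attribute. classifyIframeSrc and its
// return values ('blank', 'javascript', 'network', 'other') are hypothetical.
function classifyIframeSrc(src) {
  const value = (src || '').trim().toLowerCase();
  // No src (or about:blank): the frame starts empty and is filled by script.
  if (value === '' || value === 'about:blank') return 'blank';
  // javascript: URLs produce the document inline, with no network request.
  if (value.startsWith('javascript:')) return 'javascript';
  // Absolute or protocol-relative URLs are fetched over the network.
  if (value.startsWith('http:') || value.startsWith('https:') || value.startsWith('//')) {
    return 'network';
  }
  return 'other'; // e.g. relative paths, data: or blob: URLs
}

console.log(classifyIframeSrc("javascript:'<html><body></body></html>'"));
console.log(classifyIframeSrc('https://tpc.googlesyndication.com/frame.html'));
```

Logging this for every iframe that was seen versus every iframe that was injected into would show whether the 'javascript' and 'blank' cases are the ones being missed.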
On http://www.si.com/, the ad iframe is not dynamically created while the outer ad-container is invisible. Therefore, no ads can be found when AdNauseam hides the ad-container. Once the cosmetic filter is toggled on, ads can be collected.
so we need to not block the outer ad-container, correct?
we can do this either by adding the rule to the disabledRules list, or by creating an exception rule in adnauseam.txt
yes. I think an exception rule for ad-container on si.com would be good, as this is quite a wide selector.
great, make it so..
fixed si.com with https://github.com/dhowe/AdNauseam/pull/433
👍
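For anyone new to exception rules: a site-specific cosmetic exception in Adblock-Plus-style filter syntax has the form `hostname#@#selector`. The sketch below just assembles such a line. The `.ad-container` class selector is an assumption based on the name used in this thread; the actual rule is whatever the pull request above added.

```javascript
// Sketch: build a cosmetic exception ("#@#") filter line for one site.
// cosmeticException is a hypothetical helper; '.ad-container' is assumed.
function cosmeticException(hostname, selector) {
  if (!hostname || !selector) {
    throw new Error('hostname and selector are required');
  }
  return `${hostname}#@#${selector}`;
}

console.log(cosmeticException('si.com', '.ad-container'));
// prints: si.com#@#.ad-container
```

Scoping the exception to one hostname keeps the wide selector hidden everywhere else.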
(list in progress) ENGLISH checklist of ads hidden but not collected
- [ ] si.com
- [ ] cnn.com (2 hidden, 0 collected)
- [ ] http://www.forbes.com/ (0 collected)
- [ ] http://www.sfgate.com/ (only 'today's deals' collected)
- [ ] http://www.theatlantic.com/ (unreliable, collects between 0-4/4)
- [ ] http://nypost.com/
A fresh update on nytimes... At least for today, some of their ad images (e.g., TopLeft) don't have a parent tag. Instead, they have this interesting way of writing the onclick attribute for their ad images... And when you click it, it doesn't lead you anywhere...
Not sure if I understand what you are saying: when I click, I end up at the ad site (or do you mean when adn clicks?)
Interesting... if I click the ad without any blocker on, I also end up at the ad site. But when I have AdNauseam on, it goes through a quick process of opening a new page and closing it instantly... So AdNauseam must be blocking something necessary for this to run.
If this is the case... are we going to parse this onClick? It seems like NYTimes is calling their own objects and functions to do this... at least for this ad.
Or can we ignore the target URL for special cases?
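As a rough illustration of what parsing an onclick could mean here, the sketch below pulls the first quoted absolute URL out of the attribute text and gives up otherwise. The helper name and both sample handlers are made up for the example; it is not the project's own onclick handling, and a handler that only calls site-specific functions (as described above) would yield nothing.

```javascript
// Sketch: extract the first quoted http(s) URL from an onclick attribute.
// urlFromOnclick is a hypothetical helper, not AdNauseam code.
function urlFromOnclick(onclickText) {
  const match = /["'](https?:\/\/[^"']+)["']/.exec(onclickText || '');
  return match ? match[1] : null;
}

// Hypothetical handlers, for illustration only.
console.log(urlFromOnclick("window.open('http://example.com/landing')"));
console.log(urlFromOnclick('SiteAds.handleClick(this)')); // no URL to find
```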
Do you mean that you disable cosmetic filters for the page, then view and click the ads?
If so, I notice that adding the single dynamic filter below solves the problem:
nytimes.com serving-sys.com * allow
Note I've also updated the parseOnClick() code to handle this case, see 54ae3fc7
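For readers unfamiliar with the rule format above, a dynamic filtering rule has four whitespace-separated fields: source hostname, destination hostname, request type, and action. Here is a small sketch that splits such a line into named parts. The parser itself is hypothetical and only checks the shape of the line, not whether the hostnames or type are valid.

```javascript
// Sketch: split a four-field dynamic filtering rule into named parts.
// parseDynamicRule is a hypothetical helper, not AdNauseam or uBlock code.
function parseDynamicRule(line) {
  const fields = line.trim().split(/\s+/);
  if (fields.length !== 4) return null;
  const [source, destination, type, action] = fields;
  // Dynamic filtering actions are 'allow', 'block' and 'noop'.
  if (!['allow', 'block', 'noop'].includes(action)) return null;
  return { source, destination, type, action };
}

console.log(parseDynamicRule('nytimes.com serving-sys.com * allow'));
```

Read this way, the rule says: on pages from nytimes.com, allow requests of any type to serving-sys.com.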
Question: Do we need a text-ad parser for Chinese search engines? I can work on that.
yeah, that would be good, at least for the 2 or 3 most popular...
- [x] yahoo ads were hidden but not collected because of an internal link. Fixed in https://github.com/dhowe/AdNauseam/pull/464
- [ ] http://www.accuweather.com/ there are massive ads all over, which get hidden, but they neither show up in the parser.js console nor get collected. Are those the so-called dynamically created iframes?
They show up in the parser console for me (and match exactly what I see in the logger):
Did you enable the debugging flag?
And when I disable blocking via a firewall rule (below), I see many ads being collected, which tells me that one of the blocking rules is breaking the ad collection (probably because elements we need are never making it to the page).
Pages analysed in https://github.com/dhowe/AdNauseam/issues/427 in which ads were hidden but not collected. Working on this next.
- [x] http://www.nytimes.com/
- [x] http://www.nbcnews.com/
- [x] https://www.yahoo.com/news
- [ ] http://www.dailymail.co.uk/ushome/
- [ ] https://www.washingtonpost.com
- [ ] https://www.theguardian.com/us
- [ ] http://www.wsj.com
- [ ] http://abcnews.go.com
- [ ] http://www.bbc.com/news
- [ ] http://www.cbsnews.com
- [ ] http://www.reuters.com
- [ ] http://www.msnbc.com
- [ ] http://www.cbc.ca/news
- [ ] http://www.news.com.au
- [ ] http://www.cnn.com
^ Going through the websites above. Is there a proven method to say definitively which ads are not collected? What I often experience is that an ad is not collected five times in a row and then, on the sixth refresh, it suddenly is. As a result, I write filters and then notice during testing that they are redundant. I am aware that events fire irregularly and that things might depend on the cache or on whether I toggle cosmetic filtering (which might trigger dynamic iframes to be generated), etc. I wondered whether you have had similar experiences or a rule of thumb for me.
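Not a proven method, but one way to make the flakiness visible is to record hidden and collected counts over several reloads of the same page and only treat a site as failing when nothing is collected on any load. The sketch below assumes the counts are noted by hand or read from the logger; `summarizeTrials` and the sample numbers are hypothetical.

```javascript
// Sketch: summarise repeated page loads to separate flaky collection
// from consistent failure. summarizeTrials is a hypothetical helper.
function summarizeTrials(trials) {
  const loads = trials.length;
  const loadsWithCollection = trials.filter((t) => t.collected > 0).length;
  const hidden = trials.reduce((sum, t) => sum + t.hidden, 0);
  const collected = trials.reduce((sum, t) => sum + t.collected, 0);
  return {
    loads,
    loadsWithCollection,
    hidden,
    collected,
    // True only when at least one load was recorded and none collected an ad.
    neverCollected: loads > 0 && loadsWithCollection === 0,
  };
}

// Hypothetical counts from six reloads of one page, for illustration only.
const reloads = [
  { hidden: 4, collected: 0 },
  { hidden: 4, collected: 0 },
  { hidden: 4, collected: 0 },
  { hidden: 4, collected: 0 },
  { hidden: 4, collected: 0 },
  { hidden: 4, collected: 2 },
];
console.log(summarizeTrials(reloads));
```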