browsertrix
[Bug]: No ads in replay on some sites even though the ads are shown in the Brave profile or online
Browsertrix Version
v1.9.4-08ee857
What did you expect to happen? What happened instead?
After the last upgrade to 1.9.4, the ads are no longer shown in replay for tv2.dk even though they are visible in the browser profile:
Here is the replay of tv2.dk:
and here is the browser profile: tv2.dk with cookies accepted:
The same happens for berlingske.dk, but here it is not possible to see the ads in the browser profile either, even though I have disabled all Shields in the Brave settings.
Here is a snip of ads from berlingske.dk online:
and here is the browser profile:
and here are the Brave settings for Shields:
I can see all ads in Brave with Shields disabled here:
but strangely enough, it still works for politiken.dk here in replay:
Reproduction instructions
see above
Screenshots / Video
No response
Environment
No response
Additional details
No response
I checked this morning again and only replay of berlingske.dk can't show the ads. tv2.dk and politiken.dk are replaying some of the ads. Any hints to what could be wrong with the setup of berlingske.dk concerning ads?
Hi @tuehlarsen , in 1.9.4 we changed the default crawler version to the latest 1.0.0 beta, that may be responsible for the change. Could you try that crawl again with the "Previous" crawler channel (which is set to 0.12.4) to see if that works? You can find the crawler channel selector in Edit Workflow under Browser Settings:
Here's the relevant section in the docs: https://docs.browsertrix.cloud/user-guide/workflow-setup/#crawler-release-channel
I tried the previous crawler version with berlingske.dk: it completely ignores the browser profile and the cookie acceptance. With the default crawler it crashes again and again with interrupt: 139.
The crash in this case was due to sitemap parsing. We will have a fix for this shortly (webrecorder/browsertrix-crawler#496); in the meantime, disable 'Use Sitemap' for this crawl and try again.
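For context on the "interrupt: 139" message: assuming that number is a process exit status following the common shell convention (128 + signal number), 139 corresponds to signal 11, which is SIGSEGV on Linux. The sketch below only decodes that convention; `describe_exit_status` is a hypothetical helper, not part of Browsertrix or the crawler.

```python
import signal


def describe_exit_status(status: int) -> str:
    """Interpret a shell-style exit status (assumption: 128 + N means 'killed by signal N')."""
    if status > 128:
        signum = status - 128
        try:
            # Look up the platform's name for this signal number.
            name = signal.Signals(signum).name
        except ValueError:
            name = "unknown signal"
        return f"terminated by signal {signum} ({name})"
    return f"exited with code {status}"


if __name__ == "__main__":
    print(describe_exit_status(139))
```

A segmentation fault would be consistent with a crash inside the crawler process rather than a clean, handled error.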
Now it runs, but berlingske.dk has no ads or ad traces in replay. I saw the ads during the crawl and no cookie-consent popup, so it should be using the browser profile. It is almost the same with ekstrabladet.dk in replay: there are a few ads in the middle column of the front page and only empty black columns to the left and right. The crawler saw all the ads in the left, right, and middle columns, but almost no ads are shown in replay.
here online snips:
Here some snips from the crawl:
And a snip from replay:
I can see all ads in a Brave browser from a Danish IP without Shields activated, so perhaps a Browsertrix replay issue?
The different news sites use different ad providers/frameworks, e.g. displaying iframes with HTML etc. information.dk does not use Google ads but https://www.adnami.io and shows no ads in replay, only empty spaces, while tv2.dk uses a mix of Google ads and https://betterbannerscloud.com. berlingske.dk also uses a mix of Google ads and https://www.adnami.io/ but uses the Google framework in a way that replay cannot handle. https://jyllands-posten.dk/ uses a mix of https://www.adnami.io/ and Google ads. The best examples of ad replay are front-page crawls of politiken.dk and tv2.dk, even though some ads are missing and we are also crawling from non-Danish IPs. Supporting these ad frameworks seems to be hard work, but I think it is important that replay supports the most dominant ones, because ads are a large part of a news site's "look and feel": they interact with and overrun the news content so massively.
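One way to narrow down whether this is a capture problem or a replay problem would be to check whether requests to the providers named above were captured at all. A minimal sketch, assuming the captured URLs have been exported as a plain list: the Google hostnames are my assumption about which domains to match, the sample URLs are made up, and `tally_ad_hosts` is a hypothetical helper, not a Browsertrix feature.

```python
from collections import Counter
from urllib.parse import urlsplit

# Domain suffix -> label. adnami.io and betterbannerscloud.com come from this
# thread; the two Google ad domains are an assumption.
AD_PROVIDERS = {
    "adnami.io": "adnami",
    "betterbannerscloud.com": "betterbannerscloud",
    "doubleclick.net": "google",
    "googlesyndication.com": "google",
}


def provider_for(url):
    """Return the provider label for a URL, or None if its host matches no known ad domain."""
    host = (urlsplit(url).hostname or "").lower()
    for domain, label in AD_PROVIDERS.items():
        if host == domain or host.endswith("." + domain):
            return label
    return None


def tally_ad_hosts(urls):
    """Count captured URLs per ad provider."""
    counts = Counter()
    for url in urls:
        label = provider_for(url)
        if label is not None:
            counts[label] += 1
    return counts


if __name__ == "__main__":
    sample = [  # made-up example URLs
        "https://macro.adnami.io/example.js",
        "https://securepubads.g.doubleclick.net/example.js",
        "https://tv2.dk/",
    ]
    print(tally_ad_hosts(sample))
```

If a provider's count is zero, the ads were probably never captured; if the requests are present but replay still shows empty slots, that points more toward replay.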
Re berlingske.dk: when I use the ArchiveWeb.page desktop version from Oct. 2023 [ArchiveWeb.page-0.11.3.exe], I can replay traces of the ads and play the videos in the audio/video list: https://beta.browsertrix.cloud/orgs/kb/items/upload/upload-55d89b6d-7561-43e1-a392-76c9ecd89a4f#replay
Progress: in version 1.9.7, information.dk shows Danish ads or traces of them in offline replay in the ReplayWeb.page desktop app, instead of empty placeholders!
see