almanac.httparchive.org
Privacy 2024 queries
Queries
Bounce tracking:
- [x] number_of_websites_with_bounce_tracking.sql
CNAME:
- [x] most_common_cname_domains.sql
IAB consent frameworks:
- [x] most_common_countries_for_iab_tcf_v2.sql
- [x] most_common_referrer_policy.sql
- [x] most_common_strings_for_iab_usp.sql
- [x] number_of_websites_with_iab.sql
GPC prevalence:
- [x] number_of_websites_with_gpc.sql
CMPs presence:
- [x] most_common_cmps_for_iab_tcf_v2.sql
ads.txt & sellers.json:
- [x] ads_and_sellers_graph.sql
- [x] ads_lines_amount.sql
- [x] ads_seller_accounts_by_type.sql
- [x] common_ads_variables.sql
- [x] top_direct_sellers.sql
Privacy Sandbox:
- [x] number_of_websites_with_related_origin_trials.sql
- [x] privacy-sandbox-adoption-by-third-parties-by-publishers.sql
- [x] number_of_privacy_sandbox_attested_domains.sql
- [x] number_of_ara_destinations_registered_by_third_parties_and_publishers.sql
- [x] top_ara_destinations_registered_by_most_publishers.sql
- [x] top_ara_destinations_registered_by_most_third_parties.sql
CCPA:
- [x] ccpa_most_common_phrases.sql
- [x] ccpa_prevalence.sql
Fingerprinting:
- [x] fingerprinting_most_common_apis.sql
- [x] fingerprinting_most_common_scripts.sql
- [x] fingerprinting_script_count.sql
Cookies:
- [x] cookies_top_first_party.sql
- [x] cookies_top_third_party.sql
Other:
- [x] number_of_websites_with_dnt.sql
- [x] most_common_client_hints.sql
- [x] number_of_websites_per_tracking_technology.sql
- [x] number_of_websites_with_client_hints.sql
- [x] number_of_websites_with_privacy_service.sql
- [x] number_of_websites_with_referrerpolicy.sql
- [x] number_of_websites_with_related_origin_trials.sql
- [x] number_of_websites_with_whotracksme_trackers.sql
- [x] easylist_tracker_detection.sql
Functions
- [x] httparchive.fn.DECODE_ORIGIN_TRIAL
- [x] httparchive.fn.PARSE_ORIGIN_TRIAL
Scripts
- [x] ads_parser.py - Parses and evaluates Google's ads.txt, which weighs >=100 MB
- [x] populate_easylist_adserver.py
- [x] whotracksme_trackers.py updated
@max-ostapenko
- Yeah, this probably should’ve had a limit on it. For visualization, though, we can just take the top 5-10 from each grouping to display.
- Do you have any ideas for figuring out which URL that happens on? Not sure how that could happen unless the custom_metrics object is malformed, but I could just add a try/catch to ignore those cases.
> Do you have any ideas for figuring out which URL that happens on? Not sure how that could happen unless the custom_metrics object is malformed, but I could just add a try/catch to ignore those cases.
Return an error description from the catch block in the UDF. You'll then be able to see the scale of the issue in the query results.
You can also debug individual URLs by filtering on the error description strings:
SELECT client, fingerprinting_type, page
FROM pages
WHERE fingerprinting_type LIKE '%Error%'
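For illustration, the catch-and-return idea could look roughly like the sketch below. This is only a minimal sketch of a JavaScript UDF body: the function name `getFingerprintingType` and the `fingerprinting` key are hypothetical and are not taken from the actual query. The one real requirement is that the returned string contains "Error", so the `LIKE '%Error%'` filter above can find it.

```javascript
// Hypothetical UDF body. The function name and the `fingerprinting` key are
// assumptions for illustration, not the names used in the real query.
function getFingerprintingType(customMetricsJson) {
  try {
    const metrics = JSON.parse(customMetricsJson);
    const value = metrics.fingerprinting;
    if (value === undefined || value === null) {
      return null; // no data recorded for this page
    }
    return String(value);
  } catch (e) {
    // Return the error text instead of throwing, so a malformed row does not
    // fail the whole query and remains visible in the results.
    return 'Error: ' + e.message;
  }
}
```

Grouping the output by the returned string would then show how many pages hit each kind of error, and the same string can be used to pull out the affected page URLs.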
@hadiamjad could you please add the query you used to create the Disconnect reports? Did you update easylist-tracker-detection.sql for this?
@max-ostapenko RE your review request, I don't have time to review all the queries; is there something specific you want me to look at?
@bstandaert-wustl sure, please take a look at bounce tracking, CNAME and something from Privacy Sandbox.