Adding Firefox use counter data to HTTPArchive
Hey folks,
In https://github.com/HTTPArchive/legacy.httparchive.org/issues/59 Chrome's use counter data was added to httparchive.
The relevant code in this repo seems to be https://github.com/HTTPArchive/data-pipeline/blob/41fe511951797d25cebc71097726c8b65497212b/modules/import_all.py#L146
I'd like to have Firefox use counter data also be available in httparchive. (Maybe more things also, but starting with use counters.) To that end, I've filed https://bugzilla.mozilla.org/show_bug.cgi?id=1813593 so that the data can be extracted locally.
Are there considerations we should know about for this to work?
cc @emilio @janodvarko
We only run tests in Chrome, so I don't think this would be feasible. @pmeenan WDYT?
Why not? We'd need to run tests in desktop Firefox as well, in addition to the current desktop Chrome and Android Chrome. I don't think it's necessarily useful to collect and store everything for Firefox that is currently stored for Chrome, since that would increase storage by 50%. But storing only use counter data seems negligible.
It’s not just storage. It’s also crawl capacity.
As it stands right now, it takes ~25,000 VMs the better part of a week to collect the data. Technically it is pretty easy to support, but financially it would increase the running costs by ~30% (assuming we'd only run one config). I'm guessing some form of additional sponsorship would be needed to cover the costs.
OK, thanks. What would that amount to in USD?
30% of our current crawl expenses would come out to about $20k per month.
That is likely more than the value Mozilla would get from the data. 🙂
For web compat analysis, the sample_data URLs (10k pages) would still be better than nothing. Assuming a full run would be 12,500,000 URLs (httparchive.pages.2023_01_01_desktop has 12,647,566 rows), 10k pages would be 0.08% of that $20k per month, which is about $16.
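To make that estimate easy to check, here is the arithmetic as a small Python sketch. It assumes the cost scales linearly with the number of URLs crawled, which is a simplification; the variable names are just for illustration, and the figures are the ones quoted in this thread.

```python
# Figures quoted in this thread; linear cost scaling is an assumption.
FULL_CRAWL_URLS = 12_500_000      # rounded size of a full desktop run
SAMPLE_URLS = 10_000              # size of the sample_data URL set
EXTRA_MONTHLY_COST_USD = 20_000   # estimated cost of a full Firefox crawl

fraction = SAMPLE_URLS / FULL_CRAWL_URLS
sample_cost_usd = fraction * EXTRA_MONTHLY_COST_USD

print(f"{fraction:.2%} of a full crawl, about ${sample_cost_usd:.0f} per month")
```

Fixed overheads (VM startup, storage, pipeline work) are not included, so the real number for a 10k run could be higher.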
Would it be feasible to start there?
Update: https://bugzilla.mozilla.org/show_bug.cgi?id=1813593 is now fixed (thanks @emilio!). It's possible to set these prefs to log use counter data to stderr:
- dom.use_counters.dump.document
- dom.use_counters.dump.worker
- dom.use_counters.dump.page
For the purpose of this issue, using page and worker but not document makes the most sense. document counters are recorded for each document, including e.g. SVGs, and they accumulate into page, which is per top-level page. worker use counters don't accumulate into page, so they need to be included separately.
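As a rough sketch of how a test agent could turn on the two suggested prefs, the Python below generates `user.js` lines for a Firefox profile. It assumes the prefs are booleans; `PREFS` and `user_js_lines` are hypothetical names for illustration, not part of any existing HTTP Archive or Firefox tooling.

```python
import json

# Assumption: these prefs are booleans. page and worker are enabled,
# document is left off, per the suggestion above.
PREFS = {
    "dom.use_counters.dump.page": True,
    "dom.use_counters.dump.worker": True,
    "dom.use_counters.dump.document": False,
}

def user_js_lines(prefs):
    """Return one user_pref(...) line per pref, in user.js syntax."""
    # json.dumps renders Python booleans as the JS literals true/false.
    return [f'user_pref("{name}", {json.dumps(value)});'
            for name, value in prefs.items()]

print("\n".join(user_js_lines(PREFS)))
```

The resulting lines could be written to `user.js` in the profile directory used for the crawl.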
The logged output looks like this:
USE_COUNTER_PAGE: USE_COUNTER2_DOCUMENTOPEN_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
USE_COUNTER_PAGE: USE_COUNTER2_CSS_PROPERTY_Display_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
USE_COUNTER_PAGE: USE_COUNTER2_CSS_PROPERTY_FontStyle_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
USE_COUNTER_PAGE: USE_COUNTER2_CSS_PROPERTY_FontWeight_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
You need to close the page for some of the use counters to be added to the log.
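For anyone wanting to consume that output, here is a minimal Python sketch that groups the page counters by URL. It only handles the `USE_COUNTER_PAGE:` line shape shown above; the format of worker lines isn't shown in this thread, so they are not parsed. `PAGE_LINE` and `parse_page_counters` are hypothetical names.

```python
import re
from collections import defaultdict

# Matches only the line shape shown in this thread:
#   USE_COUNTER_PAGE: <counter name> - <page URL>
PAGE_LINE = re.compile(r"^USE_COUNTER_PAGE: (?P<counter>\S+) - (?P<url>\S+)$")

def parse_page_counters(stderr_text):
    """Map each page URL to the set of use counters logged for it."""
    counters = defaultdict(set)
    for line in stderr_text.splitlines():
        match = PAGE_LINE.match(line.strip())
        if match:  # unrelated stderr lines are skipped
            counters[match["url"]].add(match["counter"])
    return counters

sample = """\
USE_COUNTER_PAGE: USE_COUNTER2_DOCUMENTOPEN_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
USE_COUNTER_PAGE: USE_COUNTER2_CSS_PROPERTY_Display_PAGE - http://software.hixie.ch/utilities/js/live-dom-viewer/
some unrelated stderr line
"""
result = parse_page_counters(sample)
for url, names in result.items():
    print(url, sorted(names))
```

Since some counters are only logged when the page is closed, the stderr capture would need to continue until after the tab or browser shuts down.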
No development in the last 2 years. Closing as outdated.