bigquery
BigQuery import and processing pipelines
To finish https://github.com/HTTPArchive/httparchive.org/pull/110 we should rename [ttci.sql](https://github.com/HTTPArchive/bigquery/blob/0691ea5ef318f0589172f3dcb8ed612be3f3fe97/sql/histograms/ttci.sql) and [ttfi.sql](https://github.com/HTTPArchive/bigquery/blob/0691ea5ef318f0589172f3dcb8ed612be3f3fe97/sql/histograms/ttfi.sql) or, for backward-compatibility reasons, create new tti.sql and firstCPUIdle.sql files.
Since we have this column can we populate it with the new [CrUX ranking](https://developers.google.com/web/updates/2021/03/crux-rank-magnitude)? It's confusing not to have it in here, makes joins more difficult, and means you need...
We have an [HTTP/2 requests graph](https://httparchive.org/reports/state-of-the-web#h2) which does a lookup on the `$_protocol` field in the `requests.payload` column. This query currently scans **211TB** at an estimated cost of **$1,058** (yes -...
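To make the relationship between bytes scanned and cost concrete, here is a rough sketch of the arithmetic. The rate of $5 per TB is an assumption (BigQuery's historical on-demand price, not something stated in the issue), and `estimate_on_demand_cost` is a hypothetical helper, not part of this repository.

```python
# Assumed on-demand price in USD per TB scanned; not taken from the issue.
ASSUMED_USD_PER_TB = 5.0


def estimate_on_demand_cost(terabytes_scanned: float,
                            usd_per_tb: float = ASSUMED_USD_PER_TB) -> float:
    """Return the approximate on-demand cost of a query in USD."""
    if terabytes_scanned < 0:
        raise ValueError("terabytes_scanned must be non-negative")
    return round(terabytes_scanned * usd_per_tb, 2)


if __name__ == "__main__":
    # 211 TB is the scan size quoted in the issue.
    print(estimate_on_demand_cost(211))
```

Under that assumed rate, 211 TB works out to roughly $1,055, which is in the same range as the estimate quoted above.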
The latest lighthouse.2018_10_15 table is 237 GB. Querying all lighthouse tables currently costs 4.15 TB and runs in several minutes.  1. identify parts of the JSON payload that are...
The new experimental UI of BigQuery doesn't seem to allow adding external sources, unless they're part of either the organisation or the catalogue of [public datasets](https://cloud.google.com/bigquery/public-data). While, for now, it's...
Per https://github.com/GoogleChrome/lighthouse/pull/10716#issuecomment-648520092 we should prune the `full-page-screenshot` and `final-screenshot` audits.
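A minimal sketch of what that pruning could look like, assuming each report is a JSON string with a top-level `audits` object keyed by audit ID. The function name `prune_screenshot_audits` is hypothetical and this is not the pipeline's actual implementation.

```python
import json

# Audit IDs named in the issue as candidates for pruning.
PRUNED_AUDITS = ("full-page-screenshot", "final-screenshot")


def prune_screenshot_audits(report_json: str) -> str:
    """Return the report JSON with the screenshot audits removed.

    Assumes the report has a top-level "audits" object; reports without
    one are returned re-serialized but otherwise unchanged.
    """
    report = json.loads(report_json)
    audits = report.get("audits")
    if isinstance(audits, dict):
        for audit_id in PRUNED_AUDITS:
            # pop with a default so missing audits are not an error
            audits.pop(audit_id, None)
    return json.dumps(report, separators=(",", ":"))
```

For example, a report whose `audits` contain `final-screenshot` and `is-on-https` would come back with only `is-on-https`.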
Forked from #76. Currently we use scheduled queries to scan each dataset/client combo for the latest release and save that to its respective `latest._` table. For example, here's the scheduled...
Suggested in the [HTTP Archive Slack channel](http://bit.ly/http-archive-slack): > Was wondering if it makes sense to add a "sample" dataset that contains data for the first ~1000 pages. This way you...
The runs.request tables include a `firstHtml` field to indicate that the request is for the parent document. Queries on the har.request tables must join on the corresponding runs table to...