bigquery

BigQuery import and processing pipelines

Results 16 bigquery issues

To finish https://github.com/HTTPArchive/httparchive.org/pull/110 we should rename [ttci.sql](https://github.com/HTTPArchive/bigquery/blob/0691ea5ef318f0589172f3dcb8ed612be3f3fe97/sql/histograms/ttci.sql) and [ttfi.sql](https://github.com/HTTPArchive/bigquery/blob/0691ea5ef318f0589172f3dcb8ed612be3f3fe97/sql/histograms/ttfi.sql) or, for backward compatibility, create new tti.sql and firstCPUIdle.sql files.

Since we have this column can we populate it with the new [CrUX ranking](https://developers.google.com/web/updates/2021/03/crux-rank-magnitude)? It's confusing not to have it in here, makes joins more difficult, and means you need...

We have an [HTTP/2 requests graph](https://httparchive.org/reports/state-of-the-web#h2) which does a lookup on the `$_protocol` field in the `requests.payload` column. This currently scans **211 TB** at an estimated cost of **$1,058** (yes -...

The latest lighthouse.2018_10_15 table is 237 GB. Querying all lighthouse tables currently costs 4.15 TB and runs in several minutes. ![image](https://user-images.githubusercontent.com/1120896/48222217-0bdaf800-e362-11e8-930a-7be2b0aac1ae.png) 1. identify parts of the JSON payload that are...

enhancement
Good first bug

The new experimental UI of BigQuery doesn't seem to allow adding external sources, unless they're part of either the organisation or the catalogue of [public datasets](https://cloud.google.com/bigquery/public-data). While, for now, it's...

Per https://github.com/GoogleChrome/lighthouse/pull/10716#issuecomment-648520092 we should prune the `full-page-screenshot` and `final-screenshot` audits.

Forked from #76. Currently we use scheduled queries to scan each dataset/client combo for the latest release and save that to its respective `latest._` table. For example, here's the scheduled...

enhancement

Suggested in the [HTTP Archive Slack channel](http://bit.ly/http-archive-slack):

> Was wondering if it makes sense to add a "sample" dataset that contains data for the first ~1000 pages. This way you...

enhancement

The runs.request tables include a `firstHtml` field to indicate that the request is for the parent document. Queries on the har.request tables must join on the corresponding runs table to...

Good first bug

Lots of pages reporting 0 bytes since April:

bug