bigquery issues

Create tti.sql and firstCPUIdle.sql or rename ttci.sql and ttfi.sql

1

To finish https://github.com/HTTPArchive/httparchive.org/pull/110, we should either rename [ttci.sql](https://github.com/HTTPArchive/bigquery/blob/0691ea5ef318f0589172f3dcb8ed612be3f3fe97/sql/histograms/ttci.sql) and [ttfi.sql](https://github.com/HTTPArchive/bigquery/blob/0691ea5ef318f0589172f3dcb8ed612be3f3fe97/sql/histograms/ttfi.sql) or, for backward compatibility, create new tti.sql and firstCPUIdle.sql files.

denar90

blink_features.usage has null rank column

3

Since we have this column, can we populate it with the new [CrUX ranking](https://developers.google.com/web/updates/2021/03/crux-rank-magnitude)? It's confusing not to have it in here, makes joins more difficult, and means you need...

tunetheweb

Making the HTTP2 query cheaper

4

We have an [HTTP/2 requests graph](https://httparchive.org/reports/state-of-the-web#h2) which looks up the `$_protocol` field in the `requests.payload` column. This currently processes **211TB** and costs an estimated **$1,058** (yes -...

tunetheweb

Reduce size of Lighthouse payload

2

The latest lighthouse.2018_10_15 table is 237 GB. Querying all lighthouse tables currently processes 4.15 TB and takes several minutes to run.

![image](https://user-images.githubusercontent.com/1120896/48222217-0bdaf800-e362-11e8-930a-7be2b0aac1ae.png)

1. identify parts of the JSON payload that are...

rviscomi

enhancement

Good first bug

Add HTTP Archive to the public datasets in BigQuery

The new experimental UI of BigQuery doesn't seem to allow adding external sources, unless they're part of either the organisation or the catalogue of [public datasets](https://cloud.google.com/bigquery/public-data). While, for now, it's...

RReverser

Prune Lighthouse audits from LHR

Per https://github.com/GoogleChrome/lighthouse/pull/10716#issuecomment-648520092 we should prune the `full-page-screenshot` and `final-screenshot` audits.

rviscomi

Update the "latest" tables from Dataflow

Forked from #76. Currently, we use scheduled queries to scan each dataset/client combo for the latest release and save that to its respective `latest._` table. For example, here's the scheduled...

rviscomi

enhancement

Create and maintain a 10k-row subset table

10

Suggested in the [HTTP Archive Slack channel](http://bit.ly/http-archive-slack):

> Was wondering if it makes sense to add a "sample" dataset that contains data for the first ~1000 pages. This way you...

rviscomi

enhancement

Add field comparable to firstHtml to the har.request tables

2

The runs.request tables include a `firstHtml` field to indicate that the request is for the parent document. Queries on the har.request tables must join on the corresponding runs table to...

rviscomi

Good first bug

Total page size status: 0 bytes

2

Lots of pages have been reporting 0 bytes since April:

ebidel

bug
