httparchive.org
Missing font_details
In the March 2021 requests table, about 7% of fonts are missing the _font_details property.
```sql
SELECT
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NULL) AS null_font_details,
  COUNT(0) AS all_fonts,
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NULL) / COUNT(0) AS pct_null_font_details
FROM
  `httparchive.requests.2021_03_01_desktop`
WHERE
  JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
```
| null_font_details | all_fonts | pct_null_font_details |
|---|---|---|
| 1,978,495 | 27,740,971 | 7.13% |
The expected behavior is for all fonts to have a _font_details property.
cc @rsheeter
For comparison, in August 2020 13% of font requests were missing _font_details, so this isn't necessarily a new issue and it may even be getting better.
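To make the month-over-month comparison reproducible, here is a minimal sketch that runs the same count over both crawls in one query. It assumes the August 2020 figure came from a table named `httparchive.requests.2020_08_01_desktop` with the same schema as the March 2021 one; the `crawl` label is only there for illustration.

```sql
-- Sketch: share of font requests missing _font_details, per crawl.
-- Assumption: the 2020_08_01_desktop table exists and has a `payload` column
-- shaped like the March 2021 table.
WITH fonts AS (
  SELECT '2020_08_01' AS crawl, payload
  FROM `httparchive.requests.2020_08_01_desktop`
  WHERE JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
  UNION ALL
  SELECT '2021_03_01' AS crawl, payload
  FROM `httparchive.requests.2021_03_01_desktop`
  WHERE JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
)
SELECT
  crawl,
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NULL) AS null_font_details,
  COUNT(0) AS all_fonts,
  COUNTIF(JSON_EXTRACT(payload, '$._font_details') IS NULL) / COUNT(0) AS pct_null_font_details
FROM
  fonts
GROUP BY
  crawl
ORDER BY
  crawl
```

Both tables are large, so this scans a lot of data; it may be worth trying it against a sample first.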
The case that seems particularly odd is where the status is 200 but there are no font details. I tried a few URLs (for Google Fonts, fonts.gstatic.com) that had status 200 and no font details; they downloaded fine and loaded into fonttools without any problem.
```sql
select
  status,
  countif(has_font_details) has_font,
  countif(not has_font_details) no_font,
  countif(has_font_details) / count(0) pct_has_font
from (
  select
    JSON_EXTRACT_SCALAR(payload, '$.response.status') status,
    JSON_EXTRACT(payload, '$._font_details') is not null has_font_details
  from
    `httparchive.requests.2021_03_01_desktop`
  where
    JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
) t
group by status
;
```
This query reports that 93.5% of font requests with a 200 response have font details.
If I add `and net.host(url) = 'fonts.gstatic.com'` to the where clause to filter to Google Fonts, then 98.6% of responses with a 200 status have font details.
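For reference, the per-status query with that extra predicate would look roughly like this. It is a sketch of the filter described above, not necessarily the exact query that produced the 98.6% figure; the `order by` is added only for readability.

```sql
select
  status,
  countif(has_font_details) as has_font,
  countif(not has_font_details) as no_font,
  countif(has_font_details) / count(0) as pct_has_font
from (
  select
    JSON_EXTRACT_SCALAR(payload, '$.response.status') as status,
    JSON_EXTRACT(payload, '$._font_details') is not null as has_font_details
  from
    `httparchive.requests.2021_03_01_desktop`
  where
    JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
    -- Extra predicate: keep only requests served from the Google Fonts host.
    and net.host(url) = 'fonts.gstatic.com'
) t
group by status
order by status
```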
cc @drott
Yeah, to set expectations, it will never be 100%. The font details can only be pulled from requests that WPT managed to get the font bodies from (directly out of Chrome) and for URLs that it knows about without having to look at the netlog. For that second grouping, that would include any fonts that are pushed with HTTP/2 push but never used by the page (I assume infrequent but could be surprised).
If you have a few page URLs where it was expected to have the font details but where they weren't available it would also help.
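One way to collect such examples is to list font requests that returned 200 but have no font details, together with the page that made them. This is a minimal sketch that assumes the requests table exposes a `page` column holding the page URL alongside `url`; the limit of 20 is arbitrary.

```sql
-- Sketch: sample pages where a font request returned 200 but has no _font_details.
select
  page,  -- assumption: URL of the page that issued the request
  url    -- URL of the font request itself
from
  `httparchive.requests.2021_03_01_desktop`
where
  JSON_EXTRACT_SCALAR(payload, '$._request_type') = 'Font'
  -- JSON_EXTRACT_SCALAR returns a string, so compare against '200'.
  and JSON_EXTRACT_SCALAR(payload, '$.response.status') = '200'
  and JSON_EXTRACT(payload, '$._font_details') is null
limit 20
```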
For clarification, I don't think the issue is with processing the font files themselves with fonttools but rather that we don't have the raw font file at the time of the analysis.
If it's normal to occasionally read the metadata successfully, observe a 200 status in it, and yet not be able to get the font file, that could explain what we're seeing. Out of curiosity, why do we end up in that situation? Also, could we mark those records so that we can readily filter them out?