docusaurus
feat: JSON-LD structured data implementation for blog
Pre-flight checklist
- [x] I have read the Contributing Guidelines on pull requests.
- [x] If this is a code change: I have written unit tests and/or added dogfooding pages to fully verify the new behavior.
- [x] If this is a new API or substantial change: the PR has an accompanying issue (closes #9274) and the maintainers have approved on my working plan.
Motivation
I originally contributed Structured Data support for blog posts back in 2021: https://github.com/facebook/docusaurus/pull/5322
@lex111 subsequently submitted a PR to migrate the approach to use microdata instead: https://github.com/facebook/docusaurus/pull/5355
I had reservations, which I voiced at the time, but left it at that. Since then I've had something of a baptism of fire in the world of SEO, and I've consequently been working with some excellent folk in the SEO industry to improve my own ranking. One suggestion that comes up repeatedly is to use JSON-LD instead of microdata, as that is what Google prefers: https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data#supported-formats
In general, Google recommends using JSON-LD for structured data if your site's setup allows it, as it's the easiest solution for website owners to implement and maintain at scale (in other words, less prone to user errors).
I raised #9274 to discuss this and received some good feedback.
I've now implemented JSON-LD support for the blog; both individual posts and the blog listing page. With this change in place, it's now possible to separately configure the Structured Data through swizzling the two new components:
- BlogListPage/StructuredData
- BlogPostPage/StructuredData
From @Josh-Cena:
Swizzability does seem desirable. I also wonder if there are cases in the wild where people swizzle blog component and inadvertently broke microdata. This sounds reasonable to me.
The default behaviour for these components is to produce JSON-LD structured data that aligns with Schema.org and Google's Rich Results guidelines.
Let's talk for a moment about each of these components.
BlogListPage/StructuredData
This component is responsible for generating the Structured Data for the blog list page. It renders JSON-LD structured data that aligns with the https://schema.org/Blog schema. (Please note the examples at the bottom of the page which this implementation aligns with.)
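As a rough illustration of what JSON-LD for a blog listing can look like, here is a minimal sketch. It is not the code from this PR: the `BlogListInput` shape, the `buildBlogJsonLd` helper and the example.com URLs are all hypothetical, and only the `Blog`, `BlogPosting` and `blogPost` names come from schema.org.

```typescript
// Hypothetical input shape for illustration; not the PR's actual types.
interface BlogListInput {
  blogUrl: string;
  title: string;
  description: string;
  posts: { url: string; title: string; datePublished: string }[];
}

// Builds a schema.org Blog object that lists its posts under `blogPost`.
function buildBlogJsonLd(input: BlogListInput) {
  return {
    '@context': 'https://schema.org',
    '@type': 'Blog',
    '@id': input.blogUrl,
    mainEntityOfPage: input.blogUrl,
    headline: input.title,
    description: input.description,
    blogPost: input.posts.map((post) => ({
      '@type': 'BlogPosting',
      '@id': post.url,
      url: post.url,
      headline: post.title,
      datePublished: post.datePublished,
    })),
  };
}

// Placeholder data, not taken from the Docusaurus site.
const blogJsonLd = buildBlogJsonLd({
  blogUrl: 'https://example.com/blog',
  title: 'Example Blog',
  description: 'An example blog listing',
  posts: [
    {
      url: 'https://example.com/blog/first-post',
      title: 'First post',
      datePublished: '2024-01-01T00:00:00.000Z',
    },
  ],
});

console.log(JSON.stringify(blogJsonLd, null, 2));
```

The resulting string is what would end up inside a `<script type="application/ld+json">` element.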
BlogPostPage/StructuredData
This component is responsible for generating the Structured Data for the blog post page. It renders JSON-LD structured data that aligns with the https://schema.org/BlogPosting schema. (Please note the examples at the bottom of the page which this implementation aligns with.)
The BlogPosting schema is one of the structured data types that Google explicitly supports for Rich Results: https://developers.google.com/search/docs/appearance/structured-data/article#structured-data-type-definitions
All the Google-supported properties are included in the Structured Data generated by this component apart from dateModified which is optional. A number of other properties documented in the BlogPosting schema are included as well.
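To make the property list concrete, the sketch below builds a `BlogPosting` object with the properties Google documents for articles (`headline`, `image`, `datePublished`, `author`) and leaves `dateModified` out unless it is supplied. The `PostInput` shape, the `buildBlogPostingJsonLd` helper and the sample values are assumptions for illustration, not the component's real implementation.

```typescript
// Hypothetical input shape for illustration; not the PR's actual types.
interface PostInput {
  url: string;
  title: string;
  description: string;
  datePublished: string;
  dateModified?: string;
  image?: string;
  authors: { name: string; url?: string }[];
}

// Builds a schema.org BlogPosting; optional fields are only emitted when present.
function buildBlogPostingJsonLd(post: PostInput) {
  return {
    '@context': 'https://schema.org',
    '@type': 'BlogPosting',
    '@id': post.url,
    mainEntityOfPage: post.url,
    url: post.url,
    headline: post.title,
    description: post.description,
    datePublished: post.datePublished,
    ...(post.dateModified ? { dateModified: post.dateModified } : {}),
    ...(post.image ? { image: post.image } : {}),
    author: post.authors.map((author) => ({
      '@type': 'Person',
      name: author.name,
      ...(author.url ? { url: author.url } : {}),
    })),
  };
}

// Placeholder data, not taken from a real post.
const postJsonLd = buildBlogPostingJsonLd({
  url: 'https://example.com/blog/first-post',
  title: 'First post',
  description: 'An example post',
  datePublished: '2024-01-01T00:00:00.000Z',
  image: 'https://example.com/img/first-post.png',
  authors: [{ name: 'Jane Doe', url: 'https://example.com/jane' }],
});

console.log(JSON.stringify(postJsonLd, null, 2));
```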
Test Plan
I will use the pull request preview on this PR to demonstrate that the Structured Data is generated as expected. I will also use the Structured Data Testing Tools to verify that the Structured Data is valid:
- https://search.google.com/test/rich-results - this tool is used to test for Rich Results; only applicable to the blog post page
- https://validator.schema.org/ - this tool is used to validate the Structured Data; applicable to both the blog list page and the blog post page
~~Expect screenshots to be added to this PR.~~
Test links
Deploy preview: https://deploy-preview-9669--docusaurus-2.netlify.app/
BlogListPage/StructuredData
If we go to the test preview of the /blog page: https://deploy-preview-9669--docusaurus-2.netlify.app/blog
We can validate with schema.org that the Blog structured data is valid: https://validator.schema.org/#url=https%3A%2F%2Fdeploy-preview-9669--docusaurus-2.netlify.app%2Fblog
BlogPostPage/StructuredData
If we go to the test preview of the /blog/releases/2.4/ page: https://deploy-preview-9669--docusaurus-2.netlify.app/blog/releases/2.4/
We can validate with schema.org that the BlogPosting structured data is valid: https://validator.schema.org/#url=https%3A%2F%2Fdeploy-preview-9669--docusaurus-2.netlify.app%2Fblog%2Freleases%2F2.4%2F
And we can also test this type with the Rich Results tool: https://search.google.com/test/rich-results
You can also see this in the Ahrefs Chrome extension: https://chromewebstore.google.com/detail/ahrefs-seo-toolbar-on-pag/hgmoccdbjhknikckedaaebbpdeebhiei?pli=1
Related issues/PRs
#9274
[V2]
Built without sensitive environment variables
| Name | Link |
|---|---|
| Latest commit | e0da5cf0b93050ce8a2f95921478d1f51bf3d75a |
| Latest deploy log | https://app.netlify.com/sites/docusaurus-2/deploys/658a901701f7a80008a486f9 |
| Deploy Preview | https://deploy-preview-9669--docusaurus-2.netlify.app |
[V2]
| Name | Link |
|---|---|
| Latest commit | 96073e8d1c9b55ea1a7e8ab6195905bde3627ace |
| Latest deploy log | https://app.netlify.com/sites/docusaurus-2/deploys/65ce26f897d19b0008565473 |
| Deploy Preview | https://deploy-preview-9669--docusaurus-2.netlify.app |
⚡️ Lighthouse report for the deploy preview of this PR
| URL | Performance | Accessibility | Best Practices | SEO | PWA | Report |
|---|---|---|---|---|---|---|
| / | 🟠 66 | 🟢 98 | 🟢 96 | 🟢 100 | 🟠 88 | Report |
| /docs/installation | 🟢 90 | 🟢 96 | 🟢 100 | 🟢 100 | 🟠 88 | Report |
| /docs/category/getting-started | 🟠 77 | 🟢 100 | 🟢 100 | 🟢 90 | 🟠 88 | Report |
| /blog | 🟠 71 | 🟢 100 | 🟢 100 | 🟢 90 | 🟠 88 | Report |
| /blog/preparing-your-site-for-docusaurus-v3 | 🟠 66 | 🟢 96 | 🟢 100 | 🟢 100 | 🟠 88 | Report |
| /blog/tags/release | 🟠 70 | 🟢 100 | 🟢 100 | 🟠 80 | 🟠 88 | Report |
| /blog/tags | 🟠 77 | 🟢 100 | 🟢 100 | 🟢 90 | 🟠 88 | Report |
Hi @Josh-Cena and @slorber!
I was wondering if there were any thoughts about this PR? There have been no comments on it, so I'm not sure if you're aware it is here. I've been checking back every week or so for a while but there appears to be no activity.
It's possible you're not interested in the PR - if so, would you be able to let me know and I'll close it for tidiness' sake?
Yeah this PR was a Christmas project for me - I think it's a really good piece of work actually! (Of course I'm biased.)
I think it puts the structured data story of Docusaurus in a really great place: it offers a solid default JSON-LD implementation, plus the freedom for users to straightforwardly control the structured data produced through the magic of swizzling. (In fact, if they wanted to, they could use the same mechanism to stop producing structured data altogether.)
If you've tested it yourself and it works, I'm personally happy to try it out and improve it where necessary
I have indeed and I'm happy to take feedback to improve it as necessary.
Thanks for the review @slorber - useful points, will address them soon!
If we merge this, should this be considered as a breaking change? 🤷‍♂️
No - I can't think of any reason why it would be
Some additional changes requested and a few questions
Cool - I've addressed these. See my responses above!
Okay - all yours @slorber!
@johnnyreilly here's a Docusaurus playground: https://stackblitz.com/edit/github-6jxhz6?file=src%2Fpages%2Findex.tsx
import useDocusaurusContext from '@docusaurus/useDocusaurusContext';
import Head from '@docusaurus/Head';

export default function Home(): JSX.Element {
  return (
    <>
      <h1>Home</h1>
      <Head>
        <script type="application/ld+json" data-seb-id="seb-id-1">
          {JSON.stringify({
            '@context': 'https://schema.org/',
            '@type': 'Organization',
            name: 'Meta Open Source',
            url: 'https://opensource.fb.com/',
            logo: 'https://opensource.fb.com/img/logos/Meta-Open-Source.svg',
          })}
        </script>
      </Head>
      <Head>
        <script
          data-seb-id="seb-id-2"
          type="application/ld+json"
          dangerouslySetInnerHTML={{
            __html: JSON.stringify({
              '@context': 'https://schema.org/',
              '@type': 'Organization',
              name: 'Meta Open Source',
              url: 'https://opensource.fb.com/',
              logo: 'https://opensource.fb.com/img/logos/Meta-Open-Source.svg',
            }),
          }}
        />
      </Head>
      <script type="application/ld+json" data-seb-id="seb-id-3">
        {JSON.stringify({
          '@context': 'https://schema.org/',
          '@type': 'Organization',
          name: 'Meta Open Source',
          url: 'https://opensource.fb.com/',
          logo: 'https://opensource.fb.com/img/logos/Meta-Open-Source.svg',
        })}
      </script>
      <script
        data-seb-id="seb-id-4"
        type="application/ld+json"
        dangerouslySetInnerHTML={{
          __html: JSON.stringify({
            '@context': 'https://schema.org/',
            '@type': 'Organization',
            name: 'Meta Open Source',
            url: 'https://opensource.fb.com/',
            logo: 'https://opensource.fb.com/img/logos/Meta-Open-Source.svg',
          }),
        }}
      />
    </>
  );
}
The static html output is:
<!doctype html>
<html lang="en" dir="ltr" data-has-hydrated="false">
  <head>
    <script data-rh="true" type="application/ld+json" data-seb-id="seb-id-1">
      {
        "@context": "https://schema.org/",
        "@type": "Organization",
        "name": "Meta Open Source",
        "url": "https://opensource.fb.com/",
        "logo": "https://opensource.fb.com/img/logos/Meta-Open-Source.svg"
      }
    </script>
  </head>
  <body class="navigation-with-keyboard">
    <div id="__docusaurus">
      <h1>Home</h1>
      <script type="application/ld+json" data-seb-id="seb-id-3">
        {&quot;@context&quot;:&quot;https://schema.org/&quot;,&quot;@type&quot;:&quot;Organization&quot;,&quot;name&quot;:&quot;Meta Open Source&quot;,&quot;url&quot;:&quot;https://opensource.fb.com/&quot;,&quot;logo&quot;:&quot;https://opensource.fb.com/img/logos/Meta-Open-Source.svg&quot;}
      </script>
      <script data-seb-id="seb-id-4" type="application/ld+json">
        {
          "@context": "https://schema.org/",
          "@type": "Organization",
          "name": "Meta Open Source",
          "url": "https://opensource.fb.com/",
          "logo": "https://opensource.fb.com/img/logos/Meta-Open-Source.svg"
        }
      </script>
    </div>
  </body>
</html>
You will notice that:
- case 1 is perfectly fine despite not using dangerouslySetInnerHTML
- case 2 doesn't even render with Helmet
- case 3 is what we want to avoid
- case 4 is ok but not in <head>
So I'd prefer to recommend users always render structured data inside <head> or <Head> (case 1) and remove the StructuredData helper.
What do you think?
remove the StructuredData helper.
For the SEO docs example I don't really mind. It is documentation and if it seems to work as is then I'd say feel free to revert my docs changes and leave as they are. Maybe it's fine. As you say, it looks okay. I'll confess to low level anxiety but it's probably okay. It's a thing that people may experiment with if they are interested in going low level. To be clear, this doesn't worry me so much because it would be opt-in.
If you're suggesting removing the StructuredData helper from this PR or changing the behaviour around dangerouslySetInnerHTML I would be very concerned! As I mentioned above, the Google Search Console was worryingly inconsistent with its parsing. Sometimes it would parse successfully when just inlining JSON.stringify and sometimes it would not. I was never able to reliably identify why. My speculation was that &quot; was the cause - I cannot say with certainty.
Using dangerouslySetInnerHTML reliably fixed these issues. I would be worried if this PR was merged not using the dangerouslySetInnerHTML approach. It may damage SEO for all Docusaurus users. It wouldn't be opt-in.
And as someone who has been on the rough end of SEO going wrong, I very much don't wish to inflict it on others!
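The concern about `&quot;` can be sketched as follows. The contents of a `<script>` element are raw text, so HTML entities inside it are not decoded; a consumer reading the element sees the literal `&quot;` sequences. The `escapeHtmlText` function below is a hypothetical stand-in for the escaping a renderer applies to text children (an assumption for illustration, not React's actual implementation), and the example says nothing about how Google's own parser behaves.

```typescript
// Hypothetical stand-in for renderer text escaping; not React's real code.
function escapeHtmlText(text: string): string {
  return text
    .replace(/&/g, '&amp;')
    .replace(/"/g, '&quot;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;');
}

// Returns true when the string is valid JSON, false otherwise.
function isValidJson(text: string): boolean {
  try {
    JSON.parse(text);
    return true;
  } catch {
    return false;
  }
}

const rawJson = JSON.stringify({
  '@context': 'https://schema.org/',
  '@type': 'Organization',
  name: 'Meta Open Source',
});

// What a strict JSON parser would receive if the escaped text were read as-is.
const escapedJson = escapeHtmlText(rawJson);

console.log(escapedJson);
console.log(isValidJson(rawJson));
console.log(isValidJson(escapedJson));
```

A strict JSON parser accepts the raw string and rejects the escaped one, which is consistent with the speculation above but does not prove it.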
@johnnyreilly
My speculation was that &quot; was the cause - I cannot say with certainty.
It's hard to justify adding an abstraction for a problem we don't understand.
Introducing this abstraction also brings shortcomings: as you can see, it can't be used inside Helmet to render the structured data in the head, which complicates usage inside Markdown files, and users may try this in JSX too.
Using dangerouslySetInnerHTML reliably fixed these issues.
If the output is exactly the same in terms of HTML, how could this go wrong?
The only difference is in one case the html will be in the <body> while in the other case it will be in <head> (which seems better to me)
We can test this on the Docusaurus website and see after a few weeks if our SEO decreases?
It's hard to justify adding an abstraction for a problem we don't understand.
The justification is the experience of discovering that Google Search Console doesn't behave as expected. And there's no source code to look at or anyone to advise alas - it's just a service that's a black box to the users.
I agree it's frustrating, but that doesn't change the outcome!
you can't use that in Helmet to render the structured data in head
I agree that is a problem.
We can test this on the Docusaurus website and see after a few weeks if our SEO decreases?
Maybe that's a good idea. Do you have access to Google Search Console for the Docusaurus site? That's where the issue showed up for me - and likely would if this turns out to be problematic. If the issue doesn't show up then maybe rendering inline JSON in the <head> is okay.
Incidentally I find it really interesting that Helmet disables the &quot; behaviour. Do you have any idea why?
I have access to the Google Search Console, but don't use it much so I'm not sure how to detect this issue in practice. Can you explain how you saw the problem?
Incidentally I find it really interesting that Helmet disables the &quot; behaviour. Do you have any idea why?
I'm not too surprised, Helmet children are collected at SSG time and rendered into the head through a different part of the code, not using React at all.
Recently it's even more important for head metadata to support SSR streaming. We don't really have that problem for Docusaurus/SSG (so we can keep using Helmet even though it doesn't support streaming), but afaik React 19 core will introduce core metadata primitives, and Next.js has a dedicated metadata api too.
In general I'm not sure using React to append JSON or JavaScript to the page is a good idea. We can see in many places that people don't use React to serialize a Redux store to the page for example (see Redux SSR docs)
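For reference, the pattern in the Redux SSR docs is to build the script content as a plain string and replace `<` with its `\u003c` escape so the payload cannot close the script element early. A minimal sketch of the same idea applied to JSON-LD follows; the helper names are hypothetical and this is not what the PR implements.

```typescript
// Serialises data for embedding in a <script> element. Replacing "<" with the
// JSON escape \u003c keeps sequences like "</script>" out of the output while
// still parsing back to the same value.
function serializeJsonForScript(data: unknown): string {
  return JSON.stringify(data).replace(/</g, '\\u003c');
}

// Hypothetical helper: builds the script element as a plain string, without React.
function renderJsonLdScript(data: unknown): string {
  return `<script type="application/ld+json">${serializeJsonForScript(data)}</script>`;
}

// Deliberately hostile placeholder value to show the escaping at work.
const payload = {
  '@context': 'https://schema.org/',
  '@type': 'Organization',
  name: 'Example </script><b>Org</b>',
};

const serialized = serializeJsonForScript(payload);
const scriptTag = renderJsonLdScript(payload);

console.log(scriptTag);
```

Because `\u003c` is a valid JSON string escape, parsing `serialized` yields the original `name` value unchanged.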
Considering we are not 100% sure of the impact of this change, I would follow that plan:
- remove StructuredData helper
- always render structured data in <head> and recommend this practice in docs examples
- merge and plan to backport in 3.x, but not for 3.2
- see the impact on SEO after 1 month
- if no SEO impact, backport it for 3.3+
- if SEO impacted negatively, find another solution that restores SEO
It's interesting to see Google Search Console displayed in French! (I would have assumed they had internationalised it, but I've never seen it in action. I have a past life working on internationalisation in web apps so I always find it quite interesting to see.)
I have access to the google search console, but don't use it much so I'm not sure how to detect this issue in practice. Can you explain how you saw the problem?
If memory serves, an "Unparsable structured data" menu item can appear in the "Enhancements" section when there's an issue:
Alternatively you may get an "invalid" report in one of the "Enhancements" screens:
With an entry underneath you can click through to see details of the issue. (Regrettably, now that mine are fixed I can't see them anymore to show you.)
Considering we are not 100% sure of the impact of this change, I would follow that plan:
- remove StructuredData helper
- always render structured data in <head> and recommend this practice in docs examples
- ...
From all the reading I've done (and based on personal experience), it seems that using JSON-LD is supported equally in head and body:
- https://developers.google.com/search/docs/appearance/structured-data/intro-structured-data#supported-formats
- https://www.w3.org/TR/json-ld11/#example-145-combining-multiple-json-ld-script-elements-into-a-single-dataset
That said, if you'd like to steer Docusaurus usage towards preferring usage of head then I see no disadvantage to doing so.
- ...
- merge and plan to backport in 3.x, but not for 3.2
- see the impact on SEO after 1 month
- if no SEO impact, backport it for 3.3+
- if SEO impacted negatively, find another solution that restores SEO
All in all, it sounds like a good plan - go for it!
Thanks, let's implement that plan then.
Do you want to make the changes to the current PR, or shall I do it?
(I applied my little refactor already)
I'll monitor that console more closely
Note we already have one problem reported here: Champ "position" manquant (dans "itemListElement")
(missing field "position" in "itemListElement")
That doesn't seem to be related to this PR so we can attempt to fix this in another place. Surprising we have this issue and you don't, maybe it's because you fixed it?
That's breadcrumbs in English I think? If memory serves, there's no breadcrumbs in Docusaurus for blogs. In fact I've written about how to swizzle and add them here: https://johnnyreilly.com/docusaurus-blogs-adding-breadcrumb-structured-data
I think the issue being surfaced there is likely from Docusaurus docs which have breadcrumb support: https://github.com/facebook/docusaurus/pull/6697
My site wouldn't be affected by this issue as I don't use the docs portion of Docusaurus in my site - the breadcrumb structured data you see in my reports is from my swizzling.
Yes "fil d'ariane" is "breadcrumb", I'll try to fix it separately.
Do you want to do the cleanup we agreed on on this PR before merging, or I'll do it?
Do you want to do the cleanup we agreed on on this PR before merging, or I'll do it?
Can you do it please? My machine is in the middle of a rebuild and I'll be out of action for a while (incidentally, automating my machine repaves clearly remains on my "todo" list)
Yes "fil d'ariane" is "breadcrumb", I'll try to fix it separately.
I was looking around last night for an issue related to breadcrumbs on Docusaurus which may be related; I'm pretty sure there is one but for some reason it eludes me.
LGTM thanks
Noted to check SEO Google Search Console in 2 weeks + 1 month
We'll eventually add back the "to backport" label on the PR if we see no negative impact.
Hey @slorber,
It's been two weeks - just wanted to check in and see how the "amélioration" section of Docusaurus Google Search Console is looking? Does it look okay?
So far it doesn't seem to affect SEO much.
But I'll keep monitoring this for a few more weeks to be sure. Impressions (purple) have slightly decreased, but it could be seasonality, Google algorithm changes, or something else 🤷‍♂️
Surprisingly the number of clicks (blue) remains as high as before so maybe the search just became more relevant?
Do you observe similar behavior on your site?
We still have the same breadcrumbs suggestions being reported:
But I'll keep monitoring this for a few more weeks to be sure. Impressions (purple) have slightly decreased, but it could be seasonality, Google algorithm changes, or something else 🤷‍♂️
Surprisingly the number of clicks (blue) remains as high as before so maybe the search just became more relevant?
I suspect this is just slight variability - essentially SEO unaffected. If things change massively then it's a concern; slight variance then it's likely just fine. (SEO will always vary slightly over time and that's out of our control in the main and nothing to worry about)
We still have the same breadcrumbs suggestions being reported:
Have you done anything to remedy this? I didn't spot a PR but I might have missed it.
TL;DR - so far it sounds fine
Agree. I still want to work on a few things for v3.2 so maybe we'll include this PR in v3.2 in a few weeks.
have you done anything to remedy this? I didn't spot a PR but I might have missed.
Not a high priority for me to investigate atm, I'll get back to it later so if you know how to fix the problem go ahead.
We have this being reported for docs and blog posts too. Not sure why I can't get this UI in English easily.
Missing "position" field (in "itemListElement") Items with this problem are invalid. Invalid items cannot appear in the enhanced Google search results.
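For context on this report, each `ListItem` inside a `BreadcrumbList`'s `itemListElement` is expected to carry a `position`. The sketch below shows one way to generate it from the array index; the `Crumb` shape, the `buildBreadcrumbJsonLd` helper and the example.com URLs are hypothetical, and this is not the Docusaurus breadcrumb code.

```typescript
// Hypothetical input shape for illustration.
interface Crumb {
  name: string;
  url: string;
}

// Builds a schema.org BreadcrumbList; `position` is 1-based and follows array order.
function buildBreadcrumbJsonLd(crumbs: Crumb[]) {
  return {
    '@context': 'https://schema.org',
    '@type': 'BreadcrumbList',
    itemListElement: crumbs.map((crumb, index) => ({
      '@type': 'ListItem',
      position: index + 1,
      name: crumb.name,
      item: crumb.url,
    })),
  };
}

// Placeholder data, not taken from the Docusaurus site.
const breadcrumbJsonLd = buildBreadcrumbJsonLd([
  { name: 'Docs', url: 'https://example.com/docs' },
  { name: 'Installation', url: 'https://example.com/docs/installation' },
]);

console.log(JSON.stringify(breadcrumbJsonLd, null, 2));
```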
Not a high priority for me to investigate atm, I'll get back to it later so if you know how to fix the problem go ahead.
I'm pretty snowed under right now, but I might see if I can take a look in a couple of weeks when things quiet down (I hope)
Today's SEO results:
Still fewer impressions, but more clicks, so maybe it's just Google targeting the impressions better 🤷‍♂️
Anyway, this doesn't destroy our SEO so it looks relatively safe to release.
Looks good - ship it!
Great work on this @johnnyreilly. Is the change live? I don't see it in the changelog.
I think it went live with 3.2
https://github.com/facebook/docusaurus/releases/tag/v3.2.0