Should we ignore old spec versions with robots.txt?
Reason/Context
This is a Google Search Console summary for November:

So https://www.asyncapi.com/docs/specifications/v2.0.0 is the top-performing page, which is not ideal; it should be either https://www.asyncapi.com/docs/specifications/latest or https://www.asyncapi.com/docs/specifications/v2.2.0
And this is what I get, not even in Google Search but in Brave Search:

Description
One possible solution could be to add the following to robots.txt (we would actually need to add this file too):
Disallow: /docs/specifications/v2.0.0
Disallow: /docs/specifications/v2.1.0
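For the Disallow rules to apply, robots.txt also needs a User-agent line; a minimal sketch of the whole file (assuming it is served from the site root, not the actual file) could look like this:

```txt
# https://www.asyncapi.com/robots.txt — minimal sketch, not the actual file
User-agent: *
Disallow: /docs/specifications/v2.0.0
Disallow: /docs/specifications/v2.1.0
```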
The problem I see is that if someone then queries for asyncapi specification 2.0, they will not find this page. Now, is it really a problem? They can just access the latest version and then navigate to older ones, right?
For sure, the current situation is a real issue, as not all users will notice they ended up reading outdated content.
The alternative would be a huge banner on 2.0, 2.1, and other older versions saying: this is old, we suggest you use the latest.
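For illustration, such a banner could be a simple static element at the top of each old version page; the markup below is hypothetical:

```html
<!-- Hypothetical banner for outdated spec version pages -->
<div class="old-version-banner" role="alert">
  You are reading an old version of the AsyncAPI Specification.
  We suggest you read the
  <a href="https://www.asyncapi.com/docs/specifications/latest">latest version</a> instead.
</div>
```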
Thoughts?
I definitely agree with you that we need to get Google to crawl the latest version rather than the others. The disallow rules you suggested above are perfect for achieving this, ESPECIALLY because it looks like there is duplicate content across all of the versions (there are probably only subtle differences, but Google can still tell the content is duplicated if there are enough similarities).
In addition to this, we can use canonical URLs to help Google know which page is the main one (i.e., the most recent version). This way we can keep the other pages indexed, but it is our way of telling Google: "hey, these pages are duplicate content, please only look at the main one". (Here is the docs page from GSC for reference.)
For example, on the following pages: https://www.asyncapi.com/docs/specifications/v2.1.0 and https://www.asyncapi.com/docs/specifications/v2.0.0
We will add this tag in the <head>:
<link rel="canonical" href="https://www.asyncapi.com/docs/specifications/v2.2.0" />
And over time, Google will know to only crawl the canonical URL, which will be the latest version of the spec. I wouldn't use https://www.asyncapi.com/docs/specifications/latest as it seems to be a redirect; it would be best practice to use the main URL.
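As a sketch of how this could be wired up (assuming the site is a Next.js app; the component below is hypothetical, not the website's actual code):

```tsx
// Hypothetical page component for an old spec version.
import Head from 'next/head';

export default function SpecV200Page() {
  return (
    <>
      <Head>
        {/* Tell Google the latest version is the canonical page. */}
        <link
          rel="canonical"
          href="https://www.asyncapi.com/docs/specifications/v2.2.0"
        />
      </Head>
      {/* ...rendered spec content... */}
    </>
  );
}
```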
*Disclaimer: I am not an SEO expert; these are just things that I learned as a developer at an agency working with an SEO team.
> And over time, Google will know to only crawl the canonical URL, which will be the latest version of the spec. I wouldn't use https://www.asyncapi.com/docs/specifications/latest as it seems to be a redirect; it would be best practice to use the main URL.
That's right but wrong at the same time. Let me explain: theoretically, Google won't stop crawling those URLs but will decrease their crawl frequency. But we don't know exactly, because in fact Google could consider those pages equal content or rather different (because they actually are different) and skip those tags.
I think we should spend some time understanding what is currently happening on those paths via Google Search Console (unfortunately, only the owners of the domain can do that). That tool shows you information about which URL Google decided is canonical, etc.
@smoya yeah I guess Google's algorithm in general is a toss-up these days 😆 I wonder too if it would consider those pages duplicate content or if it knows to detect that there are minor differences? Maybe, due to this, we should analyze the GSC before we do anything; agree with you there!
but isn't robots.txt enough? I guess this is still a standard for all bots, where you can tell the bot exactly which pages to ignore. I used it in my previous project -> https://kyma-project.io/robots.txt. It must work, as they restructured their docs but forgot to update robots.txt, and now their docs are not indexed properly.
@derberg But you don't want to stop those pages (older spec versions) from being indexed.
well, this is my main question in the issue title 😄 I think we should ignore them, but I'm probably missing some downsides 🤔
Let's say I'm a developer of a project that uses AsyncAPI 2.1.0. If I need to check the documentation, I want to quickly get the one for 2.1.0. I will type AsyncAPI spec 2.1.0 documentation into Google, and I expect a result that drives me to it, not to another version.
the thing is that the content is almost the same, so when you google AsyncAPI spec 2.1.0 you get 2.0.0. If you mark 2.2 as canonical, then when you google AsyncAPI spec 2.1.0 you will get 2.2.0.
maybe we should experiment with meta tags?
for sure improve the description so it is unique per version; right now every version shares the same one:
<meta name="description" content="AsyncAPI Specification
Disclaimer
Part of this content has been taken from the great work done by the folks at the OpenAPI Initiative. Mainly because it&amp#39;s a great work and we want to keep as mu">
and then also add keywords?
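Putting both ideas together, a version-specific head section could look something like this (illustrative values only, not the actual site metadata):

```html
<!-- Hypothetical per-version metadata for /docs/specifications/v2.1.0 -->
<meta name="description"
      content="AsyncAPI Specification v2.1.0 reference documentation." />
<meta name="keywords" content="AsyncAPI, specification, 2.1.0" />
```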
> the thing is that the content is almost the same, so when you google AsyncAPI spec 2.1.0 you get 2.0.0. If you mark 2.2 as canonical, then when you google AsyncAPI spec 2.1.0 you will get 2.2.0.
This is exactly the concern I mentioned a few comments above: https://github.com/asyncapi/website/issues/506#issuecomment-999494551
> maybe we should experiment with meta tags?
I think we could try it, but I would say in combination with what @mcturco suggested.
> and then also add keywords?
You mean in the title? Around the copy? Not sure about the usage of keywords nowadays, TBH.
> This is exactly the concern I mentioned a few comments above: #506 (comment)
oh, sorry, now I got it
> You mean in the title? Around the copy? Not sure about the usage of keywords nowadays, TBH.
yeah, just add the version number to the keywords, that's it; we have no keywords now, so 🤷🏼
@smoya @mcturco ok, folks, we had a neat discussion here, thanks! Let's just try different solutions one by one, monitor with the next Search Console reports, and see 👀
- I suggest we first try with metadata, as we anyway need to fix description since they are not cool across spec versions
- then we try with the canonical link; this one I think is more tricky to add, I mean more complex, as it is not as simple as manually changing just HTML files
- if the above changes nothing, we go back to the topic of robots.txt and just block old versions from indexing (a sketch tying these steps together follows below)
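To make the order of experiments concrete, here is a minimal TypeScript sketch of a per-version metadata helper covering all three steps; every name in it is hypothetical, not the website's actual code:

```ts
// Hypothetical helper covering the three experiments in order:
// 1) unique description per version, 2) canonical link, 3) robots.txt blocking.
const LATEST = 'v2.2.0'; // assumed latest version at the time of this discussion

interface SpecHeadMeta {
  description: string;       // step 1: unique per-version description
  canonical: string;         // step 2: canonical pointing at the latest version
  blockInRobotsTxt: boolean; // step 3: fallback, exclude old versions from crawling
}

function specHeadMeta(version: string): SpecHeadMeta {
  return {
    description: `AsyncAPI Specification ${version} reference documentation.`,
    canonical: `https://www.asyncapi.com/docs/specifications/${LATEST}`,
    blockInRobotsTxt: version !== LATEST,
  };
}

// Example: specHeadMeta('v2.1.0') yields a unique description, the v2.2.0
// canonical URL, and blockInRobotsTxt === true.
```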
Thoughts?
@derberg sounds good to me!
This issue has been automatically marked as stale because it has not had recent activity :sleeping:
It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.
There can be many reasons why a specific issue has no activity. The most probable cause is lack of time, not lack of interest. The AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company; it is a community-driven initiative run under an open governance model.
Let us figure out together how to push this issue forward. Connect with us through one of the many communication channels we established here.
Thank you for your patience :heart:
still relevant
@smoya @derberg we currently have the 3.x version, so what should we do with this issue?
This got solved organically then 😆. But the reality is that it will probably hit us again in the future when v4 gets released.
Closing this for now. We will open a new issue if this hits again.