Should we ignore old spec versions with robots.txt?
Reason/Context
This is a Google Search Console summary for November:

So https://www.asyncapi.com/docs/specifications/v2.0.0 is the top-performing page, which is not ideal; it should be either https://www.asyncapi.com/docs/specifications/latest or https://www.asyncapi.com/docs/specifications/v2.2.0
And this is what I get, not even in Google Search but in Brave Search:

Description
One possible solution could be to add the following to robots.txt (we would actually need to add this file too):
Disallow: /docs/specifications/v2.0.0
Disallow: /docs/specifications/v2.1.0
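For the Disallow rules to apply, robots.txt also needs a User-agent line; a minimal sketch of the whole file (assuming it is served from the site root, not the actual file) could look like this:

```txt
# https://www.asyncapi.com/robots.txt — minimal sketch, not the actual file
User-agent: *
Disallow: /docs/specifications/v2.0.0
Disallow: /docs/specifications/v2.1.0
```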
The problem I see is that if someone then queries for asyncapi specification 2.0, they will not find this page. Now, is it really a problem? They can just access the latest version and then navigate to older ones, right?
For sure, the current situation is a real issue, as not all users will notice they ended up reading outdated content.
The alternative would be a huge banner on 2.0, 2.1, and other older versions saying: this is old, we suggest you use the latest.
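For illustration, such a banner could be a simple static element at the top of each old version page; the markup below is hypothetical:

```html
<!-- Hypothetical banner for outdated spec version pages -->
<div class="old-version-banner" role="alert">
  You are reading an old version of the AsyncAPI Specification.
  We suggest you read the
  <a href="https://www.asyncapi.com/docs/specifications/latest">latest version</a> instead.
</div>
```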
Thoughts?
I definitely agree with you that we need to get Google to crawl the latest version rather than the others. The disallow rules you suggested above are perfect for achieving this, ESPECIALLY because it looks like there is duplicate content across all of the versions (there are probably only subtle differences, but Google can still tell the content is duplicated if there are enough similarities).
In addition to this, we can use canonical URLs to help Google know which page is the main one (i.e., the most recent version). This way we can keep the other pages indexed, but it is our way of telling Google: "hey, these pages are duplicate content, please only look at the main one". (Here is the docs page from GSC for reference.)
For example, on the following pages: https://www.asyncapi.com/docs/specifications/v2.1.0 and https://www.asyncapi.com/docs/specifications/v2.0.0
We will add this tag in the <head>:
<link rel="canonical" href="https://www.asyncapi.com/docs/specifications/v2.2.0" />
And over time, Google will know to only crawl the canonical URL, which will be the latest version of the spec. I wouldn't use https://www.asyncapi.com/docs/specifications/latest as it seems to be a redirect; it would be best practice to use the main URL.
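As a sketch of how this could be wired up (assuming the site is a Next.js app; the component below is hypothetical, not the website's actual code):

```tsx
// Hypothetical page component for an old spec version.
import Head from 'next/head';

export default function SpecV200Page() {
  return (
    <>
      <Head>
        {/* Tell Google the latest version is the canonical page. */}
        <link
          rel="canonical"
          href="https://www.asyncapi.com/docs/specifications/v2.2.0"
        />
      </Head>
      {/* ...rendered spec content... */}
    </>
  );
}
```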
*Disclaimer: I am not an SEO expert; these are just things that I learned as a developer at an agency working with an SEO team.
> And over time, Google will know to only crawl the canonical URL, which will be the latest version of the spec. I wouldn't use https://www.asyncapi.com/docs/specifications/latest as it seems to be a redirect; it would be best practice to use the main URL.
That's right but wrong at the same time. Let me explain: theoretically, Google won't stop crawling those URLs but will decrease their crawl frequency. But we don't know exactly, because in fact Google could consider those pages equal content or rather different (because they actually are different) and skip those tags.
I think we should spend some time understanding what is currently happening on those paths via Google Search Console (unfortunately, only the owners of the domain can do that). That tool shows you information about which URL Google decided is canonical, etc.
@smoya yeah I guess Google's algorithm in general is a toss-up these days 😆 I wonder too if it would consider those pages duplicate content or if it knows to detect that there are minor differences? Maybe, due to this, we should analyze the GSC before we do anything; agree with you there!
but isn't robots.txt enough? I guess this is still a standard for all bots, where you can tell the bot exactly which pages to ignore. I used it in my previous project -> https://kyma-project.io/robots.txt. It must work, as they restructured their docs but forgot to update robots.txt, and now their docs are not indexed properly.
@derberg But you don't want to stop those pages (older spec versions) from being indexed.
well, this is my main question in the issue title 😄 I think we should ignore them, but I'm probably missing some downsides 🤔
Let's say I'm a developer of a project that uses AsyncAPI 2.1.0. If I need to check the documentation, I want to quickly get the one for 2.1.0. I will type AsyncAPI spec 2.1.0 documentation into Google, and I expect a result that drives me to it, not to another version.
the thing is that the content is almost the same, so when you google AsyncAPI spec 2.1.0 you get 2.0.0. If you mark 2.2 as canonical, then when you google AsyncAPI spec 2.1.0 you will get 2.2.0.
maybe we should experiment with meta tags?
for sure improve the description so it is unique per version; right now every version shares the same one:
<meta name="description" content="AsyncAPI Specification
Disclaimer
Part of this content has been taken from the great work done by the folks at the OpenAPI Initiative. Mainly because it&amp#39;s a great work and we want to keep as mu">
and then also add keywords?
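Putting both ideas together, a version-specific head section could look something like this (illustrative values only, not the actual site metadata):

```html
<!-- Hypothetical per-version metadata for /docs/specifications/v2.1.0 -->
<meta name="description"
      content="AsyncAPI Specification v2.1.0 reference documentation." />
<meta name="keywords" content="AsyncAPI, specification, 2.1.0" />
```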
> the thing is that the content is almost the same, so when you google AsyncAPI spec 2.1.0 you get 2.0.0. If you mark 2.2 as canonical, then when you google AsyncAPI spec 2.1.0 you will get 2.2.0.
This is exactly the concern I mentioned a few comments above: https://github.com/asyncapi/website/issues/506#issuecomment-999494551
> maybe we should experiment with meta tags?
I think we could try it, but I would say in combination with what @mcturco suggested.
> and then also add keywords?
You mean in the title? Around the copy? Not sure about the usage of keywords nowadays, TBH.
> This is exactly the concern I mentioned a few comments above: #506 (comment)
oh, sorry, now I got it
> You mean in the title? Around the copy? Not sure about the usage of keywords nowadays, TBH.
yeah, just add the version number to the keywords, that's it; we have no keywords now, so 🤷🏼
@smoya @mcturco ok, folks, we had a neat discussion here, thanks! Let's just try different solutions one by one, monitor with the next Search Console reports, and see 👀
- I suggest we first try with metadata, as we anyway need to fix description since they are not cool across spec versions
- then we try with the canonical link; this one I think is more tricky to add, I mean more complex, as it is not as simple as manually changing just HTML files
- if the above changes nothing, we go back to the topic of robots.txt and just block old versions from indexing (a sketch tying these steps together follows below)
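To make the order of experiments concrete, here is a minimal TypeScript sketch of a per-version metadata helper covering all three steps; every name in it is hypothetical, not the website's actual code:

```ts
// Hypothetical helper covering the three experiments in order:
// 1) unique description per version, 2) canonical link, 3) robots.txt blocking.
const LATEST = 'v2.2.0'; // assumed latest version at the time of this discussion

interface SpecHeadMeta {
  description: string;       // step 1: unique per-version description
  canonical: string;         // step 2: canonical pointing at the latest version
  blockInRobotsTxt: boolean; // step 3: fallback, exclude old versions from crawling
}

function specHeadMeta(version: string): SpecHeadMeta {
  return {
    description: `AsyncAPI Specification ${version} reference documentation.`,
    canonical: `https://www.asyncapi.com/docs/specifications/${LATEST}`,
    blockInRobotsTxt: version !== LATEST,
  };
}

// Example: specHeadMeta('v2.1.0') yields a unique description, the v2.2.0
// canonical URL, and blockInRobotsTxt === true.
```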
Thoughts?
@derberg sounds good to me!
This issue has been automatically marked as stale because it has not had recent activity :sleeping:
It will be closed in 120 days if no further activity occurs. To unstale this issue, add a comment with a detailed explanation.
There can be many reasons why a specific issue has no activity. The most probable cause is lack of time, not lack of interest. The AsyncAPI Initiative is a Linux Foundation project not owned by a single for-profit company; it is a community-driven initiative run under an open governance model.
Let us figure out together how to push this issue forward. Connect with us through one of the many communication channels we established here.
Thank you for your patience :heart:
still relevant
@smoya @derberg we currently have the 3.x version, so what should we do with this issue?
This got solved organically then 😆. But the reality is that it will probably hit us again in the future when v4 gets released.
Closing this for now. We will open a new issue if this hits again.