None of the archived blog content that has been successfully published is being indexed by search engines (Google, Bing, etc.)

Open denbkh opened this issue 1 year ago • 6 comments

Expected Behavior: All content from the URL https://learn.microsoft.com/en-us/archive/blogs/ should be indexed by search engines.

Steps to Reproduce:

1. Open the URL: https://learn.microsoft.com/en-us/archive/blogs/ntdebugging/troubleshooting-pool-leaks-part-1-perfmon
2. Use Google or Bing to search for any phrase of about ten words from this page, wrapped in double quotes. For instance, search for "Over the years, the NTDebugging Blog has published several articles about pool memory and pool leaks."
3. Observe that no results from *.microsoft.com domains appear in the search results.

Aug 25 '23 16:08 denbkh

Thank you for opening an issue! One of our team members will get back to you with additional information.

If this is a product issue, please close this issue and contact the product's support instead. For a list of support websites, see Support for Microsoft products and apps.

Aug 25 '23 16:08 welcome[bot]

That is by design - all archived pages have this metadata by default:

<meta name="ROBOTS" content="NOINDEX,NOFOLLOW" />
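One way to confirm this for a given page is to parse its HTML and look for the robots meta tag. The following is a minimal sketch using only the Python standard library. The helper names `robots_directives` and `is_indexable` are hypothetical and introduced here for illustration, and the check covers only the meta tag, not an `X-Robots-Tag` HTTP header or `robots.txt`.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr_map = {name: (value or "") for name, value in attrs}
        # The name value is matched case-insensitively ("ROBOTS" == "robots").
        if attr_map.get("name", "").lower() != "robots":
            return
        for part in attr_map.get("content", "").split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html):
    """Return the set of lowercase robots directives found in the HTML (hypothetical helper)."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


def is_indexable(html):
    """Return False if the page asks crawlers not to index it via a robots meta tag."""
    directives = robots_directives(html)
    # "none" is shorthand for "noindex, nofollow".
    return not ({"noindex", "none"} & directives)


if __name__ == "__main__":
    sample = '<html><head><meta name="ROBOTS" content="NOINDEX,NOFOLLOW" /></head></html>'
    print(sorted(robots_directives(sample)))
    print(is_indexable(sample))
```

To check a live page, the HTML could be fetched first (for example with `urllib.request.urlopen`) and the decoded body passed to `is_indexable`.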

Sep 08 '23 18:09 gewarren

That is by design

What is the purpose of this "design"?

These blogs contain a wealth of valuable information that can benefit a lot of users. However, by disabling web search, you are effectively locking away this valuable knowledge, making it inaccessible to users seeking specific information. Without a search function, users are forced to manually sift through the entire archive to find relevant content. This is highly inefficient and time-consuming. It discourages users from exploring your content, which ultimately undermines the purpose of a blog archive.

Publishing archived blogs without the ability to search is a poor design decision, negatively impacting user experience and content accessibility.

Sep 08 '23 18:09 denbkh

@denbkh I see your point. The problem is that some of the content dates back to 2006 and is likely no longer valid. For content that's still valid, we can enable indexing, but the problem is knowing which of the content is valid. If there are certain articles you know are valid, I can at least enable it on those for now.

Sep 08 '23 19:09 gewarren

@gewarren, thanks! I think instead of manually dividing content into valid and invalid, it might be more efficient to create a simple banner that indicates the content is in the archive and may not be valid. This way, users will be aware of the archive's nature and can assess the content's relevance for themselves.

Sep 08 '23 19:09 denbkh

There is already a banner that says this:

We're no longer updating this content regularly. Check the Microsoft Product Lifecycle for information about how this product, service, technology, or API is supported.

I guess it could be even more explicit that the content is archived, and that the information may be outdated.

@Khairunj What do you think?

Sep 08 '23 22:09 gewarren
