Discourage search engines from indexing old documentation builds

Open garrison opened this issue 2 years ago • 8 comments

Often, outdated versions of the Qiskit documentation rank higher in search engine results than the current documentation. This is especially noticeable with #1389, as many of these pages currently contain broken links.

What is the expected behavior?

We should try to prevent outdated documentation from appearing in search results. This can be improved by adding <meta name="robots" content="noindex"> to the <head> of each page in each old documentation build.
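A minimal sketch of what that could look like as a post-processing step on an already-built docs tree. The function names and the idea of walking a build directory are assumptions for illustration; this is not the project's actual deployment tooling.

```python
import re
from pathlib import Path

# The tag proposed in this issue.
NOINDEX_TAG = '<meta name="robots" content="noindex">'

# Match an opening <head> tag (with or without attributes), but not <header>.
HEAD_OPEN = re.compile(r"<head(\s[^>]*)?>", flags=re.IGNORECASE)


def add_noindex(html: str) -> str:
    """Return html with the noindex tag inserted right after <head>.

    Pages that already carry the tag, or that have no <head>, are returned unchanged.
    """
    if NOINDEX_TAG in html:
        return html
    match = HEAD_OPEN.search(html)
    if match is None:
        return html
    end = match.end()
    return html[:end] + "\n" + NOINDEX_TAG + html[end:]


def add_noindex_to_build(build_dir: Path) -> int:
    """Rewrite every .html file under build_dir; return how many files changed."""
    changed = 0
    for page in build_dir.rglob("*.html"):
        original = page.read_text(encoding="utf-8")
        updated = add_noindex(original)
        if updated != original:
            page.write_text(updated, encoding="utf-8")
            changed += 1
    return changed
```

Running `add_noindex_to_build` a second time on the same directory changes nothing, so it would be safe to call on every deploy of an old build.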

garrison avatar Dec 23 '21 16:12 garrison

As one example, the first hit that comes up when I search Google for "qiskit BasePauli" is to the 0.25 documentation (the Japanese version of it, in fact).

garrison avatar Dec 23 '21 16:12 garrison

Yes, you are correct @garrison; this occurs frequently and causes a great deal of confusion. When I googled "support vector machine qiskit" the other day, the tutorial for version 0.24.1 was the first result; instead, it should point to the latest tutorials for version 0.34.1.

bopardikarsoham avatar Jan 11 '22 06:01 bopardikarsoham

Related: there is an outdated build of the qiskit-optimization documentation that ranks high in search results at [1], without any indication in the URL that it might not be the latest version. The actual latest is at [2].

  1. https://qiskit.org/documentation/tutorials/optimization/index.html
  2. https://qiskit.org/documentation/optimization/tutorials/index.html

garrison avatar Jan 20 '22 22:01 garrison

The indexing of old documentation should hopefully sort itself out given a little bit of time now, just by nature of how the algorithm works; paths at qiskit.org/documentation/* will be updated far more frequently than qiskit.org/documentation/stable/*, which will cause them to be promoted to the top. Now that the deployment to stable should be working correctly, we won't get stuck with the same pages getting hit a lot from search results, because they'll keep getting shifted down by newer versions.

I don't think adding a noindex is the best idea here - that will seriously hinder people from being able to find older versions of the documentation at all, even when they're specifically looking for them, which is important for people using older versions of Qiskit (common in education). Perhaps a better solution would be to try putting in a banner like NumPy/SciPy/Matplotlib do, which says "this is an older version of the documentation, click here to go to the new documentation". That will be a bit trickier to set up, but should work in principle.

The two problems mentioned in the comments are slightly different to the original one:

  • "support vector machine qiskit" goes to old documentation because that tutorial seems to have been removed from new documentation in qiskit_machine_learning, so there isn't a newer link
  • the optimisation tutorials look to be a bug in how the docs are built and pushed to production; they should have redirects set in the Sphinx configuration, but those don't actually seem to be manifesting themselves on the website, and still point to old versions of the files. I'm not 100% sure why that's the case, though. The index.html redirect seems to be configured incorrectly (it shouldn't include the extension), but I'm not sure why the individual notebooks don't redirect, nor why our rclone sync isn't clearing out the old files.

jakelishman avatar Jan 21 '22 09:01 jakelishman

On further investigation, I think I know what the issue with the optimisation (and other) tutorials is: the deployment script effectively does rclone sync --exclude 'optimization/*' <src> <dst> to avoid clobbering the documentation generated by the qiskit-optimization repository actions themselves. Unfortunately, that means the redirects created in tutorials/optimization are excluded as well, so that directory isn't synced, and we're left with the old version still deployed to production.

Previously I was only looking at the stable/ deploy script, thinking that they had essentially equal rclone commands, but that's not the case.
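To illustrate why an unanchored exclude can catch more than intended, here is a simplified Python model of the matching rule as I understand it from rclone's filtering documentation: a pattern without a leading `/` is matched against the end of the path at a path-element boundary, a single `*` does not cross `/`, and `**` does. The helper names are hypothetical, the `/optimization/**` pattern is just one possible anchored alternative, and this is only a sketch; it is neither rclone's implementation nor the actual deploy script.

```python
import re


def rclone_like_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a small subset of rclone-style globs into a regex (simplified model)."""
    anchored = pattern.startswith("/")
    body = pattern.lstrip("/")
    parts = []
    i = 0
    while i < len(body):
        if body.startswith("**", i):
            parts.append(".*")      # '**' may cross directory separators
            i += 2
        elif body[i] == "*":
            parts.append("[^/]*")   # '*' stays within one path element
            i += 1
        else:
            parts.append(re.escape(body[i]))
            i += 1
    # Anchored patterns match from the root; others match at any element boundary.
    prefix = "^" if anchored else "(^|/)"
    return re.compile(prefix + "".join(parts) + "$")


def is_excluded(path: str, pattern: str) -> bool:
    """True if path would be filtered out by pattern under this simplified model."""
    return rclone_like_regex(pattern).search(path) is not None


if __name__ == "__main__":
    for pattern in ("optimization/*", "/optimization/**"):
        for path in ("optimization/index.html", "tutorials/optimization/index.html"):
            print(pattern, path, is_excluded(path, pattern))
```

Under this model the unanchored pattern also excludes `tutorials/optimization/index.html`, which matches the behaviour described above, whereas the root-anchored pattern only touches the top-level `optimization/` tree.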

jakelishman avatar Jan 21 '22 11:01 jakelishman

So it was an HTML issue, @jakelishman 😄

bopardikarsoham avatar Jan 21 '22 16:01 bopardikarsoham

This particular part of it wasn't an HTML issue as such - we already had all the right files generated, the problem was we were just excluding some of them from being synced from the build server to the production server, which was leaving old stuff in place. If you go to https://qiskit.org/documentation/tutorials/optimization now, you should get 301 redirected to https://qiskit.org/documentation/optimization/tutorials (you might need to reload with cache disabled). Search engines will pick up the permanent redirect in the next few days, so that part should be solved.
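For anyone who wants to confirm the redirect without a browser cache getting in the way, a small sketch along these lines should work. It sends a single HEAD request and reports the status and `Location` header without following the redirect; the function name is made up for this example.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def fetch_redirect(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request and return (status, Location header), without following redirects."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        conn.request("HEAD", parts.path or "/")
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


if __name__ == "__main__":
    # URL taken from the comment above; what the server returns today may differ.
    print(fetch_redirect("https://qiskit.org/documentation/tutorials/optimization"))
```

A permanent redirect shows up as status 301 with the new address in `Location`.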

The other issue, about the documentation in the stable/0.33 (etc etc) paths showing up in search results, is still there, though I mentioned above why I'm hopeful it should largely sort itself out fairly quickly now. I do think it would be good to put some sort of notification on the old pages, mentioning that they're not the latest version of the documentation - while I'm open to all suggestions, I am a little worried that noindexing all the old stuff will make it harder for people who need to find old versions, and I think we might be able to get the best of both worlds with a banner and a link.

jakelishman avatar Jan 21 '22 18:01 jakelishman

Before filing this issue, I did a quick survey of some other projects that I admire for having excellent documentation, and I noticed that some have the noindex <meta> tag while others do not. I think you make a reasonable case that it may not be desirable, so I support taking a "wait and see" approach to see if other changes fix the problem of old docs outranking new ones in the search engines.

I really like the idea of having a banner. Many other projects (e.g. Julia, Django, and projects hosted on readthedocs) also have a selector on the page where you can easily switch to any other version of the documentation. Of those, Django is the only one that links to the corresponding page of the documentation, rather than the front page of it, when switching to a different version. I find this especially convenient. An example of this in action is at https://docs.djangoproject.com/en/dev/intro/install/
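As a rough sketch of the Django-style behaviour combined with the banner idea, the snippet below maps a page under `/documentation/stable/<version>/` to the same page under `/documentation/` and builds a banner linking to it. The URL layout is taken from the paths mentioned earlier in this thread; the function names, the CSS class, and the banner wording are assumptions, and nothing here checks whether the target page actually exists in the latest docs (the removed SVM tutorial shows it might not).

```python
import re
from html import escape

# Assumed layout: old builds live under /documentation/stable/<version>/...
STABLE_PREFIX = re.compile(r"^/documentation/stable/[^/]+/")


def latest_path(path: str) -> str:
    """Map a versioned docs path to the corresponding unversioned (latest) path."""
    return STABLE_PREFIX.sub("/documentation/", path, count=1)


def banner_html(path: str) -> str:
    """Return banner markup for an old page, or an empty string for a latest page."""
    target = latest_path(path)
    if target == path:
        return ""
    return (
        '<div class="old-version-banner">'
        "This is an older version of the documentation. "
        f'<a href="{escape(target, quote=True)}">Go to the latest version of this page</a>.'
        "</div>"
    )


if __name__ == "__main__":
    print(banner_html("/documentation/stable/0.33/apidoc/circuit.html"))
```

Linking to the corresponding page rather than the front page keeps the reader in context, at the cost of needing a fallback for pages that have since been moved or removed.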

garrison avatar Jan 21 '22 20:01 garrison