aiida-core

Docs: how to stop search engines from indexing outdated versions of aiida

Open • khsrali opened this issue 1 year ago • 19 comments

Right now, if you google, for example, "aiida tab completion", the top results are from versions 0.12.1 and 1.0.1.

Note: this issue is not only about one specific search; more generally, it is about understanding how to tell Google to stop indexing historical versions.

I found this PR on GitHub where another project fixed this for their own docs: https://github.com/unitaryfund/mitiq/issues/1526. Apparently the fix is to add (or update) a robots.txt file, which, by the way, they copied from this repo :) https://github.com/astropy/astropy/blob/main/docs/robots.txt

P.S. unrelated to issue #6508
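To make the robots.txt idea concrete, here is a minimal Python sketch. The rules in `ROBOTS_TXT` are hypothetical (they are not copied from the astropy or mitiq files), as are the version slugs used in the examples: they keep `stable` and `latest` crawlable and block every other version path under `/projects/aiida-core/en/`. The helper `is_crawlable` is also hypothetical; it uses the standard-library `urllib.robotparser` to check which URLs a well-behaved crawler would skip.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep 'stable' and 'latest' crawlable and hide every
# other (historical) version of the aiida-core docs. The Allow lines come
# first because urllib.robotparser applies the first matching rule, while
# Google applies the most specific (longest) match; this order works for both.
ROBOTS_TXT = """\
User-agent: *
Allow: /projects/aiida-core/en/stable/
Allow: /projects/aiida-core/en/latest/
Disallow: /projects/aiida-core/en/
"""


def is_crawlable(robots_txt: str, url: str, agent: str = "*") -> bool:
    """Return True if `agent` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    base = "https://aiida.readthedocs.io/projects/aiida-core/en"
    for version in ("stable", "latest", "v1.0.1"):  # 'v1.0.1' slug is hypothetical
        url = f"{base}/{version}/howto/index.html"
        print(version, is_crawlable(ROBOTS_TXT, url))
```

Note that `urllib.robotparser` only does prefix matching and does not support `*` or `$` patterns inside paths, so this is a rough check rather than a faithful emulation of Googlebot.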

khsrali • Jul 04 '24 09:07

Thanks a lot @sphuber for merging #6517! The file is now served at https://aiida.readthedocs.io/projects/aiida-core/en/latest/robots.txt, which is still not discoverable by search engines: crawlers only look for robots.txt at the root of the domain, so it should be at https://aiida.readthedocs.io/robots.txt.

See here: readthedocs only overrides the root robots.txt if the file is defined in the RTD default version. Since we added it only in latest, it is not going to be overridden. Do you think it's feasible to change our RTD default version to latest?
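The reason the file under `/projects/aiida-core/en/latest/` is ignored is that crawlers resolve robots.txt against the scheme and host only (RFC 9309 requires it at the top-level path of the host). A small, hypothetical helper `robots_url_for` in Python makes this explicit:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt location a crawler consults for `page_url`."""
    parts = urlsplit(page_url)
    # Only scheme and host (plus port, if any) are kept; the page path,
    # including any /projects/<name>/<lang>/<version>/ prefix, is discarded,
    # as are the query string and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url_for(
        "https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/index.html"
    ))
```

In other words, every page on aiida.readthedocs.io, including the aiida-core subproject pages, is governed by the single file at https://aiida.readthedocs.io/robots.txt.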

khsrali • Jul 08 '24 07:07


No, unfortunately we can't. Some time ago we decided not to have latest as the default: if we update the docs to change or add features, the default documentation is "incorrect" until the release is made. So we switched to "stable". I was going to make a patch release soon anyway, so that the stable docs get updated.

sphuber • Jul 08 '24 07:07

Makes sense, a patch release is even better! I'll close this now; if the issue persists, we can reopen it.

khsrali • Jul 08 '24 09:07

Checked after release v2.6.2: https://aiida.readthedocs.io/robots.txt is still the old version. The new version is at https://aiida.readthedocs.io/projects/aiida-core/en/stable/robots.txt, which is not discoverable by search engines.

We need to understand how to move it to the right place.

khsrali • Aug 07 '24 10:08

As Google explains (https://support.google.com/webmasters/answer/7489871?hl=en#zippy=%2Cthis-is-my-site), this happens when robots.txt is not discoverable:

[Screenshot_20240919_154706]

khsrali • Sep 19 '24 13:09

Historical versions should not be indexed:

[Screenshot_20240919_154851]

khsrali • Sep 19 '24 13:09

These should not have been indexed if robots.txt were discoverable:

[Screenshot_20240919_155155]

khsrali • Sep 19 '24 13:09

[Screenshot_20240919_160124-1]

khsrali • Sep 19 '24 14:09

Investigated this a bit.

The current readthedocs setup is as follows:

  • we have an aiida readthedocs project that does not seem to have any contents itself and the build is failing (https://readthedocs.org/projects/aiida/).
  • aiida-core readthedocs is set up as a subproject of aiida, such that it's served under https://aiida.readthedocs.io/projects/aiida-core/en/stable/
  • however, modifying the robots.txt of aiida-core doesn't update the robots.txt of the aiida.readthedocs.io domain.
  • we could try modifying the robots.txt of the aiida project directly, and that might work.

But I think simply having aiida-core as the main project is simpler and makes more sense, especially since we don't have any other subprojects. We could also rename aiida-core -> aiida on readthedocs. Therefore, I suggest the following:

  • We remove/delete the current aiida readthedocs project.
  • We set aiida-core as the main project, and potentially change its "readthedocs name" to aiida. The documentation would then be served at https://aiida.readthedocs.io/en/stable/
  • The current robots.txt modification would then work directly.
  • If, in the future, we want other readthedocs "subprojects" to be included, we can just include them under this main project.

@khsrali would this work?

eimrek • Sep 23 '24 09:09

Note a possible (huge) drawback of what I described above: I think there are direct links to the aiida documentation on the internet in the form "https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/index.html". If we simplify the URL, these links will break.

pinging also @giovannipizzi

eimrek • Sep 23 '24 12:09

Thanks a lot @eimrek for writing this up; your suggestion makes sense to me. A bigger issue is that the aiida project on readthedocs builds from this repo: https://github.com/aiidateam/aiida-metapkg, which is archived :upside_down_face:, so we cannot update it.

Also, aiida-core is apparently meant to be a subproject of aiida on readthedocs. Honestly, I don't know what the consequences of changing that would be; you already mentioned some of them.

One option: set a redirect in the aiida readthedocs project from https://aiida.readthedocs.io/robots.txt to https://aiida.readthedocs.io/projects/aiida-core/en/stable/robots.txt

This would probably solve it.

khsrali • Sep 24 '24 08:09

@khsrali ok, if you're able to get it to work, that's fine.

eimrek • Sep 24 '24 08:09


Nah, the redirect actually didn't work :sob: I've set it up, but it still serves the old robots.txt, and I don't really understand why.

khsrali • Sep 24 '24 09:09

@eimrek, https://aiida.readthedocs.io/robots.txt is still the old one :sob:

khsrali • Oct 08 '24 09:10

@khsrali OK, I think I figured it out now. readthedocs builds two versions: 'latest' from master and 'stable' from the latest tag. The default version was set to 'stable', so https://aiida.readthedocs.io/robots.txt reflected the old, tagged version. For now, I've simply set the default version to 'latest', and robots.txt now seems to be correct. Feel free to close.

eimrek • Oct 08 '24 10:10

For me, it's still showing the old one :thinking:

[image]

khsrali • Oct 08 '24 10:10

Strange. Did you try clearing the cache, using an incognito window, or trying another browser?

eimrek • Oct 08 '24 12:10

Ah, right! It was indeed a caching issue. Cheers! robots.txt is now in the right place. Let's wait a few days to see whether Google actually updates its index.

khsrali • Oct 09 '24 07:10

Thanks a lot @eimrek !

khsrali • Oct 09 '24 07:10