Docs: how to stop search engines from indexing outdated versions of aiida
Right now, if you google, for example, "aiida tab completion", the top results are from versions 0.12.1 and 1.0.1.
Note: this issue isn't about one specific search; the general question is how to tell Google to stop indexing historical versions.
I found this issue on GitHub where they managed to fix it for their own repo:
https://github.com/unitaryfund/mitiq/issues/1526
Apparently a robots.txt needs to be updated/inserted, which, by the way, they also copied from this repo :)
https://github.com/astropy/astropy/blob/main/docs/robots.txt
P.S. unrelated to issue #6508
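To make the robots.txt idea concrete, here is a minimal sketch of how such rules could be checked offline with Python's standard `urllib.robotparser`. The rules below are illustrative only: the version slugs `v0.12.1` and `v1.0.1` and the exact paths are assumptions, not the contents of the astropy or aiida file. Note that `urllib.robotparser` does plain prefix matching, so the sketch avoids wildcard patterns.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: block two old versions, keep stable/latest crawlable.
# The version slugs are assumptions made for this example.
ROBOTS_TXT = """\
User-agent: *
Disallow: /projects/aiida-core/en/v0.12.1/
Disallow: /projects/aiida-core/en/v1.0.1/
Allow: /projects/aiida-core/en/stable/
Allow: /projects/aiida-core/en/latest/
"""


def build_parser(text):
    """Parse robots.txt content given as a string (no network access)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


def is_crawlable(parser, path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` on the docs domain."""
    return parser.can_fetch(agent, "https://aiida.readthedocs.io" + path)


if __name__ == "__main__":
    parser = build_parser(ROBOTS_TXT)
    for path in (
        "/projects/aiida-core/en/stable/index.html",
        "/projects/aiida-core/en/v0.12.1/index.html",
    ):
        print(path, is_crawlable(parser, path))
```

This only checks the rules themselves; whether a search engine sees them depends on where the file is served from.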
Thanks a lot @sphuber for merging #6517. The file is now served at https://aiida.readthedocs.io/projects/aiida-core/en/latest/robots.txt, which is still not discoverable by search engines. It should be at https://aiida.readthedocs.io/robots.txt.
See here, they override only if robots.txt is defined in the RTD default version. Since we added it in our latest, it is not going to override.
Do you think it's feasible to change our RTD default to latest?
No we can't unfortunately. We decided some time ago that we do not want to have latest as the default because if we update the docs to change or add new features, as long as the release is not made, the default documentation is "incorrect". So we decided to switch to "stable". I was simply going to make a patch release soon so that the stable docs get updated.
Makes sense, a patch release is even better! I'll close this now; if the issue persists we can reopen.
Checking after release v2.6.2:
https://aiida.readthedocs.io/robots.txt is still the old version.
New version is located here:
https://aiida.readthedocs.io/projects/aiida-core/en/stable/robots.txt
which is not discoverable by search engines.
We need to understand how to move it to the right place.
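The reason the file under `/projects/aiida-core/en/stable/` is not discoverable is that crawlers only look for robots.txt at the root of the host. A small sketch of that lookup, using only the standard library (the function name `robots_url_for` is made up for this example):

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url_for(page_url):
    """Return the robots.txt URL a crawler would consult for `page_url`.

    Crawlers keep the scheme and host and replace everything else with
    the fixed path /robots.txt, so files deeper in the tree are ignored.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url_for(
        "https://aiida.readthedocs.io/projects/aiida-core/en/stable/robots.txt"
    ))
```

For any page on the docs domain this yields https://aiida.readthedocs.io/robots.txt, which is the file that has to carry the new rules.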
https://support.google.com/webmasters/answer/7489871?hl=en#zippy=%2Cthis-is-my-site
As Google says, this happens when robots.txt is not discoverable.
Historical versions should not be indexed:
They should not have been indexed if robots.txt was discoverable:
Investigated this a bit.
Current readthedocs setup is the following:
- we have an `aiida` readthedocs project that does not seem to have any contents itself and the build is failing (https://readthedocs.org/projects/aiida/).
- `aiida-core` readthedocs is set up as a subproject of `aiida`, such that it's served under https://aiida.readthedocs.io/projects/aiida-core/en/stable/
- modifying the `robots.txt` setup of `aiida-core` doesn't update the `robots.txt` of the `aiida.readthedocs.io` domain, however.
- we could try to modify the `robots.txt` of the `aiida` project directly, and that might work.
But I think just having aiida-core as the main project is simpler and makes more sense. Considering also that we don't have any other subprojects. We can also rename aiida-core -> aiida for readthedocs. Therefore, i suggest the following:
- We remove/delete the current `aiida` readthedocs project.
- We set `aiida-core` as the main project, potentially changing its "readthedocs name" to `aiida`. Then the documentation is served at https://aiida.readthedocs.io/en/stable/
- The current `robots.txt` modification will work directly.
- If, in the future, we want other readthedocs "subprojects" to be included, we can just include them under this main project.
@khsrali would this work?
Note, a possible (huge) drawback of what i described above: I think there are direct links to aiida documentation on the internet in the form of "https://aiida.readthedocs.io/projects/aiida-core/en/latest/howto/index.html". if we simplify the url, these will get broken.
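To illustrate the drawback, here is a rough sketch of the path rewrite that a redirect would need to perform so that old links keep working after simplifying the URL. This is only a model of the mapping, not readthedocs configuration; `OLD_PREFIX` and `simplified_path` are names invented for the example.

```python
# Prefix under which aiida-core is currently served as a subproject.
OLD_PREFIX = "/projects/aiida-core"


def simplified_path(old_path):
    """Map a path under the subproject prefix to the simplified layout.

    Paths that do not start with the old prefix are returned unchanged.
    """
    if old_path == OLD_PREFIX or old_path.startswith(OLD_PREFIX + "/"):
        # Drop the prefix; fall back to "/" if nothing is left.
        return old_path[len(OLD_PREFIX):] or "/"
    return old_path


if __name__ == "__main__":
    print(simplified_path("/projects/aiida-core/en/latest/howto/index.html"))
```

Without a redirect implementing something like this mapping, every existing link of the old form would return a 404.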
pinging also @giovannipizzi
Thanks a lot @eimrek for writing this up. Your suggestion actually makes sense to me.
A bigger issue is that the aiida project on readthedocs is built from this repo: https://github.com/aiidateam/aiida-metapkg
which is archived :upside_down_face:, so we cannot update it.
Also, aiida-core is apparently meant to be a subproject of aiida on readthedocs. Honestly, I don't know what the consequences of changing that would be, beyond the ones you already mentioned.
One suggestion could be, we set a redirect in aiida readthedocs
from:
https://aiida.readthedocs.io/robots.txt
to:
https://aiida.readthedocs.io/projects/aiida-core/en/stable/robots.txt
Probably this would solve it.
@khsrali ok, if you're able to get it to work, that's fine.
Nah, actually the redirect didn't work :sob:
I've put it there, but it still goes to the old robots.txt; I don't really understand why.
@eimrek, https://aiida.readthedocs.io/robots.txt is still the old one :sob:
@khsrali Ok, I think I figured it out now. readthedocs builds two versions: 'latest' from master and 'stable' from the latest tag. The default version was set to 'stable', so https://aiida.readthedocs.io/robots.txt reflected the old, tagged version. I have now set the default version to 'latest', and the robots.txt seems to be correct. Feel free to close.
For me, still showing the old one: :thinking:
Strange. did you try to clear cache / use incognito / use other browser?
Ah, right! indeed it was a caching issue.
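For anyone who wants to double-check without a browser cache in the way, a small sketch along these lines compares the root robots.txt with the one built for 'latest'. The helper names are made up for this example, and the `Cache-Control: no-cache` request header is only a hint that intermediate caches may or may not honor.

```python
from urllib.request import Request, urlopen

ROOT_URL = "https://aiida.readthedocs.io/robots.txt"
VERSION_URL = "https://aiida.readthedocs.io/projects/aiida-core/en/latest/robots.txt"


def fetch_text(url):
    """Download `url` as text, asking caches to revalidate (best effort)."""
    request = Request(url, headers={"Cache-Control": "no-cache"})
    with urlopen(request, timeout=10) as response:
        return response.read().decode("utf-8")


def robots_in_sync(fetch=fetch_text, root_url=ROOT_URL, version_url=VERSION_URL):
    """Return True if the root robots.txt matches the versioned one.

    `fetch` is injectable so the comparison can be exercised offline.
    """
    return fetch(root_url).strip() == fetch(version_url).strip()


if __name__ == "__main__":
    print("root robots.txt up to date:", robots_in_sync())
```

Running it as a script hits the network; passing a custom `fetch` keeps the comparison logic usable in tests.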
Cheers! now robots.txt is in the right place.
Let's wait a few days to see if google actually does update the indexes.
Thanks a lot @eimrek !