Search: do not show spam projects on search results

Open humitos opened this issue 2 years ago • 3 comments

Currently, when using the global search, a lot of spam projects are shown in the search results. None of them should be shown on these results.

[Screenshot: Screenshot_2023-01-09_18-09-04]

We could define a new threshold, RTD_SPAM_THRESHOLD_DONT_SHOW_SEARCH_RESULTS, to skip these, alongside the existing spam thresholds:

https://github.com/readthedocs/readthedocs.org/blob/2f009cd520d8ad8a4c783ac091e5c32b88fc9342/readthedocs/settings/base.py#L993-L998
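
A minimal sketch of how such a threshold could be applied, assuming each project has a numeric spam score. Only the setting name comes from the proposal above; its value, the `SearchHit` type, and the `filter_spam_hits` helper are hypothetical and not part of the Read the Docs codebase.

```python
from dataclasses import dataclass

# Hypothetical value for illustration; the real one would be chosen
# relative to the other thresholds in readthedocs/settings/base.py.
RTD_SPAM_THRESHOLD_DONT_SHOW_SEARCH_RESULTS = 300


@dataclass
class SearchHit:
    """Hypothetical stand-in for one search result."""
    project_slug: str
    title: str


def filter_spam_hits(hits, spam_scores,
                     threshold=RTD_SPAM_THRESHOLD_DONT_SHOW_SEARCH_RESULTS):
    """Keep only hits whose project spam score is below the threshold.

    Projects without a recorded score are treated as score 0 (not spam).
    """
    return [
        hit for hit in hits
        if spam_scores.get(hit.project_slug, 0) < threshold
    ]


hits = [
    SearchHit("pip", "Installation"),
    SearchHit("cheap-pills", "Buy now"),
    SearchHit("sphinx", "Quickstart"),
]
spam_scores = {"cheap-pills": 900, "sphinx": 10}
visible = filter_spam_hits(hits, spam_scores)
```

Filtering at query time like this keeps spam documents in the index; it only hides them from the results.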

humitos, Jan 16 '23 11:01

The easiest way to achieve this, without introducing a new field or keeping track of a list of projects to filter by (which probably won't scale), is to have a task that removes a project's files from the index when the project is marked as spam or has a score greater than x. If that was a mistake, we can just trigger the re-index task after un-marking the project as spam.
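
A rough sketch of that flow, using an in-memory stand-in for the search index. Every name here (`InMemoryIndex`, `sync_index_for_project`, `SPAM_SCORE_LIMIT`) is hypothetical, and the limit is a placeholder for the unspecified "x"; a real implementation would run as a background task against the actual search backend.

```python
SPAM_SCORE_LIMIT = 300  # hypothetical placeholder for "x"


class InMemoryIndex:
    """Toy search index: project slug -> {file path: file content}."""

    def __init__(self):
        self._docs = {}

    def index_project(self, slug, files):
        # (Re-)index every file of the project, replacing older entries.
        self._docs[slug] = dict(files)

    def remove_project(self, slug):
        # Removing a project that is not indexed is a no-op.
        self._docs.pop(slug, None)

    def search(self, term):
        return sorted(
            (slug, path)
            for slug, files in self._docs.items()
            for path, content in files.items()
            if term in content
        )


def sync_index_for_project(index, slug, files, is_spam, spam_score,
                           limit=SPAM_SCORE_LIMIT):
    """Remove spam projects from the index; (re-)index everything else."""
    if is_spam or spam_score > limit:
        index.remove_project(slug)
        return "removed"
    index.index_project(slug, files)
    return "indexed"


index = InMemoryIndex()
files = {"index.html": "install guide"}
sync_index_for_project(index, "docs", files, is_spam=False, spam_score=0)
# Project gets marked as spam: its files leave the index.
sync_index_for_project(index, "docs", files, is_spam=True, spam_score=0)
# The mark was a mistake: un-mark and re-index.
sync_index_for_project(index, "docs", files, is_spam=False, spam_score=0)
```

Because the same function handles both directions, undoing a wrong spam mark needs no extra state beyond the project's current flag and score.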

stsewd, Sep 13 '23 19:09

Can this be closed? Since this was opened, I think we've leaned more towards deprioritizing this UI in favor of bringing some of these features to our in-doc Addons search instead. This UI doesn't get much use, and global project search is community-specific.

agjohnson, Feb 29 '24 19:02

If we are not going to do this work or something similar, I'd propose killing this view completely. At this point, I think it is just bad UX: it exposes the search as a broken and useless feature and erodes trust in it.

humitos, Mar 04 '24 10:03

We hit this recently, so we should find a good way to move forward here. The goal should be to remove these projects from the search index, which will make our search faster and save space. This could also include work on #11533, if that makes sense.

ericholscher, Aug 13 '24 16:08

Also, a note: we are probably still not talking about tuning the global search UI, as that view is going away with the new dashboard. The work here is solely about avoiding indexing spam projects and surfacing them in search results.

There is some loosely related work on not surfacing spam projects in our dashboard as well:

  • https://github.com/readthedocs/readthedocs-ext/issues/554

agjohnson, Aug 13 '24 16:08