
Search doesn't index (parts of) URLs

Open sia1984 opened this issue 4 years ago • 3 comments

I just accidentally reposted https://www.nngroup.com/articles/computer-skill-levels/ because I searched for it and got this result list:

result list only showing a comment with the URL in it

But the link was already posted, which got shown to me in the repost protection screen:

repost protection

Please make the search also work for URLs and parts of URLs.

sia1984 avatar Jan 30 '21 08:01 sia1984

I don't understand. It's working as it should: the post creation screen is where it should show you whether there are reposts, which is what it's doing. Why would you want to use the search page, then go to create a post to see if it's a repost?

I just double-checked that, and it's working correctly:

image

I'm not sure what adding a URL type to the search page would add, considering the only place you care about seeing reposts is when you're making a post.

dessalines avatar Jan 30 '21 16:01 dessalines

Sorry, that was a stupid example. I basically told you how I found out about the issue, not what the issue is 🤦🏼

It'd be nice if you could search for any URL or domain or part of URLs, e.g. if you want to find all posts linking to your home page or all posts containing spammysite.lol to flag them for moderation.

Other applications include being able to search for fragments of URLs if you remember one but it doesn't appear in the description or title of the post. If you search for "skill levels" the post by Maya above doesn't show up because they changed the title.

Obviously searching for only "www" or ".com" would be stupid, but if you wanted to see articles from example.com posted on the site out of interest or any other reason you could search for it.

Is there a reason URLs are not in the search DB? Obviously they'd have to be tokenised properly so there aren't millions of .com entries in the search index, but you'd get "nngroup.com" and "computer skill levels" as searchable strings out of the example. I have no idea how to do this properly, as I know nothing about tokenising.
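
To make the request concrete, here is a rough sketch of the kind of tokenising meant above. It is not how Lemmy's search works; the function name `url_search_terms` and the `STOP_TOKENS` list are made up for illustration, and a real index would likely need a more careful stop list and handling of query strings.

```python
import re
from urllib.parse import urlsplit

# Hypothetical stop list: tokens too common to be worth indexing on their own.
STOP_TOKENS = {"www", "com", "org", "net", "http", "https", "html"}


def url_search_terms(url: str) -> list[str]:
    """Split a URL into a host term plus the words found in its path."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Drop a leading "www." so "www.nngroup.com" is indexed as "nngroup.com".
    if host.startswith("www."):
        host = host[4:]
    terms = [host] if host else []
    # Split the path on anything that is not a lowercase letter or digit.
    for word in re.split(r"[^a-z0-9]+", parts.path.lower()):
        if word and word not in STOP_TOKENS and word not in terms:
            terms.append(word)
    return terms


print(url_search_terms("https://www.nngroup.com/articles/computer-skill-levels/"))
# ['nngroup.com', 'articles', 'computer', 'skill', 'levels']
```

With terms like these stored next to the title, a search for "skill levels" or "nngroup.com" could match the post even though its title was changed.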

sia1984 avatar Jan 30 '21 17:01 sia1984

Post URLs are very much in the database, but there's really no point in adding all the wildcard functionality of a search engine. What you're asking for would require a query language translated to Postgres.

dessalines avatar Sep 29 '22 18:09 dessalines