mwmbl

A way to submit sites to index

Open songproducer opened this issue 4 years ago • 8 comments

Something like the Firefox extension, but for iOS.

songproducer • Mar 28 '22

Orion Browser has ported some of the WebExtensions API to WebKit. I am really curious whether that includes the APIs used by the mwmbl crawler extension. You can get the beta through TestFlight.

This may not be the right approach, but it looks really interesting.

ndren • Jun 11 '22

I would like to add the ability to submit sites for indexing. It needs to be done carefully, though, to prevent spam. My current thought is that a submitted site must then be approved by a moderator.

daoudclarke • Jul 04 '22
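The moderator-approval flow described above can be sketched as a small state machine: a submission starts as pending, and only a designated moderator can move it to approved or rejected. This is a hypothetical sketch, not mwmbl's code: the `SubmissionQueue` class, its method names, and the in-memory storage are illustrative assumptions, and a real service would persist submissions and authenticate moderators properly.

```python
from dataclasses import dataclass, field
from enum import Enum
from urllib.parse import urlparse


class Status(Enum):
    PENDING = "pending"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class SubmissionQueue:
    """Hypothetical in-memory queue: submitted URLs wait for a moderator."""

    moderators: set[str]
    statuses: dict[str, Status] = field(default_factory=dict)

    def submit(self, url: str) -> Status:
        parsed = urlparse(url)
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not a crawlable URL: {url!r}")
        # Re-submitting a URL keeps its existing status instead of resetting it.
        return self.statuses.setdefault(url, Status.PENDING)

    def review(self, moderator: str, url: str, approve: bool) -> Status:
        # Only designated moderators may decide on a submission.
        if moderator not in self.moderators:
            raise PermissionError(f"{moderator} is not a moderator")
        if self.statuses.get(url) is not Status.PENDING:
            raise KeyError(f"no pending submission for {url!r}")
        self.statuses[url] = Status.APPROVED if approve else Status.REJECTED
        return self.statuses[url]

    def approved(self) -> list[str]:
        # Only approved URLs would be handed on to the crawler.
        return [u for u, s in self.statuses.items() if s is Status.APPROVED]
```

Keeping a rejected URL's status on resubmission is one possible design choice: it stops the same spam URL from re-entering the queue, at the cost of needing a separate path to reconsider a site that was rejected by mistake.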

I don't know whether the project needs more URLs to crawl (I bet there are already plenty), but here are some resources that might be useful:

  • https://github.com/tb0hdan/domains
  • https://github.com/etalab/noms-de-domaine-organismes-publics

JulienPalard • Jul 10 '22

This is a feature we have discussed adding to the extension in the future. As @daoudclarke said, the main concern is validating the submitted sites. If you have any ideas on how to tackle this, feel free to propose solutions.

ColinEspinas • Jul 11 '22

My current idea is to have designated moderators who can approve suggestions. But first we need to come up with a policy that defines what makes a site good to crawl.

daoudclarke • Jul 11 '22

You could triage the suggested URLs with ad/spam/malware filter lists, such as EasyList and the AdGuard lists.

brunexgeek • Apr 07 '23

@brunexgeek, thanks, those look like good lists.

daoudclarke • Apr 08 '23
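The filter-list triage suggested by brunexgeek could run as a pre-filter before moderation. The sketch below is a simplified, hypothetical example: it understands only the plainest domain rule in the Adblock Plus filter syntax that EasyList and AdGuard lists use (`||domain^`) and ignores everything else (comments, `@@` exception rules, `$` options, cosmetic `##` rules), so it is far from a full filter-list engine. The function names are illustrative, and downloading the lists is left out; `filter_list_text` is assumed to be a list's raw text.

```python
import re
from urllib.parse import urlparse

# Matches only the simplest rule form, "||domain^", with no path and no $options.
DOMAIN_RULE = re.compile(r"^\|\|([a-z0-9.-]+)\^$")


def parse_blocked_domains(filter_list_text: str) -> set[str]:
    """Collect domains from plain '||domain^' rules; every other line is ignored."""
    domains = set()
    for line in filter_list_text.splitlines():
        match = DOMAIN_RULE.match(line.strip().lower())
        if match:
            domains.add(match.group(1))
    return domains


def is_flagged(url: str, blocked: set[str]) -> bool:
    """True if the URL's host, or any parent domain of it, is on the block list."""
    host = urlparse(url).hostname or ""  # .hostname is already lowercased
    labels = host.split(".")
    # For "a.b.example.com" this checks "a.b.example.com", "b.example.com", ...
    return any(".".join(labels[i:]) in blocked for i in range(len(labels)))


def triage(urls: list[str], blocked: set[str]) -> tuple[list[str], list[str]]:
    """Split submissions into (for moderator review, auto-flagged)."""
    review, flagged = [], []
    for url in urls:
        (flagged if is_flagged(url, blocked) else review).append(url)
    return review, flagged
```

Flagged URLs need not be rejected outright; routing them to a separate moderator queue would keep a broad rule from silently dropping a legitimate site.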

Perhaps the existing web extension could insert a small panel into page footers asking the user whether they'd like this website to be crawled. This could help grow the index while avoiding the spam problem.

aparatext • Jul 01 '23