
Feature: GitHub bot to scan new instance requests.

Open dalf opened this issue 4 years ago • 5 comments

As soon as there is a new instance request, a bot can:

  • check whether filtron is configured.
  • return the result of searx-stats2 for this instance.
  • warn about Freenom domains: "a freenom domain you don't own the domain: https://www.reddit.com/r/webhosting/comments/82ksmp/freenom_took_my_free_domain_away/" (from https://github.com/dalf/searx-instances/issues/74#issuecomment-662356353)

The bot can also scan for a comment like "@searx-bot add instance" (comment from the project maintainers) and add the instance automatically.
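
A minimal sketch of how such a command could be detected, assuming the bot receives GitHub's `issue_comment` webhook payload as a dict. The `comment.body` and `comment.author_association` fields come from that payload; the set of associations treated as "maintainers" and the function name are assumptions of this sketch, not something decided in this thread.

```python
import re

# Assumption: these GitHub author associations count as "project maintainers".
MAINTAINER_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}

# The command must stand alone on a line, so quoting it mid-sentence does not trigger it.
COMMAND_RE = re.compile(
    r"^[ \t]*@searx-bot[ \t]+add[ \t]+instance\s*$",
    re.IGNORECASE | re.MULTILINE,
)


def is_add_instance_command(payload: dict) -> bool:
    """Return True if the comment is an 'add instance' command from a maintainer."""
    comment = payload.get("comment", {})
    if comment.get("author_association") not in MAINTAINER_ASSOCIATIONS:
        return False
    return COMMAND_RE.search(comment.get("body") or "") is not None
```

Checking the author association before the text means a comment from an arbitrary user is ignored even if it contains the exact command.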

But once this is implemented, what is the value of a human review?

dalf, Jul 16 '20 08:07

That's a good idea! The bot could also check other technical components such as TLS and IPv6.

But I think there should always be a human review. Maybe we could implement something like a "/lgtm" command that is only available to collaborators, and require, for example, two LGTMs from collaborators before the instance is merged. If the bot sees that there are two LGTMs, it automatically adds the instance.
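
The two-LGTM rule could be sketched as follows, assuming the bot has the issue's comments as a list of dicts shaped like GitHub's issue comment objects (`user.login`, `author_association`, `body`). The function names, the accepted associations, and the default threshold of two are illustrative assumptions.

```python
# Assumption: who counts as a collaborator for the purpose of /lgtm.
COLLABORATOR_ASSOCIATIONS = {"OWNER", "MEMBER", "COLLABORATOR"}


def lgtm_approvers(comments: list) -> set:
    """Return the logins of distinct collaborators who posted a /lgtm line."""
    approvers = set()
    for comment in comments:
        if comment.get("author_association") not in COLLABORATOR_ASSOCIATIONS:
            continue
        lines = [line.strip().lower() for line in (comment.get("body") or "").splitlines()]
        if "/lgtm" in lines:
            approvers.add(comment["user"]["login"])
    return approvers


def ready_to_add(comments: list, required: int = 2) -> bool:
    """True once enough distinct collaborators have approved."""
    return len(lgtm_approvers(comments)) >= required
```

Counting distinct logins rather than comments prevents one collaborator from approving twice.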

unixfox, Jul 16 '20 08:07

Which programming language do you want to use for this bot?

unixfox, Sep 07 '20 14:09

I've started to write https://github.com/dalf/botsandbox in Python (more experimental than anything else)

  • based on https://gidgethub.readthedocs.io/en/latest/
  • triggered by a WebHook
  • uses a Personal Access Token for @searx-bot (the workflow on https://docs.github.com/en/developers/apps/about-apps#determining-which-integration-to-build says it should be a GitHub App?)
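
Since the bot is triggered by a webhook, it has to make sure a delivery really comes from GitHub. A minimal sketch of that check, assuming a webhook secret is configured and the raw request body is available: GitHub sends an HMAC-SHA256 of the body in the `X-Hub-Signature-256` header. The function name is hypothetical, and a library such as gidgethub may already perform this validation when given the secret, so this only illustrates what the check amounts to.

```python
import hashlib
import hmac


def verify_signature(secret: bytes, body: bytes, signature_header: str) -> bool:
    """Compare GitHub's X-Hub-Signature-256 header with our own HMAC of the body."""
    expected = "sha256=" + hmac.new(secret, body, hashlib.sha256).hexdigest()
    # compare_digest runs in constant time to avoid leaking the expected value.
    return hmac.compare_digest(expected.encode(), (signature_header or "").encode())
```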

The idea is:

  • the bot runs on check.searx.space to avoid issues with filtron / antibot-proxy / whatever on the searx instances (if the bot can't scan the instance, then searx.space can't either).
  • if there is a new instance:
    • it queues a searx-stats2 scan using Celery and Redis.
    • it is just a function call from the Python point of view.
    • one scan at a time (if for some reason there is some spam on the searx-instances issue tracker, the bot will be able to deal with it).
    • some additional tests (filtron, DNSSEC)
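
The "one scan at a time" idea can be sketched with the standard library alone. This is a stand-in for the Celery and Redis setup described above (with Celery, a similar effect would come from running a single worker with a concurrency of 1), not the actual botsandbox code. `scan_instance` is a hypothetical placeholder for the searx-stats2 call.

```python
import queue
import threading


def scan_instance(url: str) -> dict:
    """Hypothetical placeholder for the real searx-stats2 scan."""
    return {"url": url, "scanned": True}


class ScanQueue:
    """Queue scan requests and process them one at a time on a single worker thread."""

    def __init__(self, scan=scan_instance):
        self._scan = scan
        self._jobs = queue.Queue()
        self.results = []
        threading.Thread(target=self._run, daemon=True).start()

    def submit(self, url: str) -> None:
        self._jobs.put(url)

    def wait(self) -> None:
        """Block until every submitted scan has been processed."""
        self._jobs.join()

    def _run(self) -> None:
        while True:
            url = self._jobs.get()
            try:
                self.results.append(self._scan(url))
            except Exception as exc:  # a failed scan must not stop the worker
                self.results.append({"url": url, "error": str(exc)})
            finally:
                self._jobs.task_done()
```

Because only one worker consumes the queue, a burst of spam issues only makes the queue longer; it never starts scans in parallel.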

Probot for Node seems cleaner, but:

  • it adds a new language (golang would be okay, but I'm not sure this is the right language for that purpose).
  • as I understand it, the only way to call Python from Node is to spawn a new process:
    • to scan an instance (call searx-stats2)
    • to add / remove an instance from searx-instances (call searx-instances).

dalf, Sep 07 '20 15:09

I'm fine with Python even though my preferred language is JavaScript.

A webhook is a good idea. A GitHub App is probably better, because a Personal Access Token gives too much access to your account if it somehow gets leaked.

If the bot is open source, how are contributors going to test it if the requests have to be sent from check.searx.space?

One idea I have: a temporary environment for each PR that runs using the IP of check.searx.space, like GitLab already does with Review Apps; see an example here for websites: https://youtu.be/h2pv_syqO24?t=110. Each commit gets a new temporary environment so that the developer can test each new change. You don't even need to run the Python app on your VPS; you could run it on a separate server (in Docker) that uses the check.searx.space server as a proxy.

I don't know how to do that. Are there apps that already do this kind of thing?

unixfox, Sep 07 '20 16:09

A GitHub App could run on a runner, but self-hosted runners are discouraged on public repositories: https://docs.github.com/en/actions/hosting-your-own-runners/about-self-hosted-runners#self-hosted-runner-security-with-public-repositories

Network connection from a GitHub runner:

  • Proxy: it won't work for the CSP and TLS grades, so I guess it is a no-go. Details:

    • https://github.com/mozilla/http-observatory : there is no proxy configuration; some hack may be possible, but I would like to avoid that.
    • https://github.com/aeris/cryptcheck : there is no proxy configuration; it is written in Ruby and tightly coupled to OpenSSL.
  • A WireGuard connection to check.searx.space, but the private key can't be shared, otherwise it provides an open proxy to anyone: so only for the master branch, not for the PRs.

So whatever solution I can think of, only the master branch of the bot could run the tests on check.searx.space; for the PRs and forks, the code will run on a GitHub runner.

But even if it is a GitHub App, it requires a test environment:

  • either a repository,
  • or some tools to simulate GitHub's behavior.

dalf, Sep 08 '20 07:09