[Feature] Mullvad Leta

Open glanham-jr opened this issue 1 year ago • 8 comments

Working URL to the engine https://leta.mullvad.net/

Why do you want to add this engine? Mullvad is a reputable VPN provider. If my local instance could also act as a proxy to this search engine, that would reduce the number of devices I need to connect to Mullvad in order to use this search feature.

Features of this engine The blog post here contains the features and FAQ. Their meta search engine uses the official paid Google API, so results are likely quite robust and not prone to breaking on upstream changes.

Technically Mullvad Leta is already a meta search engine on top of Google - I wonder if they use searx lol (probably not, they use NodeJS).

How can SearXNG fetch the information from this engine? TBD - will likely need to scrape the front end

Applicable category of this engine general

Additional context You MUST have a Mullvad account to use this search engine. No account, no searches.

There is a limit of 100 non-cached searches (or credits) per valid Mullvad VPN account per 24 hours. You can search the cache without using credits.

Notes I plan on working on this integration when I can, as it directly benefits myself! But anyone else can work on it too.

glanham-jr avatar Jun 21 '23 21:06 glanham-jr

Additional and pretty important info:

What is a cached search?

We store every search in a RAM based cache storage (Redis), which is removed after it reaches over 30 days in age.

Cached searches are fetched from this storage, which means we return a result that can be from 0 to 30 days old. It may be the case that no other user has searched for something during the time that you search, which means you would be shown a stale result.

unixfox avatar Jun 21 '23 22:06 unixfox

Also this:

Each time you search for a phrase you use up 1 of the 100 credits.

Each time you select next page you use up 1 of the 100 credits.

If you select “Only search in cache”, which is the default option, 0 credits are used.
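For anyone budgeting credits in an integration, the rules quoted above reduce to simple arithmetic. This is only a sketch: the function names are made up for illustration, and it assumes that paging through results in cache-only mode also costs nothing, which the FAQ text implies but does not state outright.

```python
DAILY_CREDITS = 100  # per Mullvad account per 24 hours, per the FAQ quoted above


def credit_cost(pages: int, cache_only: bool) -> int:
    """Credits used by one query viewed across `pages` result pages.

    The first page costs 1 credit and every "next page" costs 1 more,
    so a non-cached query costs one credit per page viewed.
    Assumption: cache-only searches cost 0 for every page.
    """
    if pages < 1:
        raise ValueError("pages must be at least 1")
    return 0 if cache_only else pages


def remaining_credits(used: int) -> int:
    """Credits left in the current 24-hour window, never below zero."""
    return max(DAILY_CREDITS - used, 0)
```

For example, a non-cached query read through to page 3 would use 3 credits, while the same query in cache-only mode would use none.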

glanham-jr avatar Jun 22 '23 02:06 glanham-jr

  1. Send a multipart POST request to https://leta.mullvad.net/login with account_number=<account number without spaces>, q=<empty>, oc=<empty> (presumably both can be left unset, but I haven't checked). The response is the JSON {"type":"redirect","status":302,"location":"/"} (funnily, the HTTP status code is 200), plus a Set-Cookie header with accessToken=...; Path=/; HttpOnly; Secure; SameSite=Strict

  2. https://leta.mullvad.net/__data.json should return a json with

  • type="data"
  • nodes= a list of objects with
    • type="data"
    • uses = some unknown object, can have no fields or have "url" set to 1, from reverse engineering the js it can have "dependencies" (array), "params" (array), "parent" (bool), "route" (bool), "url" (bool). I guess this is just some framework
    • data = an array, i've seen it set to
      • a single element set to null
      • first element = an object with connected=1, and the second element is just true
      • first element = an object with isLoggedIn=1, and the second element is just true

Either way, this json seems useless for searx, although it can be used to check isLoggedIn
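A minimal sketch of that isLoggedIn check, in case it helps. The function name is hypothetical, and it rests on one assumption that is not confirmed here: that the integer in {"isLoggedIn": 1} is an index into the same data array, so it points at the second element (the bare true).

```python
def is_logged_in(payload: dict) -> bool:
    """Return True if a __data.json payload resolves isLoggedIn to true."""
    for node in payload.get("nodes", []):
        # Skip nodes that are null or have no usable "data" array.
        if not isinstance(node, dict):
            continue
        data = node.get("data")
        if not isinstance(data, list) or not data:
            continue
        head = data[0]
        if not isinstance(head, dict) or "isLoggedIn" not in head:
            continue
        ref = head["isLoggedIn"]
        # Assumption: the value is an index into the same array, not a literal.
        is_index = isinstance(ref, int) and not isinstance(ref, bool)
        if is_index and 0 <= ref < len(data):
            return data[ref] is True
    return False
```

A payload whose nodes only contain [null] or the connected=1 variant returns False under this reading.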

  3. Send a multipart request to https://leta.mullvad.net/ with q=<query>, optional oc=on to search only the cache (if oc is off, it isn't present at all), and optional gl=<two lowercase letters specifying the region> (it isn't present when searching from the landing page, but is set to an empty string when searching from the results page without specifying a country)
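The search request described above could be assembled roughly like this, using only the Python standard library. Treat it as a sketch under assumptions: the helper names are made up, the request is built but never sent, and sending the token back as a Cookie: accessToken=... header is inferred from the Set-Cookie in the login step rather than verified.

```python
import urllib.request
import uuid
from typing import Optional, Tuple

LETA_URL = "https://leta.mullvad.net/"


def search_fields(query: str, cache_only: bool = True,
                  region: Optional[str] = None) -> dict:
    """Form fields for a search, following the steps described above."""
    fields = {"q": query}
    if cache_only:
        fields["oc"] = "on"  # when cache-only is off, oc is omitted entirely
    if region is not None:
        fields["gl"] = region.lower()  # two lowercase letters, or "" for none
    return fields


def encode_multipart(fields: dict,
                     boundary: Optional[str] = None) -> Tuple[str, bytes]:
    """Encode text fields as multipart/form-data; returns (content_type, body)."""
    boundary = boundary or uuid.uuid4().hex
    lines = []
    for name, value in fields.items():
        lines.append(f"--{boundary}")
        lines.append(f'Content-Disposition: form-data; name="{name}"')
        lines.append("")
        lines.append(value)
    lines.append(f"--{boundary}--")
    lines.append("")
    body = "\r\n".join(lines).encode("utf-8")
    return f"multipart/form-data; boundary={boundary}", body


def build_search_request(query: str, access_token: str, cache_only: bool = True,
                         region: Optional[str] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for a search."""
    content_type, body = encode_multipart(search_fields(query, cache_only, region))
    return urllib.request.Request(
        LETA_URL,
        data=body,
        method="POST",
        headers={
            "Content-Type": content_type,
            # Assumption: the login cookie is sent back under the same name.
            "Cookie": f"accessToken={access_token}",
        },
    )
```

Passing the result to urllib.request.urlopen would actually perform the search. Defaulting cache_only to True mirrors the site's default and avoids spending credits by accident.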

chayleaf avatar Jun 25 '23 14:06 chayleaf

@glanham-jr if you or anyone else wants to work on this, please feel free to pick up where I left off. I'm not sure if I'll ever come back to this, but here's a Python script that gets the search results in JSON format and prints them in the terminal:

[repo deleted]

Don't forget to read my instructions!

Austin-Olacsi avatar Sep 22 '23 19:09 Austin-Olacsi

@Hackurei I have plans to work on this in the coming months (life has been quite busy lately) - thanks for the debug info!

glanham-jr avatar Sep 23 '23 20:09 glanham-jr

If this engine gets implemented, should make sure to mention in the config/docs that it should probably not be exposed publicly since search tokens are finite.

sevmonster avatar Oct 30 '23 02:10 sevmonster

If this engine gets implemented, should make sure to mention in the config/docs that it should probably not be exposed publicly since search tokens are finite.

Not necessarily. According to the documentation...

If you select “Only search in cache”, 0 credits are used.

I would say that if you have a public instance, "only search in cache" arguably must be checked, otherwise it will go down. I'd probably enable this by default, and home users can quickly disable it as needed.

glanham-jr avatar Nov 04 '23 18:11 glanham-jr

@Hackurei Update - looks like there is no longer a login option :cry:

Looks like they saw this would have been an issue. I tried the above script that worked previously and got a 204 response. The login page doesn't exist anymore. Meaning, we would have to run searxng within the VPN. Might still be useful, but not sure if Mullvad would block public instances for too many requests.

glanham-jr avatar Jan 06 '24 18:01 glanham-jr