searxng
[Feature] Mullvad Leta
Working URL to the engine https://leta.mullvad.net/
Why do you want to add this engine? Mullvad is a reputable VPN provider. If my local SearXNG instance could also act as a proxy to this search engine, it would reduce the number of devices I need connected to Mullvad in order to use this search feature.
Features of this engine The blog post here contains the features and FAQ. Their meta search engine uses the official paid Google API, so results are likely quite robust and less prone to breaking when Google changes things.
Technically Mullvad Leta is already a meta search engine on Google - I wonder if they use searx lol (probably not, they use NodeJS).
How can SearXNG fetch the information from this engine? TBD - will likely need to scrape the front end
Applicable category of this engine general
Additional context You MUST have a Mullvad account to use this search engine. No account, no searches.
There is a limit of 100 non-cached searches (or credits) per valid Mullvad VPN account per 24 hours. You can search the cache without using credits.
Notes I plan on working on this integration when I can, as it directly benefits myself! But anyone else can work on it too.
Additional and pretty important info:
What is a cached search?
We store every search in a RAM based cache storage (Redis), which is removed after it reaches over 30 days in age.
Cached searches are fetched from this storage, which means we return a result that can be from 0 to 30 days old. It may be the case that no other user has searched for something during the time that you search, which means you would be shown a stale result.
Also this:
Each time you search for a phrase you use up 1 of the 100 credits.
Each time you select next page you use up 1 of the 100 credits.
If you select "Only search in cache", which is the default option, 0 credits are used.
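The credit rules above can be sketched as a small accounting helper (purely illustrative; the function and the action names are mine, not part of any Leta API):

```python
def credits_used(actions: list[tuple[str, bool]]) -> int:
    """Count credits for a sequence of (action, cache_only) pairs.

    Per the rules above: a new search phrase or a next-page fetch costs
    1 credit, unless "Only search in cache" is enabled, in which case
    it costs 0.
    """
    return sum(
        0 if cache_only else 1
        for action, cache_only in actions
        if action in ("search", "next_page")
    )
```

For example, a fresh search followed by one next-page click uses 2 of the 100 daily credits, while any number of cache-only searches uses none.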
- Multipart POST request to https://leta.mullvad.net/login with account_number=<account number without spaces>, q=<empty>, oc=<empty> (presumably both can be unset, but I haven't checked). The response is JSON: {"type":"redirect","status":302,"location":"/"} (funnily, the HTTP status code is 200), along with a Set-Cookie of accessToken=...; Path=/; HttpOnly; Secure; SameSite=Strict
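A stdlib-only sketch of that login step, based solely on the reverse-engineering notes above (field names and response shape are assumptions, not documented API):

```python
import http.cookiejar
import json
import urllib.request
import uuid

def encode_multipart(fields: dict) -> tuple[bytes, str]:
    """Encode text-only form fields as a multipart/form-data body."""
    boundary = uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [f"--{boundary}",
                  f'Content-Disposition: form-data; name="{name}"',
                  "", value]
    parts += [f"--{boundary}--", ""]
    return "\r\n".join(parts).encode(), f"multipart/form-data; boundary={boundary}"

def leta_login(opener, account_number: str) -> bool:
    """POST the login form; the HttpOnly accessToken cookie lands in the jar.

    Per the notes above: q and oc are sent empty, spaces are stripped from
    the account number, and the server answers HTTP 200 with a JSON
    "redirect" payload.
    """
    body, content_type = encode_multipart({
        "account_number": account_number.replace(" ", ""),
        "q": "",
        "oc": "",
    })
    req = urllib.request.Request(
        "https://leta.mullvad.net/login", data=body,
        headers={"Content-Type": content_type}, method="POST")
    with opener.open(req) as resp:
        payload = json.load(resp)
    return payload.get("type") == "redirect"

# An opener with a cookie jar keeps the accessToken between requests.
jar = http.cookiejar.CookieJar()
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))
```

Using a cookie jar matters here because the token is HttpOnly and never appears in the response body, only in the Set-Cookie header.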
- https://leta.mullvad.net/__data.json should return JSON with
  - type="data"
  - nodes = a list of objects, each with
    - type="data"
    - uses = some unknown object; it can have no fields or have "url" set to 1. From reverse engineering the JS it can have "dependencies" (array), "params" (array), "parent" (bool), "route" (bool), "url" (bool). I guess this is just framework bookkeeping.
    - data = an array; I've seen it set to
      - a single element set to null
      - first element = an object with connected=1, and the second element is just true
      - first element = an object with isLoggedIn=1, and the second element is just true

  Either way, this JSON seems useless for searx, although it can be used to check isLoggedIn.
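A minimal sketch of that isLoggedIn check, assuming the response shape described in the notes above holds (it may not be stable across frontend updates):

```python
def is_logged_in(data: dict) -> bool:
    """Walk the nodes of the __data.json payload looking for an
    object with a truthy isLoggedIn field, per the observed shape:
    each node has a "data" array whose first element may be an
    object like {"isLoggedIn": 1}.
    """
    for node in data.get("nodes") or []:
        if not isinstance(node, dict):
            continue
        for item in node.get("data") or []:
            if isinstance(item, dict) and item.get("isLoggedIn"):
                return True
    return False
```

This could let an engine decide whether to re-run the login step before spending a search request.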
- Send a multipart request to https://leta.mullvad.net/ with q=<query>, an optional oc=on to only search the cache (when disabled, oc isn't present at all), and an optional gl=<two lowercase letters specifying the region> (gl isn't present when searching from the landing page, but is set to an empty string when searching from the results page without specifying a country).
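Assembling those form fields could look like this (a sketch under the assumptions above; the function name and defaults are mine):

```python
from typing import Optional

def build_search_fields(query: str, cache_only: bool = True,
                        region: Optional[str] = None) -> dict:
    """Build the multipart fields for a Leta search.

    Per the notes above: oc=on is present only when searching the cache
    (it is omitted entirely otherwise), and gl is a two-letter lowercase
    region code that may legitimately be an empty string.
    """
    fields = {"q": query}
    if cache_only:
        fields["oc"] = "on"  # omit the key entirely when cache-only is off
    if region is not None:
        fields["gl"] = region
    return fields
```

Defaulting to cache-only mirrors the suggestion later in this thread: on a public instance it keeps the engine from burning through the 100 daily credits.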
@glanham-jr if you or anyone else wants to work on this,
please feel free to pick up where I left off; I'm not sure if I'll ever come back to this. Here's a python script to get the search results in JSON format and print them in the terminal:
[repo deleted]
Don't forget to read my instructions!
@Hackurei I have plans to work on this in the next coming months (life has been quite busy lately) - thanks for the debug info!
If this engine gets implemented, should make sure to mention in the config/docs that it should probably not be exposed publicly since search tokens are finite.
Not necessarily. According to the documentation...
If you select "Only search in cache", 0 credits are used.
I would say that on a public instance, "only search in cache" arguably must be forced on, otherwise the engine will burn through its credits and go down. I'd enable it by default, and home users can quickly turn on non-cached searches as needed.
@Hackurei Update - looks like there is no longer a login option :cry:
Looks like they saw this would have been an issue. I tried the above script that worked previously and got a 204 response, and the login page doesn't exist anymore. Meaning, we would have to run searxng within the VPN. Might still be useful, but I'm not sure whether Mullvad would block public instances for making too many requests.