araa-search

Plans for adding new search engines

Open amogusussy opened this issue 1 year ago • 21 comments

In #103, you mentioned that in order to prevent no results being returned because of rate limiting, you'll implement other search engines to act as a backup.

Here's a template that I've come up with for the results:

{
  "wiki": {
    "title": "String",
    "description": "String",
    "link": "String",
    "image": "String"
  },
  "results": [
    {
      "title": "String",
      "description": "String",
      "link": "String",
      "has_sublinks": "Bool",
      "sublinks": [
        {
          "title": "String",
          "description": "String",
          "link": "String"
        }
      ]
    }
  ]
}

Then we can just edit the results.html file to use this format, and it'll be much easier to implement newer engines.

Each engine should have its own file in src/textEngines/{engine}.py, and then get called in the textResults.py file.

amogusussy avatar Jan 01 '24 17:01 amogusussy
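A minimal sketch of how the proposed template and the backup-engine idea could fit together, assuming each engine is a function that takes a query and returns the format above. The names `EngineResponse`, `RateLimited`, and `search_with_fallback` are hypothetical and are not taken from the araa-search code; allowing `wiki` to be `None` is also an assumption.

```python
from typing import Callable, Optional, TypedDict


class Sublink(TypedDict):
    title: str
    description: str
    link: str


class Result(TypedDict):
    title: str
    description: str
    link: str
    has_sublinks: bool
    sublinks: list[Sublink]


class Wiki(TypedDict):
    title: str
    description: str
    link: str
    image: str


class EngineResponse(TypedDict):
    wiki: Optional[Wiki]  # assumption: None when there is no wiki box
    results: list[Result]


class RateLimited(Exception):
    """Hypothetical error an engine raises when the upstream site blocks it."""


Engine = Callable[[str], EngineResponse]


def search_with_fallback(query: str, engines: list[Engine]) -> EngineResponse:
    """Try each engine in order and return the first non-empty response."""
    for engine in engines:
        try:
            response = engine(query)
        except RateLimited:
            continue  # this engine is blocked, try the next one
        if response["results"]:
            return response
    # Every engine failed or returned nothing: give the template an empty shape.
    return {"wiki": None, "results": []}
```

With this shape, the file that collects text results would only need an ordered list of engine functions, and the template would only ever see one format regardless of which engine answered.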

Honestly, I was just using the search and then Google blocked it. It's very annoying; I should try to fix this ASAP.

Extravi avatar Jan 02 '24 01:01 Extravi

image

Extravi avatar Jan 02 '24 01:01 Extravi

It was working before that.

image

Extravi avatar Jan 02 '24 02:01 Extravi

I did find this post by 2captcha not long ago, and it has some useful info. There is one more post, but I cannot find it at the moment.

image

https://2captcha.com/blog/google-sepr-recaptcha-june-2022

Extravi avatar Jan 02 '24 02:01 Extravi

I know it said something about sending requests with cookies, like the "NID" cookie, and how the URL should look to avoid getting detected.

Extravi avatar Jan 02 '24 02:01 Extravi

You would need a web driver to capture those cookies so you can send them in requests.

Extravi avatar Jan 02 '24 02:01 Extravi

I know that my instance processes 9.3k uncached requests, like searches, images, etc., every 24 hours, but as this project grows and I start to process more requests, it's going to get less reliable, so I'll need to try my best to work on making it much more reliable over time.

Extravi avatar Jan 02 '24 02:01 Extravi

I do not record any logs on my instance, but the Cloudflare proxy logs the number of requests, not the requests themselves.

Extravi avatar Jan 02 '24 02:01 Extravi

Also, because I have one server in Germany, it needs to be behind a CDN for both speed and security reasons.

Extravi avatar Jan 02 '24 02:01 Extravi

Also, I'm sure using Cloudflare is fine; I don't really see it as a privacy concern. It's definitely better than sending requests directly to Google. Cloudflare offers a free version of their proxy to test updates before pushing them to their paid clients and users. I don't think Cloudflare records the requests made through their proxies, and I'm sure that isn't legal. I think they use free users to test the updates made to their proxy or CDN service, so it won't impact the paid clients or companies that use them.

Extravi avatar Jan 02 '24 02:01 Extravi

> Also, because I have one server in Germany, it needs to be behind a CDN for both speed and security reasons.

I have one good server in Germany behind a CDN because it's more cost-efficient, and I have more server resources available for that instance as a result. 4 cores, 8 GB of RAM. Then the network traffic is optimized using Cloudflare. The only thing that seems slow for me is autocomplete here in Canada, but I might have a fix for that soon.

Extravi avatar Jan 02 '24 02:01 Extravi

I'm adding support for a captcha solver so I can use it on my instance.

image

Extravi avatar Jan 02 '24 04:01 Extravi

In my test, it cost about 1 cent per captcha, and that's only when the abuse cookie expires.

Extravi avatar Jan 02 '24 04:01 Extravi

So it would last a very long time and cost less than any API would.

Extravi avatar Jan 02 '24 04:01 Extravi

This will be a setting people can use in the config if they want to enable captcha solver support with their own API key.

Extravi avatar Jan 02 '24 04:01 Extravi
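As a rough illustration of an opt-in setting like the one described above, here is a sketch that reads the toggle and the key from environment variables. The variable names `CAPTCHA_SOLVER_ENABLED` and `CAPTCHA_SOLVER_API_KEY` and the `load_captcha_config` helper are hypothetical; the actual option names in the project's config may differ.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class CaptchaSolverConfig:
    enabled: bool
    api_key: str


def load_captcha_config(env: Optional[Mapping[str, str]] = None) -> CaptchaSolverConfig:
    """Build the solver settings; the solver stays off unless a key is present."""
    env = os.environ if env is None else env
    api_key = env.get("CAPTCHA_SOLVER_API_KEY", "")  # hypothetical name
    wanted = env.get("CAPTCHA_SOLVER_ENABLED", "false").lower() in ("1", "true", "yes")
    # Enabling the solver without a key would only produce failed API calls,
    # so treat that combination as disabled.
    return CaptchaSolverConfig(enabled=wanted and bool(api_key), api_key=api_key)
```

Keeping the solver disabled by default means instances without a key behave exactly as before.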

> I'm adding support for a captcha solver so I can use it on my instance.
>
> image

https://github.com/Extravi/araa-search/commit/bf83d6bf325d72e7ab9b597cb25e4bf3b68d66fc

It's been added.

Extravi avatar Jan 02 '24 07:01 Extravi

I added the captcha solver, but it has an issue: it does not work with more than one worker because the variables are not shared in memory. Until I find a way to fix it, I will only be using one worker on my instance.

Extravi avatar Jan 02 '24 08:01 Extravi

If you know how to share the variables for the captcha solver, please let me know or open a pull request for it.

Extravi avatar Jan 02 '24 08:01 Extravi

One worker and one thread.

Extravi avatar Jan 02 '24 08:01 Extravi
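One common way to share a value between separate worker processes is to keep it in a file rather than in a module-level variable. The sketch below stores a single cookie with an expiry time in a JSON file; `SharedCookieStore` and its methods are hypothetical names, and this is not necessarily how the issue was fixed in araa-search.

```python
import json
import os
import tempfile
import time
from typing import Optional


class SharedCookieStore:
    """Keeps one cookie value in a JSON file so every worker process sees it."""

    def __init__(self, path: str) -> None:
        self.path = path

    def save(self, cookie: str, ttl_seconds: float) -> None:
        payload = {"cookie": cookie, "expires_at": time.time() + ttl_seconds}
        directory = os.path.dirname(os.path.abspath(self.path))
        # Write to a temp file in the same directory, then rename it into place.
        # os.replace is atomic, so readers never observe a half-written file.
        fd, tmp_path = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "w") as tmp:
            json.dump(payload, tmp)
        os.replace(tmp_path, self.path)

    def load(self) -> Optional[str]:
        try:
            with open(self.path) as f:
                payload = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            return None
        if payload["expires_at"] <= time.time():
            return None  # expired: the caller should solve a new captcha
        return payload["cookie"]
```

This does not stop two workers from solving a captcha at the same moment; it only guarantees that whichever cookie is written last is visible to all of them. A lock file or an external store such as Redis would be needed to avoid duplicate solves.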

I have added the captcha solver to Araa, fixed all bugs with it, and it's active and running on my instance.

Extravi avatar Jan 03 '24 02:01 Extravi

It's bug-free and works with all workers.

Extravi avatar Jan 03 '24 02:01 Extravi