Response times are server based and not based on the location of the visitor

Open unixfox opened this issue 5 years ago • 10 comments

Currently, the response times are measured from a single server, and therefore from a single location in the world. As a result, some Searx instances may rank poorly simply because they are far away from the server that collects the statistics.

One way to fix that would be to test every Searx instance from multiple locations. The problem is that renting servers that run all the time in multiple locations around the world is expensive.

So I think the best way to test each Searx instance without paying anything would be to use the CDN of Zeit. They have quite a lot of servers in different locations around the world: https://zeit.co/docs/v2/network/regions-and-providers/ We would just have to implement the test as a lambda function.

unixfox avatar Dec 13 '19 19:12 unixfox

I found a really cheap hosting provider that offers VPSes for only $2/year: https://hosting.gullo.me/pricing The VPSes are IPv6 compatible, run the latest version of Debian, and are located in New York, Chicago, Los Angeles, Canada, Germany and Finland. Unfortunately, this provider has no location in Asia, Africa or Oceania.

But according to serverhunter, it seems that there are other cheap VPS providers for the 3 missing regions:

  • Asia : https://www.serverhunter.com/?search=562-61E-19C
  • Oceania : https://www.serverhunter.com/?search=4F4-0F9-06A
  • Africa : https://www.serverhunter.com/?search=477-8F2-F54

unixfox avatar Jan 16 '20 23:01 unixfox

I agree with this point. Thank you for all the suggestions.

I would like the configuration to be the same across the different locations:

  • as I remember, the response time seems to be more or less linked to the CPU when httpx (or requests) is used.
  • I chose httpx because it seems to be the successor of the requests package, so it is a way to try the package before using it in searx (or not). aiohttp may solve this issue.
  • I know that without a benchmark, this comment is not very useful.

About measurements from different locations:

  • searx.space can monitor everything, and the different servers in different locations only check the response time.
  • searx-stats2 has to consolidate the different results:
    • either using scp or whatever.
    • or searx-stats2 implements a server.
  • searx-stats2 takes less than one hour to run, so it should be possible to: allocate the server, install searx-stats2, measure the response time, send the results, shut down the server. Let's say it takes 2 hours.
  • if this is done in 5 different locations, once a day, it is about 305 hours per month (about 42% of a full month for one server).
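
The estimate above can be checked with a few lines of Python. This is only a sketch of the arithmetic; the 30.5-day month is an assumption chosen because it reproduces the 305-hour figure.

```python
# Rough check of the monthly server-time estimate.
LOCATIONS = 5
HOURS_PER_RUN = 2        # allocate, install, measure, send, shut down
RUNS_PER_DAY = 1
DAYS_PER_MONTH = 30.5    # assumption: average month length

hours_per_month = LOCATIONS * HOURS_PER_RUN * RUNS_PER_DAY * DAYS_PER_MONTH
share_of_month = hours_per_month / (DAYS_PER_MONTH * 24)

print(f"{hours_per_month:.0f} hours per month")
print(f"{share_of_month:.0%} of a full month for one server")
```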

Amazon EC2 seems to offer 750 free hours per month for t2.micro (?)

dalf avatar Jan 21 '20 13:01 dalf

Yeah, AWS EC2 seems to be a good target for that, but they only offer one year of free t2.micro usage. After that, you are free to create as many accounts as you want, as long as you have a credit card that has not yet been used on their services.

unixfox avatar Jan 21 '20 13:01 unixfox

Note about the VPS choice:

  • t2.micro seems enough. The cost estimator says about $4 per month after the free tier, less than $50 per year.
  • I have to admit I'm curious about AWS.
  • I'm a "little" lost with all the other choices.

Other notes:

  • Most probably, whatever the hosting solution, each instance can't provide a free pass for the tests (bypass filtron), since an unknown set of different IPs will benchmark the instances.
  • Another related question: what counts as the response time? For now, the list is sorted by initial response time, but most probably a query with the default engines would be more meaningful?

dalf avatar Jan 22 '20 10:01 dalf

most probably, whatever the hosting solution, each instance can't provide a free pass for the tests (bypass filtron).

Isn't that already the case? I mean, my antibot is the only one that whitelists your IP.

Maybe we could introduce a token that is passed in the HTTP headers so that the anti bot solution knows that it's searx-stats2?
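
A minimal sketch of what such a token check could look like, in Python. The header name `X-Searx-Stats-Token`, the token value and both helper functions are hypothetical; nothing like this exists in searx, searx-stats2 or filtron as far as this thread shows.

```python
import hmac
import urllib.request

# Hypothetical header name and token value, for illustration only.
STATS_HEADER = "X-Searx-Stats-Token"
SHARED_TOKEN = "replace-with-a-secret"


def build_stats_request(url, token):
    """Checker side: attach the token to the request."""
    return urllib.request.Request(url, headers={STATS_HEADER: token})


def is_stats_request(headers, expected_token):
    """Anti-bot side: True when the request carries the expected token."""
    supplied = headers.get(STATS_HEADER, "")
    # compare_digest avoids leaking the token through timing differences.
    return hmac.compare_digest(supplied.encode(), expected_token.encode())
```

One caveat with this design: every instance admin would need to know the token, so it could not stay secret for long.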

unixfox avatar Jan 23 '20 01:01 unixfox

Now that we control the antibot solution in searxng, maybe we could have a way to bypass the limiter in order to do tests from multiple IP addresses around the world.

This could be a public text file in the searx-space repository listing all the whitelisted IP addresses. Searxng would refresh this list every X hours and add the addresses to the redis database.
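
As a rough sketch of the parsing side, assuming a file format that is not defined anywhere yet (one IP address or CIDR range per line, with `#` starting a comment), the list could be read and checked like this in Python. The function names and the sample addresses are made up for illustration, and the periodic refresh and the redis storage are left out.

```python
import ipaddress

# Hypothetical file content; the addresses come from documentation ranges.
SAMPLE_ALLOWLIST = """\
# searx-stats2 checkers
192.0.2.10
2001:db8::/32
"""


def parse_allowlist(text):
    """Return the networks listed in the file, skipping blanks and comments."""
    networks = []
    for line in text.splitlines():
        entry = line.split("#", 1)[0].strip()
        if entry:
            networks.append(ipaddress.ip_network(entry, strict=False))
    return networks


def is_allowlisted(client_ip, networks):
    """True when client_ip falls inside one of the listed networks."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)
```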

For the searx(ng) instances that don't use the builtin limiter we would just test them normally from a single IP address.

Now that we have donations we can buy multiple VPS servers around the world.

unixfox avatar Jul 26 '22 10:07 unixfox

Dumb idea using JS (since the website requires JS anyway): add a "Find the fastest(*) instance for me" button to searx.space

The button makes a few requests to the front page of each selected instance, then sorts the table according to the response times.

The Resource Timing API can help with that.

It will never be as accurate as a continuous measurement of the response time, but I hope it can be a good approximation.

A possible improvement of the measurement: with the agreement of the user, the JS stores the response times in the local storage:

  • next time, the order is kept
  • another press on the button can update the measurement (I mean merge the new result with the previous results, like (1*new + 0.5*old + 0.25*even older) / 1.75)
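
The merge formula can be generalised to any number of stored results by halving the weight for each older measurement. The sketch below shows only the arithmetic, written in Python rather than the JS the idea would need in the browser; `merge_measurements` is a hypothetical name.

```python
def merge_measurements(history):
    """Weighted mean of response times, newest first, with weights 1, 0.5, 0.25, ..."""
    if not history:
        raise ValueError("history must contain at least one measurement")
    weights = [0.5 ** age for age in range(len(history))]
    weighted_sum = sum(w * value for w, value in zip(weights, history))
    return weighted_sum / sum(weights)
```

With three values, this is exactly (1*new + 0.5*old + 0.25*even older) / 1.75.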

dalf avatar Sep 04 '23 21:09 dalf

@dalf you can't do that due to CORS

unixfox avatar Sep 04 '23 21:09 unixfox

Here is a workaround: a script to run locally

curl https://gist.githubusercontent.com/dalf/66f8962460048d8d5a6d9b4eaeab197a/raw/a9c389d1721723ed1491b78b7bb7603f528bf4f9/findmyinstance.py | python

See : https://gist.github.com/dalf/66f8962460048d8d5a6d9b4eaeab197a

The script makes a few requests to the front page of each instance, and then displays the median and mean response times. Far from perfect, but it will give an idea of the response times without making guesses.

It relies on Python but has no external dependencies, so it should work on any OS.
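
For readers who do not want to open the gist, the general approach looks roughly like the sketch below. This is not the code from the gist, only an illustration using the standard library; the function names, the number of attempts and the timeout are assumptions.

```python
import statistics
import time
import urllib.request


def time_request(url, timeout=10.0):
    """Return the seconds taken to fetch url once."""
    start = time.perf_counter()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read()
    return time.perf_counter() - start


def summarize(url, attempts=3, timer=time_request):
    """Fetch url several times and return (median, mean) in seconds."""
    samples = [timer(url) for _ in range(attempts)]
    return statistics.median(samples), statistics.mean(samples)
```

Calling `summarize("https://searx.example.org/")` (a placeholder URL) returns the two values for one instance; the `timer` parameter is there so the timing function can be swapped out.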

dalf avatar Sep 09 '23 22:09 dalf

Sorry if this is a dumb suggestion, but why don't we just subtract the ping time from the response times, to get the underlying time?
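
A rough sketch of the idea in Python, under two assumptions: the TCP connect time is used as a stand-in for ping (ICMP usually needs extra privileges), and the `round_trips` parameter is hypothetical, added because an HTTPS request normally involves more than one round trip (TCP and TLS handshakes, then the request itself).

```python
import socket
import time


def tcp_connect_time(host, port=443, timeout=5.0):
    """Seconds needed to open a TCP connection, used here instead of ping."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.perf_counter() - start


def underlying_time(response_time, round_trip_time, round_trips=1):
    """Response time minus the estimated network share, never below zero."""
    return max(0.0, response_time - round_trips * round_trip_time)
```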

jazzzooo avatar Sep 20 '23 15:09 jazzzooo