
Bug: wikidata engine

Open

Marvilius opened this issue 3 years ago • 13 comments

Version of SearXNG, commit number if you are using on master branch and stipulate if you forked SearXNG

Repository: https://github.com/searxng/searxng
Branch: master
Version: 2022.06.06-ea0cddba

How did you install SearXNG? docker container

What happened?

After every search

How To Reproduce

Immanent

Expected behavior

Wiki entries shown

Screenshots & Logs

Additional context

Technical report

Error

  • Error: httpx.TimeoutException
  • Percentage: 100
  • Parameters: (None, None, None)
  • File name: searx/search/processors/online.py:96
  • Function: _send_http_request
  • Code: response = req(params['url'], **request_args)

Marvilius, Jun 06 '22 14:06

Sometimes wikidata is very slow, especially when the keyword you searched for doesn't exist in the database.

unixfox, Jun 06 '22 15:06

Error: httpx.TimeoutException

If you have problems getting results from wikidata because of timeouts, try increasing the timeout of this engine:

https://github.com/searxng/searxng/blob/ea0cddba0b9366326eaebc0a3a55502a17e5baf4/searx/settings.yml#L476
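
For reference, a minimal sketch of such a change in settings.yml; the engine entry shown here follows a typical default configuration and the value is only an example, so adjust it to your instance:

   engines:
     - name: wikidata
       engine: wikidata
       shortcut: wd
       timeout: 6.0  # example value; the default discussed in this thread is 3 sec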

return42, Jun 07 '22 09:06

I don't think it's a good solution; sometimes wikidata would need the timeout bumped to a far too high value, like 10 seconds.

unixfox, Jun 07 '22 09:06

I don't think it's a good solution, ..

Fully agree, but to handle timeout issues it is the only solution we have at hand (AFAIK): it's always a trade-off, either short response times or longer timeouts.

If someone has different or better suggestions, please ask to reopen this issue.

return42, Jun 07 '22 09:06

I think wikidata should be disabled by default if it is not always usable.

unixfox, Jun 07 '22 10:06

I think wikidata should be disabled by default if it is not always usable.

I don't see an issue from the user's point of view .. in the default setup of SearXNG the wikidata engine is in the general category .. it has a timeout of 3 sec. and there are a lot of other engines in this category. With these settings, we offer a short response time in this category / I think this is a good trade-off between:

"short response time" vs. "long timeout".

--> If there are no results from wikidata within the 3 sec timeout, the result list that is returned to the SearXNG user is filled up with results from other engines in the general category (without reporting an issue to the user) / e.g. compare foo !wikidata !ddg and foo !wikidata

In short: I do not see the need to disable the engine / or am I overlooking something?

return42, Jun 07 '22 10:06

3 seconds is not short, IMO it's quite long

Most of the instances on https://searx.space respond in under 1 second.

unixfox, Jun 07 '22 12:06

Possible solution:

  • disable wikidata (actually it would help the wikidata servers, which are already hammered with a lot of SPARQL queries).
  • enable duckduckgo_definition.
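
A minimal settings.yml sketch of that proposal, assuming the usual per-engine disabled flag; the engine names follow a typical default configuration, so check the exact entries in your own settings.yml:

   engines:
     - name: wikidata
       engine: wikidata
       shortcut: wd
       disabled: true    # keep the engine available, but off by default

     - name: duckduckgo definitions
       engine: duckduckgo_definitions
       shortcut: ddd
       disabled: false   # turn the definitions engine on by default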

dalf, Jun 07 '22 12:06

I like to have the image from wikidata in the infobox on the right side of the result list; compare !ddd paris !wikidata and !ddd paris. But your argument ...

actually it would help the wikidata servers which are already hammered with a lot SPARQL queries

counts more I think :+1:

return42, Jun 07 '22 12:06

Is there a check if the page exists before trying to fetch the data? I feel like this timeout only occurs when the page doesn't exist for what the user searched for.

unixfox, Jun 08 '22 12:06

Is there a check if the page exists before trying to fetch the data?

TLDR: no, but *maybe* it is possible to optimize the response time when the page does not exist

https://github.com/searxng/searxng/blob/59ef9b9287f1beda12f7b9a20b93cbc378a22bac/searx/engines/wikidata.py#L57-L79

Note: %QUERY%, %LANGUAGE% and %WHERE% are replaced before the request is sent.

We don't know how the query is processed: *if* the server decides to extract the graph first and then call the MediaWiki REST API, then it extracts the graph for everything in Wikidata, which times out...

A way to fix that is to add hint:Prior hint:runFirst "true". just after the SERVICE ... { ... } block:

   SERVICE wikibase:mwapi { 
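         # Resolves the user's search term (%QUERY%) to a Wikidata entity via the EntitySearch API.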
         bd:serviceParam wikibase:endpoint "www.wikidata.org"; 
         wikibase:api "EntitySearch"; 
         wikibase:limit 1; 
         mwapi:search "%QUERY%"; 
         mwapi:language "%LANGUAGE%". 
         ?item wikibase:apiOutputItem mwapi:item. 
   } 
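   # Blazegraph query hint: evaluate the SERVICE block above first, before the rest of the query.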
   hint:Prior hint:runFirst "true".

I think we can try that (whatever we decide about the engine).

I like to have the image from wikidata in the infobox on the right side of the result list

Another example: the wikidata engine returns the source repository for a lot of projects.


I suggest this:

  • try the fix described in the details above
  • if it does not work, enable ddd by default and disable wikidata by default.
  • we can create an infobox category for the engines, so users can easily see the alternatives to ddd (a rough settings.yml sketch follows below).
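
A rough settings.yml sketch of that last idea, assuming categories keep being assigned per engine as they are today; the infobox category name is hypothetical and not part of the default configuration:

   engines:
     - name: wikidata
       engine: wikidata
       categories: [general, infobox]   # "infobox" is a hypothetical new category

     - name: duckduckgo definitions
       engine: duckduckgo_definitions
       categories: [general, infobox]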

dalf, Jun 10 '22 14:06

Good idea to try the fix first.

unixfox, Jun 11 '22 15:06

Is it better?

On paulgo.io and searx.be, the P80 response time is 0.9 seconds.

dalf, Aug 27 '22 11:08

@unixfox can we close this issue?

On my instance the wikidata engine has a response time of 0.7 sec ..

(screenshot: engine stats showing the wikidata response time)

return42, Apr 25 '23 15:04