searxng
Bug: wikidata engine
Version of SearXNG, commit number if you are using on master branch and stipulate if you forked SearXNG
Repository: https://github.com/searxng/searxng
Branch: master
Version: 2022.06.06-ea0cddba
How did you install SearXNG?
docker container
What happened?
After every search
How To Reproduce
Immanent
Expected behavior
Wiki entries shown
Screenshots & Logs
Additional context
Technical report
Error
- Error: httpx.TimeoutException
- Percentage: 100
- Parameters: (None, None, None)
- File name: searx/search/processors/online.py:96
- Function: _send_http_request
- Code: response = req(params['url'], **request_args)
Sometimes wikidata is very slow, especially when the keyword you searched for doesn't exist in the database.
Error: httpx.TimeoutException
If you have problems getting results from wikidata because of timeouts, try increasing the timeout of this engine:
https://github.com/searxng/searxng/blob/ea0cddba0b9366326eaebc0a3a55502a17e5baf4/searx/settings.yml#L476
I don't think it's a good solution; sometimes wikidata just times out, even with a timeout as high as 10 seconds.
I don't think it's a good solution, ..
Fully agree, but for timeout issues it is the only solution we have at hand (AFAIK): it's always a trade-off between short response times and longer timeouts.
If someone has different or better suggestions, please ask to reopen this issue.
I think wikidata should be disabled by default if it is not always usable.
I think wikidata should be disabled by default if it is not always usable.
I don't see an issue from the user's point of view: in the default setup of SearXNG the wikidata engine is in the category general, it has a timeout of 3 sec., and there are a lot of other engines in this category. With these settings we offer a short response time in this category; I think this is a good trade-off between "short response time" and "long timeout".
--> If there are no results from wikidata within the 3 sec. timeout, the result list returned to the SearXNG user is filled up with results from the other engines in the category general (without reporting an issue to the user); e.g. compare foo !wikidata !ddg and foo !wikidata.
In short: I do not see the need to disable the engine, or am I overlooking something?
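The behaviour described in this comment (a slow engine is silently dropped once the category timeout expires, while the other engines' results are still returned) can be sketched with Python's concurrent.futures. This is a minimal sketch under stated assumptions, not SearXNG's actual search code: make_engine, search_category, the engine delays and the timeout values are all hypothetical.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def make_engine(name, delay, hits):
    """Hypothetical engine stub: sleeps `delay` seconds, then returns fake hits."""
    def engine(query):
        time.sleep(delay)  # simulated network latency
        return [f"{name}: {hit} ({query})" for hit in hits]
    return engine


def search_category(engines, query, timeout):
    """Query all engines in parallel; keep only results that arrive within `timeout`."""
    pool = ThreadPoolExecutor(max_workers=len(engines))
    futures = {pool.submit(engine, query): name for name, engine in engines.items()}
    done, _ = wait(futures, timeout=timeout)
    # Return without waiting for slow engines; their late results are discarded.
    pool.shutdown(wait=False, cancel_futures=True)

    results, timed_out = [], []
    for future, name in futures.items():  # dict keeps submission order
        if future not in done:
            timed_out.append(name)  # dropped, not reported to the user
        elif future.exception() is None:
            results.extend(future.result())
        # engines that raised an exception are skipped as well
    return results, timed_out


if __name__ == "__main__":
    engines = {
        "wikidata": make_engine("wikidata", 1.5, ["Q90"]),
        "duckduckgo": make_engine("duckduckgo", 0.01, ["r1", "r2"]),
    }
    results, timed_out = search_category(engines, "paris", timeout=0.5)
    print(results)
    print(timed_out)
```

With these made-up delays, only the duckduckgo hits come back and wikidata is listed as timed out; raising the timeout above 1.5 would include the wikidata hit at the cost of a slower response, which is the trade-off discussed above.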
3 seconds is not short; IMO it's quite long.
Most of the instances on https://searx.space respond under 1 second.
Possible solution:
- disable wikidata (actually it would help the wikidata servers, which are already hammered with a lot of SPARQL queries).
- enable duckduckgo_definition.
I like to have the image from wikidata in the infobox on the right side of the result list; compare !ddd paris !wikidata and !ddd paris. But your argument ...
actually it would help the wikidata servers, which are already hammered with a lot of SPARQL queries
counts more I think :+1:
Is there a check whether the page exists before trying to fetch the data? I feel like this timeout only occurs when there is no page for what the user searched for.
Is there a check whether the page exists before trying to fetch the data?
TL;DR: no, but maybe it is possible to optimize the response time when the page does not exist.
https://github.com/searxng/searxng/blob/59ef9b9287f1beda12f7b9a20b93cbc378a22bac/searx/engines/wikidata.py#L57-L79
Note: %QUERY%, %LANGUAGE% and %WHERE% are replaced before the request is sent.
SERVICE wikibase:mwapi {...} is a way to send a request to the MediaWiki API of Wikidata. %WHERE% is replaced by the graph to retrieve.
We don't know how the query is processed: IF the server decides to extract the graph first and then call the MediaWiki API, then it extracts the graph for everything in Wikidata, which times out...
A way to fix that is to add hint:Prior hint:runFirst "true". just after SERVICE ... { ... }:
SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
    wikibase:api "EntitySearch";
    wikibase:limit 1;
    mwapi:search "%QUERY%";
    mwapi:language "%LANGUAGE%".
    ?item wikibase:apiOutputItem mwapi:item.
}
hint:Prior hint:runFirst "true".
I think we can try that (whatever we decide about the engine).
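The placeholder substitution from the note (%QUERY%, %LANGUAGE% and %WHERE% replaced before the request is sent), together with the proposed hint placement, might look roughly like this in Python. It is a sketch, not the code in searx/engines/wikidata.py: QUERY_TEMPLATE is a trimmed, hypothetical template built from the SERVICE block above, sparql_escape and build_query are hypothetical helpers, and the wikibase:, bd:, mwapi: and hint: prefixes are assumed to be the ones predefined by the Wikidata Query Service.

```python
import re

# Hypothetical, trimmed-down template modelled on the SERVICE block above.
# The hint directly after the SERVICE call is meant to make the entity search
# run first, so the %WHERE% graph is only evaluated for the matched item.
QUERY_TEMPLATE = """
SELECT ?item WHERE {
  SERVICE wikibase:mwapi {
    bd:serviceParam wikibase:endpoint "www.wikidata.org";
    wikibase:api "EntitySearch";
    wikibase:limit 1;
    mwapi:search "%QUERY%";
    mwapi:language "%LANGUAGE%".
    ?item wikibase:apiOutputItem mwapi:item.
  }
  hint:Prior hint:runFirst "true".
  %WHERE%
}
"""


def sparql_escape(text):
    """Escape text for use inside a double-quoted SPARQL string literal."""
    return (text.replace("\\", "\\\\")  # backslashes first
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def build_query(query, language, where):
    """Fill %QUERY%, %LANGUAGE% and %WHERE% in a single pass.

    Using one re.sub means placeholder-like text inside the user's query
    is not substituted a second time.
    """
    values = {
        "QUERY": sparql_escape(query),
        "LANGUAGE": sparql_escape(language),
        "WHERE": where,
    }
    return re.sub(r"%(QUERY|LANGUAGE|WHERE)%",
                  lambda m: values[m.group(1)], QUERY_TEMPLATE)


if __name__ == "__main__":
    print(build_query("paris", "en", "?item rdfs:label ?label ."))
```

Escaping the user's keyword matters here because it ends up inside a "..." literal; without it, a query containing a double quote would break the SPARQL syntax.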
I like to have the image from wikidata in the infobox on the right side of the result list
Another example: the wikidata engine returns the source repository for a lot of projects.
I suggest this:
- try the fix described above
- if it does not work, enable ddd by default and disable wikidata by default.
- we can create an infobox category for the engines, so the users can easily see the alternatives to ddd.
Good idea to try the fix first.
Is it better?
On paulgo.io and searx.be, the P80 response time is 0.9 sec.
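P80 is the 80th percentile of the measured response times: 80% of the requests were answered at least this fast. The thread does not say how searx.space computes it; the sketch below assumes the nearest-rank definition, and the samples are made up for illustration.

```python
import math


def percentile_nearest_rank(samples, p):
    """Return the p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank into the sorted list
    return ordered[rank - 1]


# Made-up response times in seconds, not measurements from any instance.
samples = [0.4, 0.9, 0.3, 0.7, 1.2, 0.5, 0.6, 0.8, 0.2, 0.95]
print(percentile_nearest_rank(samples, 80))  # 0.9
```

Unlike the mean, a percentile is not pulled up by a few very slow requests (here the single 1.2 s outlier), which is why it is a useful way to compare engine response times.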
@unixfox can we close this issue?
On my instance the wikidata engine has a response time of 0.7 sec.
