Disambiguation error gives titles that lead to the same error

Open Pejosonic opened this issue 4 years ago • 2 comments

Hi, this happened while working with wikipedia and lang 'es'. I'm trying to get wikipedia.page('Alfa Romeo Giulia') and I'm getting a DisambiguationError with these options:

['Alfa Romeo Giulia', 
'Alfa Romeo Giulia GT Veloce', 
'Alfa Romeo Giulia TZ', 
'Alfa Romeo Giulia']

The first and last options lead me to the same error. I cannot get the actual URLs from the error arguments.

For this case they would be:

https://es.wikipedia.org/wiki/Alfa_Romeo_Giulia_(1962)
https://es.wikipedia.org/wiki/Alfa_Romeo_Giulia_GT_Veloce
https://es.wikipedia.org/wiki/Alfa_Romeo_Giulia_TZ
https://es.wikipedia.org/wiki/Alfa_Romeo_Giulia_(2015)

Thanks!

Pejosonic avatar Sep 16 '20 13:09 Pejosonic

I have the same problem with wikipedia.summary. This happened with lang 'de'.

import wikipedia
wikipedia.set_lang("de")

try:
    summary: str = wikipedia.summary("Schlacht von Pjöngjang")
except wikipedia.exceptions.DisambiguationError as e:
    new_query = e.options[-1]  # select the last suggestion
    summary: str = wikipedia.summary(new_query)

Yields another wikipedia.exceptions.DisambiguationError.
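
A possible stopgap while the option list still contains ambiguous entries is to walk through every suggestion instead of picking only the last one. The following is a minimal sketch, not part of the library: `first_resolvable`, `fetch`, and `ambiguous_error` are hypothetical names introduced here so the logic can be exercised without network access.

```python
from typing import Callable, Iterable, Optional, Type


def first_resolvable(
    query: str,
    options: Iterable[str],
    fetch: Callable[[str], str],
    ambiguous_error: Type[Exception],
) -> Optional[str]:
    """Return the result for the first option that resolves, or None."""
    seen = {query}  # retrying the original query would only fail again
    for option in options:
        if option in seen:
            continue  # skip duplicates and the query itself
        seen.add(option)
        try:
            return fetch(option)
        except ambiguous_error:
            continue  # this option is itself ambiguous; try the next one
    return None
```

With the library, this would presumably be called with `fetch=wikipedia.summary` and `ambiguous_error=wikipedia.exceptions.DisambiguationError`. It does not solve the underlying problem: a page whose suggestion text is identical to the query (such as the two 'Alfa Romeo Giulia' entries above) is still unreachable this way.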

SchulerSimon avatar Mar 16 '21 13:03 SchulerSimon

There is a quick fix that I tested with French, and it seemed to work well. The main problem is that, when handling the disambiguation, the code puts the link text from the HTML into that list rather than the link's title attribute, which corresponds to the actual title of the page.

For example, Émancipation returns "Emancipation" as the last element in its disambiguation list. That is bad because it is the same as the first search, whereas the page title is "Emancipation (Stargate)". A search with that title yielded the correct page.

So I updated this line in my wikipedia.py file from this: may_refer_to = [li.a.get_text() for li in filtered_lis if li.a]

to this: may_refer_to = [li.a.get('title') for li in filtered_lis if li.a]

So far it has worked great in my limited testing in French. There is also the possibility of returning the href of each link instead of the title, so that the page can be fetched directly with a GET request.
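
To illustrate why the link text collides while the title attribute does not, here is a small standard-library sketch. The markup is hypothetical, modeled on the Alfa Romeo Giulia report at the top of this issue; the real disambiguation page may be structured differently, and `LinkCollector` is a name made up for this example (the library itself uses BeautifulSoup, as in the line quoted above).

```python
from html.parser import HTMLParser

# Hypothetical markup modeled on the report above; not copied from Wikipedia.
SAMPLE_HTML = """
<ul>
  <li><a href="/wiki/Alfa_Romeo_Giulia_(1962)" title="Alfa Romeo Giulia (1962)">Alfa Romeo Giulia</a></li>
  <li><a href="/wiki/Alfa_Romeo_Giulia_TZ" title="Alfa Romeo Giulia TZ">Alfa Romeo Giulia TZ</a></li>
  <li><a href="/wiki/Alfa_Romeo_Giulia_(2015)" title="Alfa Romeo Giulia (2015)">Alfa Romeo Giulia</a></li>
</ul>
"""


class LinkCollector(HTMLParser):
    """Collect the visible text, title attribute, and href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []       # finished links, one dict per <a>
        self._current = None  # link currently being read, if any

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            attr = dict(attrs)
            self._current = {
                "text": "",
                "title": attr.get("title"),
                "href": attr.get("href"),
            }

    def handle_data(self, data):
        if self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


collector = LinkCollector()
collector.feed(SAMPLE_HTML)

# Equivalent in spirit to li.a.get_text(): first and last entries are identical.
texts = [link["text"] for link in collector.links]
# Equivalent in spirit to li.a.get('title'): every entry is distinct.
titles = [link["title"] for link in collector.links]
```

In this sample, `texts` contains 'Alfa Romeo Giulia' twice, whereas `titles` keeps the '(1962)' and '(2015)' qualifiers, and the `href` values are available for anyone who prefers to request the page directly.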

LaZoRBear avatar Jun 10 '21 13:06 LaZoRBear