Handling of suggestions could be improved
This line in the page function:
title = suggestion or results[0]
does not seem correct. It appears that it should at least check whether results[0] exactly matches the title that the user provided. In other words, something like:
title = title if title == results[0] else suggestion
For a concrete example, try searching for "Steven Pinker" with auto_suggest on. The suggestion will be "Stephen Pinker", which is incorrect. Because the suggestion wins in the current implementation, you get an exception rather than the page whose title you entered correctly.
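To make the intended preference order concrete, here is a minimal sketch of the selection step as a pure function, so it can be tried without any network calls. The name `choose_title` and the exact fallback order are assumptions made for illustration, not part of the wikipedia library, and the search output in the example is written by hand rather than fetched.

```python
from typing import Optional, Sequence


def choose_title(query: str, results: Sequence[str], suggestion: Optional[str]) -> Optional[str]:
    """Hypothetical helper: pick a page title from search output.

    Preference order (an assumption, not the library's behavior):
    1. the query itself, if the top search result matches it exactly;
    2. the suggestion, if there is one;
    3. the top search result, if there is one;
    4. None, meaning no candidate title was found.
    """
    if results and results[0] == query:
        return query
    if suggestion:
        return suggestion
    if results:
        return results[0]
    return None


# Hand-written search output for illustration only.
print(choose_title("Steven Pinker", ["Steven Pinker"], "Stephen Pinker"))  # prints Steven Pinker
```

With this ordering, a correctly entered title is never overridden by a suggestion, while a misspelled query can still fall back to the suggestion or to the top result.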
This modified version of the wikipedia.page() function seems to produce decent results (note also the change to 4-space indentation):
import wikipedia


def ap_page(title=None, pageid=None, auto_suggest=True, redirect=True, preload=False):
    # Copied from wikipedia.py, modified to fix poor search results.
    # See https://github.com/goldsmith/Wikipedia/issues/227.
    '''
    Get a WikipediaPage object for the page with title `title` or the pageid
    `pageid` (mutually exclusive).

    Keyword arguments:

    * title - the title of the page to load
    * pageid - the numeric pageid of the page to load
    * auto_suggest - let Wikipedia find a valid page title for the query
    * redirect - allow redirection without raising RedirectError
    * preload - load content, summary, images, references, and links during initialization
    '''
    if not title and not pageid:
        raise ValueError("Either a title or a pageid must be specified")
    if pageid:
        return wikipedia.WikipediaPage(pageid=pageid, preload=preload)
    if title:
        if auto_suggest:
            results, suggestion = wikipedia.search(title, results=1, suggestion=True)
            if results:
                if len(results) == 1:
                    # One auto-suggested result: use it.
                    title = results[0]
                elif title == results[0]:
                    # Multiple suggestions but first is exact title match: keep the title as entered.
                    pass
            elif suggestion:
                # No results, but a suggestion is available: use it.
                title = suggestion
            else:
                # No results and no suggestion.
                raise wikipedia.PageError(title)
        return wikipedia.WikipediaPage(title, redirect=redirect, preload=preload)
On the other hand, this library seems to work well as an alternative: https://github.com/barrust/mediawiki