Wikipedia-API

Newline / Space missing from .summary attribute

Open gruffaren opened this issue 3 years ago • 0 comments

The .summary attribute of a page does not include a newline or space after a sentence that is followed by square-bracketed markers [ ] (such as footnote references) on the Wikipedia page.

Example:

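    import wikipediaapi as wiki_api  # assumed import; not shown in the original snippet

    wiki = wiki_api.Wikipedia(language="en")
    query = "planet"
    page = wiki.page(query)
    text = page.summary
    print(text[:400])  # first 400 characters of the summary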

This queries the article https://en.wikipedia.org/wiki/Planet and returns:

    A planet is an astronomical body orbiting a star or stellar remnant that is massive enough to be rounded by its own gravity, is not massive enough to cause thermonuclear fusion, and – according to the International Astronomical Union but not all planetary scientists – has cleared its neighbouring region of planetesimals.The term planet is ancient, with ties to history, astrology, science, mytholog

Note the missing space between "planetesimals." and "The" at the end of the first paragraph, which ends with "planetesimals.[b][1][2]" on the web page.

Later in the summary, at print(text[1200:1500]), there is a space between "discovered)." and "Ptolemy", as expected:

    the scientific community are no longer viewed as such under the current definition. Some of the excluded objects include Ceres, Pallas, Juno, Vesta (all of which are objects in the solar asteroid belt), and Pluto (the first trans-Neptunian object discovered). Ptolemy thought that the planets orbite

Please let me know if any additional information is needed to fix this, or if there is a workaround.
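
In the meantime, a possible stopgap is to post-process the summary text. The sketch below is only a heuristic and not part of Wikipedia-API: the helper name fix_missing_spaces and the regular expression are my own assumptions. It inserts a space wherever sentence-ending punctuation that follows a lowercase letter, closing bracket, or quote is immediately followed by an uppercase letter.

```python
import re

# Hypothetical helper, not part of Wikipedia-API.
# Matches '.', '!' or '?' that comes right after a lowercase letter, ')', ']' or '"'
# and is directly followed by an uppercase letter (i.e. no whitespace in between).
_MISSING_SPACE = re.compile(r'(?<=[a-z\)\]"])([.!?])(?=[A-Z])')


def fix_missing_spaces(text: str) -> str:
    """Insert a single space after sentence-ending punctuation that lacks one."""
    return _MISSING_SPACE.sub(r"\1 ", text)


if __name__ == "__main__":
    sample = "its neighbouring region of planetesimals.The term planet is ancient"
    print(fix_missing_spaces(sample))
```

It could be applied as text = fix_missing_spaces(page.summary). Because it is a heuristic, it may also add a space inside tokens such as "Node.Js", and it cannot restore a lost paragraph break, only a space.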

gruffaren, Dec 08 '21 22:12