Wikipedia-API
Print summary until end of paragraph
Is there a better way to get the page summary so that it doesn't get cut off? For example, from the start to the end of the first paragraph, or the first two paragraphs.
You will have to use the nltk package (try its tokenization functions) or another natural language processing library for this.
@mrclean789: Do you have an example of when this happens?
According to the API documentation, there could be some limitations: https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bextracts
When I try the underlying API call (https://en.wikipedia.org/w/api.php?action=query&explaintext=1&exsectionformat=wiki&prop=extracts&titles=Planet&), it looks to me like there is no restriction on the length of the response.
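For anyone who wants to reproduce that call outside the library, here is a minimal sketch using only the standard library. The query parameters are the ones from the URL above; `format=json`, the optional `exintro` flag, the helper names, and the User-Agent string are my own assumptions for illustration and are not taken from Wikipedia-API itself.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_extract_url(title, intro_only=False):
    """Build the extracts query from the comment above (hypothetical helper)."""
    params = {
        "action": "query",
        "prop": "extracts",
        "explaintext": "1",
        "exsectionformat": "wiki",
        "titles": title,
        "format": "json",  # assumption: ask for JSON so the reply is easy to parse
    }
    if intro_only:
        # assumption: exintro limits the extract to the text before the first section
        params["exintro"] = "1"
    return API_ENDPOINT + "?" + urlencode(params)


def fetch_extract(title, intro_only=False):
    """Fetch the plain-text extract for one title (performs a network request)."""
    request = Request(
        build_extract_url(title, intro_only),
        # placeholder User-Agent, replace with your own contact details
        headers={"User-Agent": "extract-length-check/0.1 (example)"},
    )
    with urlopen(request) as response:
        data = json.load(response)
    # With the default JSON layout, pages are keyed by page id
    pages = data["query"]["pages"]
    return next(iter(pages.values())).get("extract", "")


if __name__ == "__main__":
    text = fetch_extract("Planet")
    print(len(text))
```

Comparing `len(fetch_extract("Planet"))` with the length of the summary you get from the library should show whether anything is being truncated on the API side.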
With the HTML format, the summary correctly encloses each paragraph in <p></p>. With the WIKI format, the next paragraph is concatenated without a space or newline, like: sentence1p1. sentence2p1.sentence1p2
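Since the HTML format keeps the paragraph boundaries, one possible workaround for the original question is to request the summary as HTML and keep only the first one or two <p></p> blocks. Below is a rough sketch with the standard-library HTML parser; `ParagraphCollector` and `first_paragraphs` are hypothetical names, and the input is assumed to be an HTML summary string like the one described above.

```python
from html.parser import HTMLParser


class ParagraphCollector(HTMLParser):
    """Collect the text content of each <p>...</p> block, in order."""

    def __init__(self):
        super().__init__()
        self.paragraphs = []
        self._inside = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._inside = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._inside:
            self._inside = False
            text = "".join(self._buffer).strip()
            if text:  # skip empty paragraphs
                self.paragraphs.append(text)

    def handle_data(self, data):
        # Text inside nested inline tags (<b>, <i>, ...) is kept as well
        if self._inside:
            self._buffer.append(data)


def first_paragraphs(html_summary, count=1):
    """Return the first `count` paragraphs as plain text, separated by blank lines."""
    parser = ParagraphCollector()
    parser.feed(html_summary)
    parser.close()
    return "\n\n".join(parser.paragraphs[:count])


if __name__ == "__main__":
    sample = "<p>sentence1p1. <b>sentence2p1.</b></p><p>sentence1p2</p>"
    print(first_paragraphs(sample, count=2))
```

This avoids sentence tokenization entirely, at the cost of depending on the HTML output rather than the WIKI one.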