Wikipedia-API

Print summary until end of paragraph

Status: Open · mrclean789 opened this issue 2 years ago · 5 comments

Is there a better way to get the page summary so that it doesn't get cut off? For example, from the start to the end of the first paragraph, or the first two paragraphs.

mrclean789 · Sep 01 '22 16:09

You will have to use the nltk package (try its tokenization functions) or another natural language processing library for this.

L1mak · Nov 15 '22 19:11
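
A minimal sketch of the sentence-splitting idea, assuming the summary is already available as a plain string. nltk's `sent_tokenize` would be the more robust choice (it needs the `punkt` tokenizer data downloaded first); to keep the example dependency-free, this swaps in a simple regular expression instead. The helper name `first_sentences` is hypothetical.

```python
import re

# Split where ".", "!" or "?" is followed by whitespace and then an uppercase
# letter, digit, or opening quote/bracket. This is a rough heuristic: it will
# wrongly split after abbreviations such as "Dr. Smith".
_SENTENCE_BOUNDARY = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9\"'(])")


def first_sentences(text: str, n: int) -> str:
    """Return the first n sentences of text, joined with single spaces."""
    sentences = _SENTENCE_BOUNDARY.split(text.strip())
    return " ".join(sentences[:n])


summary = "Mars is the fourth planet. It is often called the Red Planet. It has two moons."
print(first_sentences(summary, 2))
```

Because the pattern requires whitespace after the period, it will not split a glued `sentence2p1.sentence1p2` run like the one reported further down the thread.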

@mrclean789: Do you have an example of when this happens?

Based on the API documentation, there could be some limitation on the extract length: https://www.mediawiki.org/w/api.php?action=help&modules=query%2Bextracts

martin-majlis · Dec 15 '22 11:12
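
The help page linked above documents TextExtracts parameters that control how much text comes back, such as `exintro` (only the content before the first section), `exsentences`, and `exchars`. A sketch of building such a request with the standard library, assuming the English Wikipedia endpoint; `extracts_url` is a hypothetical helper, not part of Wikipedia-API, and the exact server-side behaviour of these parameters is worth checking against that help page.

```python
from typing import Optional
from urllib.parse import urlencode

# English Wikipedia endpoint (assumed; swap the subdomain for other languages).
API_URL = "https://en.wikipedia.org/w/api.php"


def extracts_url(title: str, intro_only: bool = True,
                 sentences: Optional[int] = None) -> str:
    """Build a TextExtracts query URL for a single page title."""
    params = {
        "action": "query",
        "format": "json",
        "prop": "extracts",
        "explaintext": 1,           # plain text instead of HTML
        "exsectionformat": "wiki",  # section headings rendered as == Heading ==
        "titles": title,
    }
    if intro_only:
        params["exintro"] = 1  # only the content before the first section
    if sentences is not None:
        params["exsentences"] = sentences  # ask the server to stop after N sentences
    return API_URL + "?" + urlencode(params)


print(extracts_url("Planet", sentences=2))
```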

When I try the underlying API call (https://en.wikipedia.org/w/api.php?action=query&explaintext=1&exsectionformat=wiki&prop=extracts&titles=Planet&), it looks to me like there is no restriction on the length of the response.

martin-majlis · Dec 15 '22 11:12
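
Following on from that call, a sketch of pulling the extract out of the JSON response and keeping only the first paragraphs. It assumes the default `format=json` layout, where `query.pages` is keyed by page ID, and that plain-text paragraphs are separated by newline characters; the helper names and the sample payload are hypothetical.

```python
import json


def extract_from_response(payload: dict) -> str:
    """Return the extract of the first page in a format=json response."""
    # In the default (formatversion=1) JSON layout, "pages" maps page IDs to page objects.
    pages = payload["query"]["pages"]
    page = next(iter(pages.values()))
    return page.get("extract", "")  # pages without an extract yield ""


def first_paragraphs(extract: str, n: int) -> str:
    """Keep the first n non-empty lines, treating each line as a paragraph."""
    paragraphs = [line for line in extract.split("\n") if line.strip()]
    return "\n".join(paragraphs[:n])


# Hypothetical sample payload, shaped like a single-page response.
sample = json.loads(
    '{"batchcomplete": "", "query": {"pages": {"123": {"pageid": 123, "ns": 0,'
    ' "title": "Example", "extract": "First paragraph.\\nSecond paragraph.\\nThird paragraph."}}}}'
)
print(first_paragraphs(extract_from_response(sample), 2))
```

If the request does not use `exintro`, section headings will show up as their own lines and count as "paragraphs" here, so pairing this with an intro-only request seems the simpler route.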

With the HTML format, the summary correctly wraps each paragraph in `<p></p>`. With the WIKI format, the next paragraph is concatenated without a space or newline, like: `sentence1p1. sentence2p1.sentence1p2`

psmatter · Dec 01 '23 10:12
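
Since the HTML format keeps the `<p>` boundaries, one way to get "the first paragraph" or "the first two paragraphs" is to request HTML and split on those tags. A sketch using the standard-library `html.parser`; `ParagraphCollector` and `first_html_paragraphs` are hypothetical names, and any markup nested inside a paragraph is flattened to its text.

```python
from html.parser import HTMLParser
from typing import List


class ParagraphCollector(HTMLParser):
    """Collect the text content of each <p> element, skipping empty ones."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default, so &amp; arrives as "&"
        self.paragraphs: List[str] = []
        self._in_p = False
        self._buffer: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._in_p:
            self._in_p = False
            text = "".join(self._buffer).strip()
            if text:
                self.paragraphs.append(text)

    def handle_data(self, data):
        if self._in_p:
            self._buffer.append(data)  # text of nested tags like <b> is kept, tags dropped


def first_html_paragraphs(html: str, n: int) -> List[str]:
    """Return the text of the first n non-empty <p> paragraphs in html."""
    parser = ParagraphCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs[:n]


html_summary = "<p>Sentence one. Sentence two.</p><p>Second <b>bold</b> paragraph.</p><p></p><p>Third.</p>"
print(first_html_paragraphs(html_summary, 2))
```

With Wikipedia-API, this would presumably mean creating the client with `extract_format=wikipediaapi.ExtractFormat.HTML` and passing `page.summary` to `first_html_paragraphs`; that wiring is an assumption worth checking against the installed version.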