Parsing issue, getting contents of select box

Open AndyTheFactory opened this issue 2 years ago • 0 comments

Issue by daltonch, Thu Mar 9 20:04:53 2017. Originally opened as https://github.com/codelucas/newspaper/issues/344


from newspaper import Article

url = "http://www.refworld.org/docid/58b03ed44.html"
article = Article(url)
article.download()  # fetch the page HTML
article.parse()     # run text extraction on the downloaded HTML
print(article.text)

I get the following text,

"Search Refworld\n\nand / or country All countries Afghanistan Albania Algeria American Samoa Andorra Angola Anguilla Antigua and Barbuda Argentina Armenia Aruba Australia Austria Azerbaijan Bahamas Bahrain Bangladesh Barbados Belarus Belgium Belize Benin Bermuda Bhutan Bolivia Bosnia and Herzegovina Botswana Brazil British Virgin Islands Brunei Darussalam Bulgaria Burkina Faso Burundi Cambodia Cameroon Canada Cape Verde Cayman Islands Central African Republic Chad Chile China Cocos (Keeling) Islands Colombia Comoros Congo, Democratic Republic of the Congo,...."

instead of the actual article text. It appears to pull the contents of a select box with all of its options. The page even has an <article> tag around the actual body of the article.
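
For anyone who needs a stopgap, here is a rough sketch of one possible workaround, not something newspaper does itself: pull the text out of the <article> element directly and ignore anything inside <select>. It uses only the standard library's html.parser. The names ArticleTextExtractor and extract_article_text, and the sample HTML, are made up for illustration and do not come from newspaper or from the Refworld page.

```python
from html.parser import HTMLParser


class ArticleTextExtractor(HTMLParser):
    """Hypothetical helper: collect text inside <article>, skipping <select> lists."""

    # Elements whose text should never count as article content.
    SKIPPED = {"select", "script", "style"}

    def __init__(self):
        super().__init__()
        self.article_depth = 0  # > 0 while inside an <article> element
        self.skip_depth = 0     # > 0 while inside a skipped element
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "article":
            self.article_depth += 1
        elif tag in self.SKIPPED:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "article" and self.article_depth:
            self.article_depth -= 1
        elif tag in self.SKIPPED and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is inside <article> and not inside a skipped tag.
        if self.article_depth and not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def extract_article_text(html):
    """Return the text found inside <article> elements, one chunk per paragraph."""
    parser = ArticleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.chunks)


# Made-up sample that mimics the reported layout: a country <select> plus an <article>.
sample = """
<html><body>
  <form><select><option>Afghanistan</option><option>Albania</option></select></form>
  <article><h1>Title</h1><p>Body of the report.</p></article>
</body></html>
"""
print(extract_article_text(sample))
```

With the sample above, the country names from the select box are dropped and only the heading and paragraph inside <article> are returned. This assumes the page really does wrap its body in <article>; on pages without that tag the function returns an empty string.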

AndyTheFactory, Oct 24 '23 10:10