
Irrelevant content getting scraped

Open kushagrasharma-13 opened this issue 11 months ago • 0 comments

The web content scraped from the URL provided in "01-defining-data-science" includes irrelevant material such as navigation, unrelated articles, and references. This causes errors when extracting insights and building the word cloud.

I would like a solution that keeps only the necessary and relevant content for further processing.

We can use BeautifulSoup instead of HTMLParser and utilize its features to extract only the relevant content
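A minimal sketch of what this could look like, assuming the page follows a Wikipedia-style layout. The function name `extract_relevant_text`, the container id `mw-content-text`, and the tag and class lists are illustrative assumptions, not taken from the lesson notebook, and would need adjusting to the page actually being scraped. It uses the `html.parser` backend, so no parser beyond `bs4` itself is required.

```python
from bs4 import BeautifulSoup

# Tags that usually hold page chrome rather than article text (assumption).
NOISE_TAGS = {"script", "style", "nav", "header", "footer", "aside", "table"}
# Hypothetical class names for navigation boxes and reference lists.
NOISE_CLASSES = {"navbox", "reflist", "references"}


def _inside_noise(tag):
    """Return True if any ancestor of `tag` is a noise tag or has a noise class."""
    for parent in tag.parents:
        if parent.name in NOISE_TAGS:
            return True
        classes = parent.get("class") or []
        if NOISE_CLASSES.intersection(classes):
            return True
    return False


def extract_relevant_text(html, container_id="mw-content-text"):
    """Return the text of article paragraphs, skipping navigation and references."""
    soup = BeautifulSoup(html, "html.parser")
    # Prefer the main content container; fall back to <body>, then the whole document.
    root = soup.find(id=container_id) or soup.body or soup
    paragraphs = [
        p.get_text(" ", strip=True)
        for p in root.find_all("p")
        if not _inside_noise(p)
    ]
    return "\n".join(text for text in paragraphs if text)


# Small inline page standing in for the real download (made-up content).
SAMPLE_HTML = """
<html><body>
<nav><p>Main page</p></nav>
<div id="mw-content-text">
  <p>Data science is an interdisciplinary field.</p>
  <div class="navbox"><p>Related articles</p></div>
  <ol class="references"><li><p>Some citation</p></li></ol>
</div>
<footer><p>Privacy policy</p></footer>
</body></html>
"""

if __name__ == "__main__":
    print(extract_relevant_text(SAMPLE_HTML))
```

The filter is non-destructive: it leaves the parsed tree intact and only skips paragraphs whose ancestors look like navigation or reference sections, so the same `soup` can still be inspected while tuning the noise lists.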

Irrelevant Content: [screenshot: irrelevant]
Relevant Content: [screenshot: relevant]

kushagrasharma-13 Mar 03 '24 14:03