Scraping Medium articles tagged ML, DL, and AI, and analyzing the results

Web Scraping and Analysis of Medium articles

Web scraping automatically extracts data and presents it in a format you can easily make sense of. We use Python as our scraping language, together with a simple and powerful library, BeautifulSoup.
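As a rough illustration of what parsing with BeautifulSoup looks like, here is a minimal sketch. The HTML snippet, the class names (article, title, author), and the parse_articles helper are all hypothetical stand-ins; Medium's real markup is different and changes over time.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a page of article listings.
# Medium's actual HTML is different, so the selectors below are assumptions.
SAMPLE_HTML = """
<div class="article">
  <h3 class="title">Intro to Neural Networks</h3>
  <a class="author" href="/@alice">Alice</a>
</div>
<div class="article">
  <h3 class="title">Gradient Descent Explained</h3>
  <a class="author" href="/@bob">Bob</a>
</div>
"""


def parse_articles(html):
    """Return one {'title', 'author'} dict per article block in the HTML."""
    soup = BeautifulSoup(html, "html.parser")
    articles = []
    for block in soup.find_all("div", class_="article"):
        title = block.find("h3", class_="title")
        author = block.find("a", class_="author")
        articles.append({
            # Fall back to None when an element is missing from the block.
            "title": title.get_text(strip=True) if title else None,
            "author": author.get_text(strip=True) if author else None,
        })
    return articles


if __name__ == "__main__":
    for article in parse_articles(SAMPLE_HTML):
        print(article)
```

In a real run, the HTML string would come from the downloaded page rather than a constant, and the selectors would need to match whatever the site currently serves.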

To load pages whose content is rendered dynamically, we use Selenium along with chromedriver.

Selenium WebDriver is a collection of open-source APIs for automating a web browser, most commonly used to test that a web application works as expected. It supports many browsers, including Firefox, Chrome, IE, and Safari.
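One common way to use Selenium for pages that load more content as you scroll is sketched below. This is not necessarily how this project does it: the load_full_page helper, its parameters, and the tag URL in the usage section are assumptions made for illustration. The helper only relies on the standard WebDriver methods get, execute_script, and page_source.

```python
import time


def load_full_page(driver, url, pause=1.0, max_scrolls=20):
    """Open url, scroll until the page height stops growing, return the HTML.

    driver is any Selenium WebDriver instance (for example Chrome).
    pause is the number of seconds to wait after each scroll so that
    newly requested content has time to load.
    """
    driver.get(url)
    last_height = driver.execute_script("return document.body.scrollHeight")
    for _ in range(max_scrolls):
        # Jump to the bottom of the page to trigger lazy loading.
        driver.execute_script("window.scrollTo(0, document.body.scrollHeight);")
        time.sleep(pause)
        new_height = driver.execute_script("return document.body.scrollHeight")
        if new_height == last_height:
            break  # nothing new was loaded, so stop scrolling
        last_height = new_height
    return driver.page_source


if __name__ == "__main__":
    # Requires the selenium package and a Chrome/chromedriver setup.
    from selenium import webdriver

    driver = webdriver.Chrome()
    try:
        # Assumed example URL; replace it with the page you want to scrape.
        html = load_full_page(driver, "https://medium.com/tag/machine-learning")
        print(len(html))
    finally:
        driver.quit()
```

The returned HTML can then be handed to BeautifulSoup for parsing. The max_scrolls cap keeps the loop from running forever on pages that never stop loading.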

Kaggle link: https://www.kaggle.com/sangarshanan/medium-articles-tagged-in-mldlai

Scraping Rules

  • You should check a website’s Terms and Conditions before you scrape it. Be careful to read the statements about legal use of data. Usually, the data you scrape should not be used for commercial purposes.
  • Do not request data from the website too aggressively with your program (also known as spamming), as this may break the website. Make sure your program behaves in a reasonable manner (i.e. acts like a human). One request for one webpage per second is good practice.
  • The layout of a website may change from time to time, so revisit the site periodically and update your code as needed.
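The rules above can be partly enforced in code. The sketch below uses only the Python standard library: a small Throttle class (a hypothetical helper, not part of this project) that spaces requests at least one second apart, and an is_allowed function that checks a URL against robots.txt rules with urllib.robotparser. The robots.txt content and the user agent name are made up for the example. Note that robots.txt is not a substitute for reading the site's Terms and Conditions.

```python
import time
from urllib import robotparser


class Throttle:
    """Enforce a minimum delay between successive requests."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous request, if any

    def wait(self):
        """Block until at least min_interval has passed since the last call."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


def is_allowed(robots_lines, user_agent, url):
    """Check a URL against robots.txt rules given as a list of lines."""
    parser = robotparser.RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Made-up robots.txt rules and URLs, for illustration only.
    rules = ["User-agent: *", "Disallow: /private/"]
    throttle = Throttle(min_interval=1.0)
    for url in ["https://example.com/tag/ai", "https://example.com/private/x"]:
        if is_allowed(rules, "my-scraper", url):
            throttle.wait()  # at most one request per second
            print("would fetch", url)
        else:
            print("skipping", url)
```

Calling throttle.wait() immediately before each request keeps the scraper at or below one page per second, whichever code path issues the request.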

An exploratory analysis of the scraped data is also included. The data is broken down:

  • By author
  • By month
  • By tag, and so on
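To make these breakdowns concrete, here is a minimal sketch of counting articles by author, month, and tag using only the standard library. The records, field names (author, published, tags), and helper functions are invented for illustration; they do not reflect the actual dataset schema or the analysis code in this project.

```python
from collections import Counter
from datetime import date

# Made-up sample records; the real dataset's columns may differ.
ARTICLES = [
    {"author": "Alice", "published": date(2018, 1, 15), "tags": ["ML", "AI"]},
    {"author": "Bob", "published": date(2018, 1, 20), "tags": ["DL"]},
    {"author": "Alice", "published": date(2018, 2, 3), "tags": ["ML"]},
]


def count_by_author(articles):
    """Number of articles written by each author."""
    return Counter(a["author"] for a in articles)


def count_by_month(articles):
    """Number of articles published in each YYYY-MM month."""
    return Counter(a["published"].strftime("%Y-%m") for a in articles)


def count_by_tag(articles):
    """Number of articles carrying each tag (an article can have several)."""
    return Counter(tag for a in articles for tag in a["tags"])


if __name__ == "__main__":
    print(count_by_author(ARTICLES).most_common())
    print(count_by_month(ARTICLES).most_common())
    print(count_by_tag(ARTICLES).most_common())
```

Each Counter can be passed to a plotting library to produce bar charts of the most active authors, busiest months, or most common tags.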

The resulting visualizations help us better understand data science articles on Medium.