
Simple script written in Python to get the 20 words with highest frequency in an English Wikipedia article

Wikipedia Frequency Lookup

A simple Python script that finds the 20 most frequent words in an English Wikipedia article, along with each word's frequency as a percentage. You enter a search string, the script looks it up through the Wikipedia Search API, and you get the top 20 words from the matching article.
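As a rough illustration of the lookup step, the sketch below builds a MediaWiki search request for the English Wikipedia and picks the first matching title out of a response. The helper names `build_search_url` and `first_title` are hypothetical and are not taken from this repository. The sketch only constructs the URL and parses an already-decoded JSON response, so the actual script may do this differently.

```python
from urllib.parse import urlencode

# Public MediaWiki API endpoint for the English Wikipedia.
API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def build_search_url(query, limit=1):
    """Return a search URL for `query` (hypothetical helper, not from the repo)."""
    params = {
        "action": "query",
        "list": "search",      # full-text search module
        "srsearch": query,     # the string entered by the user
        "srlimit": limit,      # how many results to ask for
        "format": "json",
    }
    return API_ENDPOINT + "?" + urlencode(params)


def first_title(response):
    """Return the title of the first search hit, or None if there are no hits."""
    results = response.get("query", {}).get("search", [])
    return results[0]["title"] if results else None
```

The returned title could then be used to fetch the article text, which is the input to the word count.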

I built this so that I could apply what I was learning and play around with some libraries :books:. To remove stop words (such as "and", "the", "a", "an", and similar words) from the frequency table, simply add yes after your string.
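The frequency table with optional stop-word removal can be sketched roughly as follows. This is a minimal sketch, not the code from main.py: the function name `top_words`, the small `STOP_WORDS` set, and the choice to compute percentages after stop words are removed are all assumptions made for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption); a real list would be much longer.
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "to", "is"}


def top_words(text, n=20, remove_stop_words=False):
    """Return up to `n` (word, percentage) pairs, most frequent first."""
    # Lowercase the text and keep alphabetic tokens, allowing inner apostrophes.
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    if remove_stop_words:
        words = [w for w in words if w not in STOP_WORDS]
    total = len(words)
    if total == 0:
        return []
    counts = Counter(words)
    # Percentage is relative to the words that remain after filtering (assumption).
    return [(w, round(100 * c / total, 2)) for w, c in counts.most_common(n)]


sample = "the cat and the dog and the bird"
print(top_words(sample, n=2))
print(top_words(sample, remove_stop_words=True))
```

Passing `remove_stop_words=True` here plays the role of the trailing yes argument on the command line.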

Instructions to run

  • Clone project
git clone https://github.com/prabhakar267/wikipedia-frequency-lookup.git
cd wikipedia-frequency-lookup
  • Add virtual environment
pip install virtualenv
virtualenv venv
source venv/bin/activate
  • Install dependencies
[sudo] pip install -r requirements.txt
  • Run script
python main.py <your-string> [yes]
