Bookmark-Manager
Bookmark-Manager copied to clipboard
NLP based approach to automatically categorize your bookmarks!
Bookmark-Manager
NLP based approach to automatically categorize your bookmarks!
In-Depth
To understand this project in-depth, refer to my technical paper: Bookmark Classification using Multinomial Naive Bayes Model
How does this work ?
- Enter your bookmarks in
./links.json
file - To run the code, run
categorize.py
-
scrape_filter_link.py
contains the classes used to scrape information from each URL
What bookmarks is it categorizing ?
It can categorize a variety of bookmarks. Currently it supports all the categories mentioned in the ./corpus/
directory.
Will I have to enter my bookmark links manually ?
To a certain extent! For example: Firefox allows users to backup the bookmarks in a JSON format. You can extract the uri
from that JSON file and feed it into ./links.json
.
To backup your bookmarks in Firefox, press Ctrl+Shift+O, go to Import and Backup
and then to Backup
.
Chrome users can check this post on superuser.
Will the code create a directory structure with my bookmarks ?
No, the mapping of a URL with it's appropriate category is stored in a JSON file: result.json
, in a dict format.
The keys are your bookmarks with values being their categories.
Can I see a demo ?
Sure, here's one (The highlighted part is the one stored in result.json
):
Can I improve the corpus, by adding more categories in ./corpus/
directory ?
Yes, you can! The code is fairly scalable.
To add your own corpuses:
- Create a directory with a unique category name in
./corpus/
- Inside the
./corpus/your-category-dir
add your corpus text in a JSON file with the format:{"text": "_your_corpus_text_here_"}
(NOTE: You can add multiple JSON files in a category directory)
When you run the code, you will find that the categorize.py
will take the new/modified corpuses into consideration.
License
The code is under MIT License