atarashi
Feat(models): Implemented three models for license similarity
Description
This PR implements Logistic Regression, Multinomial Naive Bayes and Linear SVC classifiers trained on the license dataset licenseList.csv. The main goal is to work toward a model that can make atarashi faster and more accurate.
Files
- train.py (trains the models and saves them in binary form)
- test.py (for testing)
- lr_model.pkl (binary file for Logistic Regression)
- nb_model.pkl (binary file for Multinomial Naive Bayes)
- svc_model.pkl (binary file for Linear SVC)
- vectorizer.pkl (binary file storing the vocabulary)
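The PR does not show the agent code, so the following is only a minimal sketch of how the files listed above could work together to classify one file. It assumes the `.pkl` files were written with Python's `pickle` module from scikit-learn objects; `classify_file` and its parameters are hypothetical names used for illustration.

```python
import os
import pickle


def classify_file(path, model_file="svc_model.pkl", model_dir="."):
    """Hypothetical helper: predict a license short name for one file.

    Assumes model_dir holds vectorizer.pkl plus the chosen model pickle,
    both saved with pickle.dump from scikit-learn objects.
    """
    with open(os.path.join(model_dir, "vectorizer.pkl"), "rb") as f:
        vectorizer = pickle.load(f)  # fitted vectorizer holding the vocabulary
    with open(os.path.join(model_dir, model_file), "rb") as f:
        model = pickle.load(f)  # lr_model.pkl, nb_model.pkl or svc_model.pkl
    with open(path, encoding="utf-8", errors="ignore") as f:
        text = f.read()
    # Map the raw text into the training feature space, then predict one label
    features = vectorizer.transform([text])
    return model.predict(features)[0]
```

Keeping the vectorizer in its own file means every model can reuse the same vocabulary, so swapping `model_file` is enough to switch classifiers. Note that `pickle.load` should only be used on trusted files.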
How to use?
- Test the models:
  - `atarashi -a lr_classifier path/to/file` (Logistic Regression)
  - `atarashi -a nb_classifier path/to/file` (Multinomial Naive Bayes)
  - `atarashi -a svc_classifier path/to/file` (Linear SVC)
- Train the models (optional). From the base folder, run:
  - `python3 atarashi/agents/models/train.py`
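train.py itself is not included in this description. As a rough sketch of the train-and-save step, assuming scikit-learn estimators, a TF-IDF vectorizer (the actual vectorizer type is not stated) and Python's `pickle` module, it could look like this. The toy `texts`/`labels` stand in for licenseList.csv, whose column layout is not shown, and `train_and_save` is a hypothetical name.

```python
import os
import pickle

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.svm import LinearSVC

# Hypothetical toy data standing in for licenseList.csv (its real columns are not shown)
texts = [
    "Permission is hereby granted, free of charge, to any person obtaining a copy",
    "Licensed under the Apache License, Version 2.0",
    "This program is free software; you can redistribute it under the GNU General Public License",
]
labels = ["MIT", "Apache-2.0", "GPL-2.0-or-later"]


def train_and_save(texts, labels, out_dir="."):
    """Fit one shared vectorizer and three classifiers, then pickle each of them."""
    vectorizer = TfidfVectorizer()  # assumption: train.py may use another vectorizer
    features = vectorizer.fit_transform(texts)
    models = {
        "lr_model.pkl": LogisticRegression(max_iter=1000),
        "nb_model.pkl": MultinomialNB(),
        "svc_model.pkl": LinearSVC(),
    }
    for filename, model in models.items():
        model.fit(features, labels)  # every model is trained on the same feature matrix
        with open(os.path.join(out_dir, filename), "wb") as f:
            pickle.dump(model, f)
    # The fitted vectorizer carries the vocabulary needed at prediction time
    with open(os.path.join(out_dir, "vectorizer.pkl"), "wb") as f:
        pickle.dump(vectorizer, f)
    return vectorizer, models
```

MultinomialNB accepts TF-IDF input because the features are non-negative; with raw counts from a `CountVectorizer` the same structure would still apply.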
ToDo
- [x] Test that the algorithms work and measure their accuracy using `evaluator.py`
- [x] Proper integration with `atarashii.py`
Accuracy Score
| Model | Accuracy (%) | Total time on 100 files (s) |
|---|---|---|
| Logistic Regression | 31 | 88.6 |
| Linear SVC | 36 | 79.4 |
| Multinomial Naive Bayes | 30 | 83.72 |
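The timing column is a total over 100 files, so dividing by 100 gives the average time per file. A small sketch of that arithmetic using the figures from the table; the `accuracy_percent` helper is a hypothetical illustration of how a percentage accuracy is usually computed, not necessarily what evaluator.py does.

```python
# Figures copied from the table above: (accuracy %, total seconds for 100 files)
RESULTS = {
    "Logistic Regression": (31, 88.6),
    "Linear SVC": (36, 79.4),
    "Multinomial Naive Bayes": (30, 83.72),
}
NUM_FILES = 100


def accuracy_percent(correct, total):
    """Hypothetical helper: share of correctly identified files, in percent."""
    return 100.0 * correct / total


def seconds_per_file(total_seconds, num_files=NUM_FILES):
    """Average wall-clock time spent on one file."""
    return total_seconds / num_files


for name, (acc, total) in RESULTS.items():
    print(f"{name}: {acc}% accuracy, {seconds_per_file(total):.3f} s per file")
```

By these numbers Linear SVC is both the most accurate and the fastest of the three, at roughly 0.79 s per file.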
Future Scope
- A well-defined dataset should increase similarity accuracy even further. By a well-defined dataset I mean one that also includes newly updated licenses; a license file in a "1 class to n licenses" style (several license texts per class) should do the job.
CC: @hastagAB @GMishx @ag4ums
Signed-off-by: Kaushlendra Pratap Singh
@hastagAB @GMishx, I added the model commands to atarashii.py, but it seems I am missing an update somewhere else in the code.
@GMishx @ag4ums I have run all three models on the test files and am attaching a screenshot of the results.