Add code for sentiment analysis
This PR removes the `NLPHandler` class and adds sentiment analysis functionality to `MLHandler`.
To set up a Gramex service for sentiment analysis, use the following configuration:

```yaml
url:
  sentiment-analysis:
    pattern: /$YAMLURL/
    handler: MLHandler
    kwargs:
      backend: transformers
      task: sentiment-analysis
      xsrf_cookies: false
```
## Getting predictions
GET sentiments of short pieces of text as follows (the `-G` flag makes `curl` append the `--data-urlencode` payload to the query string):

```bash
curl -G --data-urlencode "text=This movie is so bad, it's good." http://localhost:9988/
```
The output will be:
```json
[
  {
    "label": "POSITIVE",
    "score": 0.9997316002845764
  }
]
```
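The JSON response can be consumed directly in Python. A minimal sketch, using the sample response shown above (the response string here is pasted in, not fetched from the service):

```python
import json

# Sample response from the sentiment-analysis endpoint, as shown above
response = '[{"label": "POSITIVE", "score": 0.9997316002845764}]'

predictions = json.loads(response)
# Each prediction carries a label and a confidence score
for pred in predictions:
    print(f'{pred["label"]}: {pred["score"]:.4f}')  # prints: POSITIVE: 0.9997
```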
Files containing text to be classified can also be POSTed to the endpoint with `_action=predict`. Any file supported by `gramex.cache.open` will work. (Download a sample here.)

```bash
curl -X POST -F "[email protected]" 'http://localhost:9988/?_action=predict'
```
The output will be:
```json
[
  {
    "label": "POSITIVE",
    "score": 0.9997316002845764
  },
  {
    "label": "NEGATIVE",
    "score": 0.9974692463874817
  }
  // etc.
]
```
## Measuring model performance
Files containing `text` and `label` fields can be POSTed to the endpoint with `_action=score` to get the ROC AUC score of the model against the dataset. (Download a sample dataset here.)

```bash
curl -X POST -F "[email protected]" 'http://localhost:9988/?_action=score'
```
The output will be something like:
```json
{
  "roc_auc": 0.9929
}
```
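A scoring dataset only needs the two fields named above. A minimal sketch that writes one with Python's `csv` module (the file name and example rows are illustrative, not from the sample dataset):

```python
import csv

# Illustrative rows: the endpoint expects a `text` and a `label` field
rows = [
    {'text': "This movie is so bad, it's good.", 'label': 'POSITIVE'},
    {'text': 'An utterly forgettable film.', 'label': 'NEGATIVE'},
]

with open('sentiment.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=['text', 'label'])
    writer.writeheader()
    writer.writerows(rows)
```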
## Training the model
The model can be trained on a dataset by setting `_action=train` and POSTing the file:

```bash
curl -X POST -F "file=@sentiment_score.json" 'http://localhost:9988/?_action=train'
```
The output will show the score of the trained model on the dataset:
```json
{
  "roc_auc": 0.8
}
```
Multiple training options for the transformer are supported, including the number of epochs, batch size, and weight decay. These can all be specified in the POST request as follows (note that URLs containing `&` must be quoted in the shell):

```bash
# Train for 3 epochs instead of the default 1
curl -X POST -F "[email protected]" 'http://localhost:9988/?_action=train&num_train_epochs=3'
```

The output is the score of the trained model on the dataset after 3 epochs:

```json
{
  "roc_auc": 0.98
}
```

```bash
# Change the batch size to 32 instead of the default 16
curl -X POST -F "[email protected]" \
  'http://localhost:9988/?_action=train&per_device_train_batch_size=32&num_train_epochs=3'
```

The output is the score of the trained model on the dataset after 3 epochs and a batch size of 32:

```json
{
  "roc_auc": 0.99
}
```
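Since the training options are ordinary query parameters, the request URL can also be assembled programmatically. A sketch with the standard library, using the parameter names shown above:

```python
from urllib.parse import urlencode

# Training options are passed as query parameters alongside _action=train
params = {
    '_action': 'train',
    'num_train_epochs': 3,
    'per_device_train_batch_size': 32,
}
url = 'http://localhost:9988/?' + urlencode(params)
print(url)
# prints: http://localhost:9988/?_action=train&num_train_epochs=3&per_device_train_batch_size=32
```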
Cool! @jaidevd could you please review? Do let me know when to merge.
@MSanKeys963 The target branch has to be gramener/gramex's master branch, not the jd-transformers branch.
@MSanKeys963 other than these two changes, LGTM
@MSanKeys963 this is still showing merge conflicts. Please take a look.
@jaidevd I've fixed all the issues mentioned above. Please let me know if there's anything else.
Thanks, @MSanKeys963
@sanand0 This is ready for merge.
@sanand0 I've fixed all the issues. Please check.
@MSanKeys963
- [ ] Can you get this to work, please? sentiment.zip
- [ ] Gramex should still run if PyTorch & Huggingface are not installed
For example, this is how we optionally import ElasticSearch:
```python
def gramexlog(conf):
    try:
        from elasticsearch import Elasticsearch, helpers
    except ImportError:
        app_log.error('gramexlog: elasticsearch missing. pip install elasticsearch')
        return
```
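Following that pattern, here is a sketch of how the transformers dependency could be kept optional. The function name and error message are illustrative, not the actual Gramex code:

```python
import logging

app_log = logging.getLogger('gramex')


def load_transformers():
    """Return the transformers pipeline factory, or None if unavailable."""
    try:
        from transformers import pipeline
    except ImportError:
        # Gramex keeps running; only the sentiment-analysis feature is disabled
        app_log.error('MLHandler: transformers missing. pip install transformers torch')
        return None
    return pipeline
```

This mirrors the `gramexlog` approach: the import happens inside the function, so Gramex starts normally and logs an actionable error only when the feature is actually used.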