gramex Add code for sentiment analysis

Removed class NLPHandler() and added sentiment analysis functionality in class MLHandler().

To set up a Gramex service for performing sentiment analysis, use the following configuration:

url:
  sentiment-analysis:
    pattern: /$YAMLURL/
    handler: MLHandler
    kwargs:
      backend: transformers
      task: sentiment-analysis
      xsrf_cookies: false

Getting predictions

GET sentiments of short pieces of text as follows:

curl -X GET --data-urlencode "text=This movie is so bad, it's good." http://localhost:9988/

The output will be:

[
  {
    "label": "POSITIVE",
    "score": 0.9997316002845764
  }
]
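
The same request can be built and its response parsed from Python. This is a minimal standard-library sketch, not part of this PR: `build_predict_url` and `top_label` are hypothetical helper names, the handler is assumed to accept `text` in the query string, and the response is assumed to have the shape shown above.

```python
import json
from urllib.parse import urlencode


def build_predict_url(base_url, text):
    """Return the GET URL that passes `text` as a URL-encoded query parameter."""
    return f"{base_url}?{urlencode({'text': text})}"


def top_label(response_body):
    """Return (label, score) of the first prediction in a JSON response body."""
    predictions = json.loads(response_body)
    first = predictions[0]
    return first["label"], first["score"]


url = build_predict_url("http://localhost:9988/", "This movie is so bad, it's good.")
# With the service running, the body could be fetched with
# urllib.request.urlopen(url).read(). Here the sample output above is parsed instead.
sample_body = '[{"label": "POSITIVE", "score": 0.9997316002845764}]'
label, score = top_label(sample_body)
```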

Files containing text to be classified can also be POSTed to the endpoint, with _action=predict. Any file supported by gramex.cache.open will work. (Download a sample here.)

curl -X POST -F "[email protected]" http://localhost:9988/?_action=predict

The output will be:

[
  {
    "label": "POSITIVE",
    "score": 0.9997316002845764
  },
  {
    "label": "NEGATIVE",
    "score": 0.9974692463874817
  },
  // etc.
]

Measuring model performance

Files containing the text and label fields can be POSTed to the endpoint with _action=score to get the ROC AUC score of the model against the dataset. (Download a sample dataset here.)

curl -X POST -F "file=@sentiment_score.json" "http://localhost:9988/?_action=score"

The output will be something like:

{
  "roc_auc": 0.9929
}
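
For readers unfamiliar with the metric: ROC AUC is the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The pure-Python sketch below illustrates that definition only. It is not how MLHandler computes the score, and `roc_auc` is a hypothetical helper.

```python
def roc_auc(labels, scores):
    """Pairwise ROC AUC: fraction of (positive, negative) pairs ranked correctly.

    `labels` holds 1 for positive and 0 for negative examples; `scores` holds
    the predicted score for the positive class. Ties count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("ROC AUC needs at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


example = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In this toy example three of the four positive/negative pairs are ordered correctly, so `example` is 0.75.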

Training the model

The model can be trained on a dataset by setting _action=train, and POSTing the file.

curl -X POST -F "file=@sentiment_score.json" http://localhost:9988/?_action=train

The output will show the score of the trained model on the dataset:

{
  "roc_auc": 0.8
}

Multiple training options for the transformer are supported, including the number of epochs, batch size and weight decay. These can all be specified in the POST request as follows:

# Train for three epochs instead of the default 1
curl -X POST -F "file=@sentiment_score.json" "http://localhost:9988/?_action=train&num_train_epochs=3"

The output is the score of the trained model on the dataset after 3 epochs:

{
  "roc_auc": 0.98
}

# Change the batch size to 32 instead of the default 16
curl -X POST -F "file=@sentiment_score.json" \
  "http://localhost:9988/?_action=train&per_device_train_batch_size=32&num_train_epochs=3"

The output is the score of the trained model on the dataset after 3 epochs with a batch size of 32:

{
  "roc_auc": 0.99
}
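
When scripting these requests, building the query string programmatically avoids shell quoting problems with `&`. The sketch below is illustrative only: `build_action_url` is a hypothetical helper, and the option names are the ones used in the curl examples above.

```python
from urllib.parse import urlencode


def build_action_url(base_url, action, **options):
    """Return a URL with `_action` and any extra training options URL-encoded."""
    params = {"_action": action, **options}
    return f"{base_url}?{urlencode(params)}"


train_url = build_action_url(
    "http://localhost:9988/",
    "train",
    per_device_train_batch_size=32,
    num_train_epochs=3,
)
```

The resulting `train_url` can then be used as the target of the file POST with any HTTP client.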

Jun 21 '21 14:06 sanketverma1704

Cool! @jaidevd could you please review? Do let me know when to merge

Jun 21 '21 15:06 sanand0

@MSanKeys963 The target branch has to be gramener/gramex's master branch, not the jd-transformers branch.

Jun 26 '21 08:06 jaidevd

@MSanKeys963 other than these two changes, LGTM

Jun 26 '21 09:06 jaidevd

@MSanKeys963 this is still showing merge conflicts. Please take a look.

Jul 07 '21 08:07 jaidevd

@jaidevd I've fixed all the issues mentioned above. Please let me know if there's anything else.

Jul 16 '21 20:07 sanketverma1704

Thanks, @MSanKeys963

@sanand0 This is ready for merge.

Jul 19 '21 15:07 jaidevd

@sanand0 I've fixed all the issues. Please check.

Jul 23 '21 10:07 sanketverma1704

@MSanKeys963

[ ] Can you get this to work, please? sentiment.zip
[ ] Gramex should still run if PyTorch & Huggingface are not installed

For example, this is how we optionally import ElasticSearch:

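from gramex.config import app_log

def gramexlog(conf):
    # Import lazily so Gramex still starts when elasticsearch is not installed
    try:
        from elasticsearch import Elasticsearch, helpers
    except ImportError:
        # Log the missing dependency and skip setting up this service
        app_log.error('gramexlog: elasticsearch missing. pip install elasticsearch')
        return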
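
The same idea can be written as a small reusable helper. This is only a sketch under assumptions, not Gramex code: `optional_import` and the logger name are hypothetical, and in this PR the module names passed in would presumably be `torch` and `transformers`. The demo uses a standard-library module and a deliberately missing one so it runs anywhere.

```python
import importlib
import logging

log = logging.getLogger("optional-import-demo")


def optional_import(module_name, install_hint):
    """Return the imported module, or None (after logging an error) if it is missing."""
    try:
        return importlib.import_module(module_name)
    except ImportError:
        log.error("%s missing. pip install %s", module_name, install_hint)
        return None


# A module that exists is returned as usual.
present = optional_import("json", "json")
# A module that is not installed yields None instead of raising at startup.
missing = optional_import("package_that_is_not_installed", "package-that-is-not-installed")
```

A caller can then check for `None` and disable only the feature that needs the missing package.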

Jul 30 '21 04:07 sanand0