EntityCategoryPrediction

Model for predicting the categories of entities from their mentions

Category prediction model

This repo contains an AllenNLP model that predicts the categories of named entities from their mentions.

Data

Fake data

You can generate some fake data using this Notebook

Real data (Work in progress)

Filtered OneShotWikilinks dataset with manually selected categories.

Data preparation steps

  • Create the category graph: build_category_graph.ipynb
    • Produces: category_graph.pkl
  • Obtain the list of Person articles from the ontology: obtain_people_articles.ipynb
    • Requires: dbpedia_2016-10.owl
    • Produces: people_categories.json
  • Build a mapping from articles to people categories: generate_full_people_categories.ipynb. Requires:
    • people_categories.json
    • category_graph.pkl
    • projects/categories_prediction/manual_categories.gsheet
  • Filter mentions for people: filter_mentions.ipynb
    • Requires: people_all_categories.json
    • Produces: people_mentions.tsv
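The steps above form a small dependency chain, so it can help to know which files must already be on disk before the first notebook runs. The sketch below encodes only the Requires/Produces entries listed above and reports every required file that no earlier step is listed as producing. The `STEPS` structure and the `external_inputs` helper are illustrative names, not part of the repo. Note that `people_all_categories.json` shows up in the result because the list does not say which step produces it; it is presumably the output of generate_full_people_categories.ipynb, but that is an assumption.

```python
# Each entry mirrors the list above: (notebook, required files, produced files).
# Outputs that the list does not state are left empty rather than guessed.
STEPS = [
    ("build_category_graph.ipynb", [], ["category_graph.pkl"]),
    ("obtain_people_articles.ipynb",
     ["dbpedia_2016-10.owl"],
     ["people_categories.json"]),
    ("generate_full_people_categories.ipynb",
     ["people_categories.json",
      "category_graph.pkl",
      "projects/categories_prediction/manual_categories.gsheet"],
     []),
    ("filter_mentions.ipynb",
     ["people_all_categories.json"],
     ["people_mentions.tsv"]),
]


def external_inputs(steps):
    """Return required files that no earlier step is listed as producing."""
    produced = set()
    missing = []
    for _notebook, requires, produces in steps:
        for path in requires:
            if path not in produced and path not in missing:
                missing.append(path)
        produced.update(produces)
    return missing


if __name__ == "__main__":
    for path in external_inputs(STEPS):
        print("must exist before running:", path)
```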

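Split the training data into 10 chunks without breaking lines (the leading ! runs the command from a notebook cell; drop it in a shell):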

!split -n l/10 --verbose ../data/fake_data_train.tsv ../data/fake_data_train.tsv_
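Where GNU split is not available, the same idea can be sketched in Python. This is an approximation under stated assumptions: it balances chunks by line count, whereas `split -n l/10` balances by byte size, so chunk boundaries may differ. The function name `split_by_lines` is hypothetical. It keeps the two-letter suffixes (`aa`, `ab`, ...) that split uses by default.

```python
import string
from pathlib import Path


def split_by_lines(src, prefix, n_chunks=10):
    """Write n_chunks files named prefix+aa, prefix+ab, ... of contiguous whole lines."""
    # Work on bytes so line endings are copied through unchanged.
    lines = Path(src).read_bytes().splitlines(keepends=True)
    base, extra = divmod(len(lines), n_chunks)
    letters = string.ascii_lowercase
    paths = []
    start = 0
    for i in range(n_chunks):
        # The first `extra` chunks take one additional line each.
        size = base + (1 if i < extra else 0)
        suffix = letters[i // 26] + letters[i % 26]
        out = Path(prefix + suffix)
        out.write_bytes(b"".join(lines[start:start + size]))
        paths.append(out)
        start += size
    return paths
```

Calling `split_by_lines("../data/fake_data_train.tsv", "../data/fake_data_train.tsv_")` would write ten files with suffixes `aa` through `aj` next to the original.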

Install

pip install -r requirements.txt

Run

Train


rm -rf ./data/vocabulary ; allennlp make-vocab -s ./data/ allen_conf_vocab.json --include-package category_prediction

# Train with the settings in allen_conf.json
allennlp train -f -s data/stats allen_conf.json --include-package category_prediction
# Same run, overriding the config to train on GPU 0
allennlp train -f -s data/stats allen_conf.json --include-package category_prediction -o '{"trainer": {"cuda_device": 0}}'

Continue training with different params

rm -rf data/stats2/  # Clear new serialization dir
allennlp fine-tune -s data/stats2/ -c allen_conf.json -m ./data/stats/model.tar.gz --include-package category_prediction -o '{"trainer": {"cuda_device": 0}, "iterator": {"base_iterator": {"batch_size": 64}}}'
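The `-o` argument in the commands above is a JSON string of nested keys. The sketch below illustrates, in plain Python, how such a nested override can be laid over a base config key by key, and how to build the shell-quoted string without hand-escaping. It is not AllenNLP's own code: `merge_overrides` is a hypothetical helper, and the values in `base_config` are made up for illustration since the contents of allen_conf.json are not shown here.

```python
import json
import shlex


def merge_overrides(config, overrides):
    """Return a copy of config with nested override values applied on top."""
    merged = dict(config)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys in the base config survive.
            merged[key] = merge_overrides(merged[key], value)
        else:
            merged[key] = value
    return merged


# Hypothetical base values, for illustration only; the real ones live in allen_conf.json.
base_config = {
    "trainer": {"cuda_device": -1, "num_epochs": 10},
    "iterator": {"base_iterator": {"batch_size": 32}},
}
# The same overrides as in the fine-tune command above.
overrides = {
    "trainer": {"cuda_device": 0},
    "iterator": {"base_iterator": {"batch_size": 64}},
}

effective = merge_overrides(base_config, overrides)
# Shell-safe text to place after -o on the command line.
override_arg = shlex.quote(json.dumps(overrides))
```

Generating the string with `json.dumps` and `shlex.quote` avoids quoting mistakes when the overrides grow beyond one or two keys.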

Validate

allennlp evaluate ./data/stats/model.tar.gz ./data/fake_data_test.tsv --include-package category_prediction

Server

Debug

MODEL=./data/trained_models/6th_augmented/model.tar.gz python run_server.py

Prod

gunicorn -c gunicorn_config.py wsgi:application
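The file passed with `-c` is an ordinary Python module whose top-level names are read as gunicorn settings. The repo's actual gunicorn_config.py is not reproduced here; the following is only a sketch of what such a file might contain, with placeholder values. `bind`, `workers`, and `timeout` are standard gunicorn settings.

```python
# Hypothetical gunicorn_config.py; the values are placeholders, not the repo's settings.
import multiprocessing

# Address and port to listen on; 8000 matches the port published in the Docker section.
bind = "0.0.0.0:8000"

# Each worker typically loads its own copy of the model, so keep the count small.
workers = min(2, multiprocessing.cpu_count())

# Seconds before an unresponsive worker is restarted; model loading can be slow.
timeout = 120
```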

Docker

Build

cd docker
docker build --tag mention .

Run, mounting the host pyenv into the container

# Docker rejects --rm combined with a restart policy, so only --restart is used
docker run --restart unless-stopped -v $HOME:$HOME -p 8000:8000 \
        -v $HOME/.pyenv:/root/.pyenv \
        -e ENV_PATH=$HOME/virtualenv/path \
        -e APP_PATH=$HOME/project/root/path mention

GCE related notes

Fix 100% GPU utilization by enabling persistence mode

sudo nvidia-smi -pm 1