MIMO_CFE icon indicating copy to clipboard operation
MIMO_CFE copied to clipboard

Source code for the EMNLP 2019 paper "Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text" (给定科研文本如生物医药文献,联合抽取其中事实...

Joint Extraction of Fact and Condition Tuples from Sceintific Text

Introduction

This repository contains source code for the EMNLP 2019 paper " "Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text" (Paper).

Usage

1.Clone the Repository

git clone https://github.com/twjiang/MIMO_CFE.git

2.Download External Resources

  • The dumped MIMO can be found here.

  • The word embedding we use can be found here.

  • The pre-trained language model we use can be found here.

put these files into ./resources folder

3.Install Requirements

This repo is tested on Python 3.6, PyTorch 1.2.0

Create Environment (Optional): Ideally, you should create an environment for the project.

conda create -n mimo python=3.6

conda activate mimo

pip install -r requirments.txt

4.Start a demo application

cd MIMO_service

python mimo_server.py #Start a MIMO service

python client.py 

The output of the demo is shown below.

{
	'statements': {
		'stmt 1': {
			'text': 'Histone deacetylase inhibitor valproic acid ( VPA ) has been used to increase the reprogramming efficiency of induced pluripotent stem cell ( iPSC ) from somatic cells , yet the specific molecular mechanisms underlying this effect is unknown .',
			'fact tuples': [
				['Histone deacetylase inhibitor valproic acid', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming efficiency'],
				['VPA', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming efficiency'],
				['Histone deacetylase inhibitor valproic acid', 'NIL', 'has been used to increase', 'induced pluripotent stem cell', 'reprogramming'],
				['specific molecular mechanisms', 'NIL', 'is unknown', 'NIL', 'NIL']
			],
			'condition tuples': [
				['iPSC', 'reprogramming efficiency', 'from', 'somatic cells', 'NIL'],
				['induced pluripotent stem cell', 'reprogramming efficiency', 'from', 'somatic cells', 'NIL'],
				['specific molecular mechanisms', 'NIL', 'underlying', 'NIL', 'effect']
			],
			'concept_indx': [0, 1, 2, 3, 4, 6, 17, 18, 19, 20, 22, 25, 26, 30, 31, 32],
			'attr_indx': [14, 15, 35],
			'predicate_indx': [8, 9, 10, 11, 12, 24, 33, 36, 37]
		}
	}
}

5. Train Your Own MIMO

example commands for pretrain:

(all gates for LM, pretrain)

python train.py --cuda --config 111000000 --model_name MIMO_BERT_LSTM --pretrain

(all gates for POS, pretrain)

python train.py --cuda --config 000111000 --model_name MIMO_BERT_LSTM --pretrain

(all gates for LM and POS, pretrain)

python train.py --cuda --config 111111000 --model_name MIMO_BERT_LSTM --pretrain

example commands with multi-output:

(all gates for LM with multi-output)

python train.py --cuda --config 111000000 --model_name MIMO_BERT_LSTM

(all gates for POS with multi-output)

python train.py --cuda --config 000111000 --model_name MIMO_BERT_LSTM

(all gates for LM and POS, with multi-output)

python train.py --cuda --config 111111000 --model_name MIMO_BERT_LSTM

Reference

@inproceedings{jiang-mimo,
    title = "Multi-Input Multi-Output Sequence Labeling for Joint Extraction of Fact and Condition Tuples from Scientific Text",
    author = "Jiang, Tianwen and Zhao, Tong and Qin, Bing and Liu, Ting and Chawla, Nitesh V and Jiang, Meng",
    booktitle = "Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing",
    month = nov,
    year = "2019",
    address = "Hong Kong, China",
    publisher = "Association for Computational Linguistics",
}