Causal Relation Extraction From Medical Texts

Causal relation extraction and identification using Conditional Random Fields (CRFs). The project was carried out under the supervision of our faculty member, Mr. Tirthankar Dasgupta.

Link to the project presentation.

Introduction

A causal relation is a relation between two events, a cause and an effect: the cause produces the effect, and the effect is the result of the cause.

Example: “Hunger is the most common cause of crying in a young baby.” Here the cause is “Hunger” and the effect is “Crying”. The present work focuses on detecting and extracting causal relations from medical-domain text.

From the point of view of detecting causal relations, the following distinctions are useful:

• Marked or unmarked: a causation is marked if a specific linguistic unit signals the relation, and unmarked otherwise. “I bought it because I read a good review” is marked; “Be careful. It’s unstable” is not.
• Ambiguity: if the mark always signals a causation, it is unambiguous (e.g. “because”). If it only sometimes signals a causation, it is ambiguous (e.g. “since”).
• Explicit or implicit: a causation is explicit if both arguments are present, and implicit if one or both are missing. “She was thrown out of the hotel after she had run naked through its halls.” is explicit; “John killed Bob.” is implicit, since the effect, Bob’s death, is not explicitly stated.

We focus on marked and explicit causations.
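The marked and ambiguous distinctions can be illustrated with a small lookup of cue words. This is an illustrative sketch only, not part of the project's pipeline: the `CAUSAL_MARKS` table and the `find_marks` helper are hypothetical names, and the table holds just the two marks mentioned above.

```python
import re

# Hypothetical cue-word table: mark -> True if the mark is ambiguous.
# Only the two examples named in the text are listed.
CAUSAL_MARKS = {
    "because": False,  # always signals a causation
    "since": True,     # may be causal or temporal
}


def find_marks(sentence):
    """Return (mark, is_ambiguous) pairs for each cue word found in the sentence."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    return [(tok, CAUSAL_MARKS[tok]) for tok in tokens if tok in CAUSAL_MARKS]


marked = find_marks("I bought it because I read a good review")
unmarked = find_marks("Be careful. It's unstable")
```

A sentence with no entry in the table, like the second one, comes back with an empty list, which corresponds to an unmarked causation (or no causation at all).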

Workflow

1. Data Preprocessing
2. Feature Selection and Extraction
3. Training Model
4. Testing Model Prediction Accuracy

Data Preprocessing

Extracting unique words
POS tagging and term labelling (CC: cause, EE: effect, RR: relation, i.e. the causal link word, O: none)

Code Snippet:-

[Image: data preprocessing code snippet]
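As a minimal sketch of this step (not the project's actual code), the example sentence from the introduction can be represented as (word, POS tag, label) triples. The POS tags are assigned by hand here rather than by a tagger, the choice of “cause” as the RR token is an assumption, and `unique_words` is a hypothetical helper name.

```python
# One labelled sentence: (word, POS tag, label).
# POS tags are hand-assigned Penn Treebank tags (assumption); labels follow
# the CC / EE / RR / O scheme described above.
labelled_sentence = [
    ("Hunger", "NN", "CC"),
    ("is", "VBZ", "O"),
    ("the", "DT", "O"),
    ("most", "RBS", "O"),
    ("common", "JJ", "O"),
    ("cause", "NN", "RR"),   # assumed causal link word
    ("of", "IN", "O"),
    ("crying", "VBG", "EE"),
    ("in", "IN", "O"),
    ("a", "DT", "O"),
    ("young", "JJ", "O"),
    ("baby", "NN", "O"),
    (".", ".", "O"),
]


def unique_words(sentences):
    """Collect the lower-cased vocabulary over a list of labelled sentences."""
    vocab = set()
    for sentence in sentences:
        for word, _pos, _label in sentence:
            vocab.add(word.lower())
    return sorted(vocab)


vocab = unique_words([labelled_sentence])
non_null_labels = [label for _w, _p, label in labelled_sentence if label != "O"]
```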

Feature Selection and Extraction

Word Case (upper/lower)
Word POS
Word title
Type (Alphanumeric/Character)
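The features listed above can be sketched as one dictionary per token, which is the input format that CRF libraries such as sklearn-crfsuite accept. The function names and feature keys below are illustrative assumptions, and reading “Type” as an alphabetic-versus-alphanumeric check is also an assumption.

```python
def token_features(sentence, i):
    """Build the feature dict for the i-th (word, pos) pair of a sentence."""
    word, pos = sentence[i]
    return {
        "word.isupper": word.isupper(),   # word case: all upper
        "word.islower": word.islower(),   # word case: all lower
        "word.istitle": word.istitle(),   # word title
        "pos": pos,                       # word POS
        "word.isalpha": word.isalpha(),   # type: letters only
        "word.isalnum": word.isalnum(),   # type: letters and/or digits
    }


def sentence_features(sentence):
    """Feature dicts for every token in a sentence."""
    return [token_features(sentence, i) for i in range(len(sentence))]


# Hand-tagged tokens (POS tags are assumptions, not tagger output).
example = [("Hunger", "NN"), ("causes", "VBZ"), ("crying", "VBG"), (".", ".")]
features = sentence_features(example)
```

Each sentence becomes a list of such dictionaries, aligned one-to-one with its list of labels.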

Model Selection and Training

We use a statistical model, the Conditional Random Field (CRF), from the sklearn-crfsuite library, and train it on our preprocessed training dataset.

Code Snippet:-

[Image: model training code snippet]
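A minimal sketch of training with sklearn-crfsuite might look as follows. The hyperparameter values are illustrative and are not the ones used in this project; the toy training data and the `train_crf` helper are hypothetical. The library is imported inside the function so that the data preparation can be run without it installed.

```python
def train_crf(X_train, y_train):
    """Fit a linear-chain CRF on feature-dict sequences and label sequences."""
    import sklearn_crfsuite  # third-party: pip install sklearn-crfsuite

    crf = sklearn_crfsuite.CRF(
        algorithm="lbfgs",              # L-BFGS training
        c1=0.1,                         # L1 regularisation (illustrative value)
        c2=0.1,                         # L2 regularisation (illustrative value)
        max_iterations=100,
        all_possible_transitions=True,
    )
    crf.fit(X_train, y_train)
    return crf


# Toy data: one sentence, one feature dict per token, one label per token.
X_train = [[
    {"word.istitle": True, "pos": "NN"},
    {"word.istitle": False, "pos": "VBZ"},
    {"word.istitle": False, "pos": "VBG"},
]]
y_train = [["CC", "RR", "EE"]]
```

With the library installed, `train_crf(X_train, y_train)` returns a fitted model, and its `predict` method takes sequences in the same format as `X_train` and returns one label sequence per sentence.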

Model Testing

The model is evaluated on the test data, with the following precision, recall, and F1-score values.

Code Snippet:-

[Image: model testing code snippet]
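To make the metrics concrete, here is a dependency-free sketch of how token-level precision, recall, and F1 can be computed per label. It is not the project's evaluation code: the `per_label_scores` helper and the toy label sequences are hypothetical, and the numbers it produces are unrelated to the project's results.

```python
from collections import Counter


def per_label_scores(y_true, y_pred, labels=("CC", "EE", "RR")):
    """Token-level (precision, recall, f1) for each label."""
    true_flat = [label for seq in y_true for label in seq]
    pred_flat = [label for seq in y_pred for label in seq]
    if len(true_flat) != len(pred_flat):
        raise ValueError("y_true and y_pred must contain the same number of tokens")

    tp, fp, fn = Counter(), Counter(), Counter()
    for true_label, pred_label in zip(true_flat, pred_flat):
        if true_label == pred_label:
            tp[true_label] += 1
        else:
            fp[pred_label] += 1   # predicted this label wrongly
            fn[true_label] += 1   # missed the true label

    scores = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        total = precision + recall
        f1 = 2 * precision * recall / total if total else 0.0
        scores[label] = (precision, recall, f1)
    return scores


# Toy gold and predicted labels for two short sentences.
y_true = [["CC", "RR", "EE", "O"], ["O", "CC", "RR", "EE"]]
y_pred = [["CC", "RR", "O", "O"], ["O", "CC", "RR", "EE"]]
scores = per_label_scores(y_true, y_pred)
```

The O label is left out of the default `labels` so that the many null tokens do not dominate the scores.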

The Results of Conditional Random Field:-

[Image: CRF results]

Future Scope

To obtain more accurate results, deep neural sequence models such as bidirectional LSTMs could be used. These models can reach high accuracy because they learn deep feature representations. Their main disadvantage is that they require a very large amount of training data.

References

• University Of New Zealand
• Wikipedia
• Automatic Extraction of Causal Relations from Text using Linguistically Informed Deep Neural Networks

Author

 Prateek Gupta

Other Contributor

Special Thanks to Shivendra Pratap Singh for all his efforts and contributions.
