Causal_Relation_Extraction
Causal Relation Extraction and Identification using Conditional Random Fields
Causal Relation Extraction From Medical Texts
This project on causal relation extraction and identification using Conditional Random Fields was carried out under the guidance of our faculty member, Mr. Tirthankar Dasgupta.
Link to the project presentation.
Introduction
A causal relation is a relation between two events, a cause and an effect: the cause produces the effect, and the effect is the result of the cause.
For example, in “Hunger is the most common cause of crying in a young baby.”, the cause is “hunger” and the effect is “crying”. This work focuses on detecting and extracting causal relations from medical-domain text.
From the point of view of detecting causal relations, the following distinctions are useful:
- Marked or unmarked: a causation is marked if a specific linguistic unit signals the relation, and unmarked otherwise. “I bought it because I read a good review” is marked; “Be careful. It’s unstable” is not.
- Ambiguity: if the marker always signals causation, it is unambiguous (e.g. “because”); if it only sometimes signals causation, it is ambiguous (e.g. “since”).
- Explicit or implicit: a causation is explicit if both arguments (cause and effect) are present, and implicit if one or both are missing. “She was thrown out of the hotel after she had run naked through its halls.” is explicit; “John killed Bob.” is implicit, since the effect, Bob’s death, is not explicitly stated.

We focus on marked and explicit causations.
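To make the marked/unmarked and ambiguity distinctions concrete, here is a minimal sketch of lexicon-based marker detection. The marker lists and the function name `classify_markers` are hypothetical and for illustration only; they are not the lexicon used in this project. Note that a lexicon lookup can only flag candidate markers: it cannot decide whether an ambiguous marker such as “since” actually signals causation in a given sentence, which is why a learned sequence model is used later.

```python
import re

# Hypothetical, illustrative marker lexicons (not the project's actual lists).
UNAMBIGUOUS_MARKERS = {"because", "cause", "causes", "caused"}
AMBIGUOUS_MARKERS = {"since", "as", "after"}  # sometimes causal, sometimes temporal


def classify_markers(sentence: str) -> dict:
    """Report whether a sentence contains causal markers and which kind they are."""
    tokens = re.findall(r"[a-z]+", sentence.lower())
    unambiguous = [t for t in tokens if t in UNAMBIGUOUS_MARKERS]
    ambiguous = [t for t in tokens if t in AMBIGUOUS_MARKERS]
    return {
        "marked": bool(unambiguous or ambiguous),
        "unambiguous": unambiguous,
        "ambiguous": ambiguous,
    }


print(classify_markers("I bought it because I read a good review"))
print(classify_markers("Be careful. It's unstable"))
```

The first sentence is flagged as marked by the unambiguous “because”; the second contains no marker and is reported as unmarked.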
Workflow
1. Data Preprocessing
2. Feature Selection and Extraction
3. Training Model
4. Testing Model Prediction Accuracy
Data Preprocessing
- Extracting unique words
- POS tagging and term labelling with the tags CC (cause), EE (effect), RR (relation, i.e. the causal link word), and O (none)
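The preprocessing steps can be sketched as follows: build the vocabulary of unique words, then turn each annotated sentence into (word, POS, label) triples. This is a minimal sketch under stated assumptions: the function names, the span-based annotation format, and the hand-assigned POS tags are hypothetical. In practice the POS tags would come from a tagger (for example `nltk.pos_tag`, which needs its tagger model downloaded first).

```python
def extract_unique_words(sentences):
    """Return the sorted, lower-cased vocabulary of a list of tokenised sentences."""
    return sorted({token.lower() for sentence in sentences for token in sentence})


def label_tokens(tokens, spans):
    """Label each token CC (cause), EE (effect), RR (causal link word) or O (none).

    `spans` (hypothetical format) maps a label to a list of (start, end)
    token ranges, with `end` exclusive. Unannotated tokens stay "O".
    """
    labels = ["O"] * len(tokens)
    for label, ranges in spans.items():
        for start, end in ranges:
            for i in range(start, end):
                labels[i] = label
    return labels


tokens = "Hunger is the most common cause of crying in a young baby .".split()
# Hand-assigned Penn Treebank POS tags, for illustration only.
pos_tags = ["NN", "VBZ", "DT", "RBS", "JJ", "NN", "IN", "VBG", "IN", "DT", "JJ", "NN", "."]
labels = label_tokens(tokens, {"CC": [(0, 1)], "RR": [(5, 6)], "EE": [(7, 8)]})
training_sentence = list(zip(tokens, pos_tags, labels))
print(training_sentence[:8])
```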
Feature Selection and Extraction
- Word case (upper/lower)
- Part-of-speech (POS) tag of the word
- Whether the word is title-cased
- Token type (alphanumeric/character)
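sklearn-crfsuite expects each token to be described by a dictionary of features, so the features listed above can be sketched as below. The function names follow the style of the sklearn-crfsuite tutorial but are our own; the constant `bias` feature and the reading of “token type” as `isalnum()`/`isdigit()` checks are assumptions, not necessarily the exact features used in this project. Each sentence is assumed to be a list of (word, POS, label) triples.

```python
def word2features(sent, i):
    """Feature dict for token i of a sentence of (word, POS, label) triples."""
    word, postag = sent[i][0], sent[i][1]
    return {
        "bias": 1.0,                        # constant feature (assumption)
        "word.lower()": word.lower(),       # normalised word form
        "word.isupper()": word.isupper(),   # word case
        "word.islower()": word.islower(),
        "word.istitle()": word.istitle(),   # title-cased word
        "word.isalnum()": word.isalnum(),   # token type: alphanumeric vs. character
        "word.isdigit()": word.isdigit(),
        "postag": postag,                   # POS tag
    }


def sent2features(sent):
    return [word2features(sent, i) for i in range(len(sent))]


def sent2labels(sent):
    return [label for _, _, label in sent]


sent = [("Hunger", "NN", "CC"), ("causes", "VBZ", "RR"), ("crying", "VBG", "EE"), (".", ".", "O")]
print(sent2features(sent)[0])
print(sent2labels(sent))
```

Features of neighbouring tokens (for example the previous and next word) are often added in the same way, since causal link words depend strongly on context.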
Model Selection and Training
We used a Conditional Random Field (CRF), a statistical sequence-labelling model, from the sklearn-crfsuite library, and trained it on our preprocessed training dataset.
Model Testing
We tested the model on the test data and evaluated it using precision, recall, and F1 score.
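Token-level precision, recall, and F1 per label can be computed as in the minimal pure-Python sketch below. The function name `per_label_scores` is hypothetical, and leaving the O label out of the report is an assumption (a common convention for CRF taggers), not necessarily how this project's results were computed. sklearn-crfsuite also provides `sklearn_crfsuite.metrics.flat_classification_report` for the same purpose.

```python
from collections import Counter


def per_label_scores(y_true, y_pred, labels=("CC", "EE", "RR")):
    """Token-level (precision, recall, F1) for each label in `labels`."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for true_seq, pred_seq in zip(y_true, y_pred):
        for t, p in zip(true_seq, pred_seq):
            if t == p:
                tp[t] += 1
            else:
                fp[p] += 1  # predicted p where the gold label was different
                fn[t] += 1  # missed the gold label t
    scores = {}
    for label in labels:
        precision = tp[label] / (tp[label] + fp[label]) if tp[label] + fp[label] else 0.0
        recall = tp[label] / (tp[label] + fn[label]) if tp[label] + fn[label] else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


y_true = [["CC", "CC", "RR", "EE", "O"]]
y_pred = [["CC", "EE", "RR", "EE", "O"]]
print(per_label_scores(y_true, y_pred))
```

Here the second cause token is mislabelled as an effect, so CC loses recall (1 of 2 found) while EE loses precision (1 of 2 predictions correct); RR is predicted perfectly.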
Results of the Conditional Random Field model:
Future Scope
To obtain more accurate results, we could use deep neural sequence models such as bidirectional LSTMs. These models learn features automatically from the data instead of relying on hand-crafted features, which can improve accuracy. Their main disadvantage is that they require a very large amount of training data.
Author
Other Contributor
Special thanks to Shivendra Pratap Singh for his efforts and contributions.