webstruct
Add date features
I added the features I created for Fireflax
Codecov Report
Merging #58 into master will increase coverage by 0.13%. The diff coverage is 84.44%.
@@ Coverage Diff @@
## master #58 +/- ##
==========================================
+ Coverage 81.01% 81.14% +0.13%
==========================================
Files 40 41 +1
Lines 2091 2180 +89
==========================================
+ Hits 1694 1769 +75
- Misses 397 411 +14
Uh? Is it complaining because I did not write tests for the new features?
I ran some tests to check how much these features help identify date objects, and the results were mixed:
- When start and end dates were labelled as a single entity, the extra features slightly worsened performance: the F1 scores for B-date and I-date moved from 0.567 and 0.628 to 0.548 and 0.611 respectively. Sequence accuracy remained the same.
- When start and end dates were labelled as two separate entities, the extra features slightly improved performance. The F1 score went from 0.591 to 0.625 for B-END_DATE, from 0.682 to 0.721 for I-END_DATE, from 0.522 to 0.547 for B-START_DATE and from 0.667 to 0.690 for I-START_DATE. Sequence accuracy went from 1.5% to 3.1%.
Scores were computed with 3-fold cross-validation on 45 labelled pages, using a CRF model.
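For readers comparing the two kinds of numbers above, here is a minimal sketch of how a per-label F1 score and sequence accuracy could be computed from gold and predicted label sequences. It assumes F1 is counted per token and that sequence accuracy is the fraction of sequences whose predicted labels all match the gold labels; the comment does not say exactly how the scores were produced, and the function names `label_f1` and `sequence_accuracy` are hypothetical, not part of webstruct.

```python
def label_f1(gold_seqs, pred_seqs, label):
    """Token-level F1 for a single label (assumed definition)."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        for g, p in zip(gold, pred):
            if p == label and g == label:
                tp += 1   # predicted the label and it was correct
            elif p == label:
                fp += 1   # predicted the label but gold says otherwise
            elif g == label:
                fn += 1   # gold has the label but it was missed
    if tp == 0:
        return 0.0        # no true positives: F1 is zero by convention
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def sequence_accuracy(gold_seqs, pred_seqs):
    """Fraction of sequences predicted without a single wrong label."""
    exact = sum(
        1 for g, p in zip(gold_seqs, pred_seqs) if list(g) == list(p)
    )
    return exact / len(gold_seqs)


# Toy data: two short "pages" in BIO encoding (illustrative only).
gold = [["O", "B-date", "I-date"], ["O", "O"]]
pred = [["O", "B-date", "O"], ["O", "O"]]

print(label_f1(gold, pred, "B-date"))
print(label_f1(gold, pred, "I-date"))
print(sequence_accuracy(gold, pred))
```

Under this definition a single wrong token makes the whole sequence count as wrong, which would be consistent with sequence accuracy staying in the low single digits while the per-label F1 scores are above 0.5.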