extraction-framework
extensions of mvn test in dump
Some loose notes on what to integrate there:
LinkExtractor has a TODO marking it as buggy, because of:
<a href="/wiki/Help:IPA/English" title="Help:IPA/English">/<span style="border-bottom:1px dotted"><span title="/ˈ/: primary stress follows">ˈ</span><span title="/ʃ/: 'sh' in 'shy'">ʃ</span><span title="/oʊ/: 'o' in 'code'">oʊ</span><span title="'p' in 'pie'">p</span><span title="/ən/: 'on' in 'button'">ən</span><span title="'h' in 'hi'">h</span><span title="/aʊ/: 'ou' in 'mouth'">aʊ</span><span title="/./: syllable break">.</span><span title="/ər/: 'er' in 'letter'">ər</span></span>/</a>
produces "(;" in the short and long abstracts
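
A possible check to add for this case, as a minimal sketch only: scan generated abstract text for an opening parenthesis directly followed by a semicolon. The function name `find_paren_residue` and the regular expression are hypothetical and not part of the extraction framework. It is written in Python for brevity and would need porting to run under mvn test.

```python
import re
from typing import List

# Assumption: the residue looks like "(" followed by optional whitespace and ";".
PAREN_RESIDUE = re.compile(r"\(\s*;")


def find_paren_residue(abstract: str) -> List[str]:
    """Return every "(;"-style residue found in the abstract text."""
    return PAREN_RESIDUE.findall(abstract)


if __name__ == "__main__":
    # Hypothetical abstract strings, for illustration only.
    print(find_paren_residue("Example (; born 1900) was a writer."))
    print(find_paren_residue("Example (born 1900) was a writer."))
```

A test over the short and long abstracts could then fail whenever this returns a non-empty list.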
#2 Johannes said that live was not producing: