abbreviation-extraction

Python 3 implementation of the Schwartz-Hearst algorithm for extracting abbreviation-definition pairs

Results: 7 abbreviation-extraction issues

- See https://github.com/philgooch/abbreviation-extraction/issues/25. Reduces false positives by introducing casing and length constraints on short-form candidates, and by rewinding if the candidate definition starts with an English preposition.
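The two kinds of check described in this change can be pictured roughly as follows. This is only a minimal sketch, not the code from the repository: the function names, the length thresholds, the "at least one uppercase letter" rule and the preposition list are all assumptions made for illustration, and the rewind step itself is not modeled.

```python
# Hypothetical thresholds and word list, chosen for illustration only.
MIN_LEN, MAX_LEN = 2, 10
PREPOSITIONS = {"of", "in", "on", "for", "to", "by", "with", "at", "from"}


def plausible_short_form(candidate: str) -> bool:
    """Return True if the candidate passes simple length and casing checks."""
    if not (MIN_LEN <= len(candidate) <= MAX_LEN):
        return False
    if not candidate[0].isalnum():
        return False
    # Casing constraint (assumed): require at least one uppercase letter.
    return any(ch.isupper() for ch in candidate)


def starts_with_preposition(definition: str) -> bool:
    """Return True if the first word of the definition is a listed preposition."""
    words = definition.split()
    return bool(words) and words[0].lower() in PREPOSITIONS
```

With these assumed rules, `plausible_short_form("TIAT")` is accepted while an all-lowercase word such as `"the"` is rejected, and `starts_with_preposition("of the Moon")` flags a definition that the real implementation would presumably handle by rewinding.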

https://github.com/philgooch/abbreviation-extraction/blob/2e334bbe474a4030c07860839c023775bb97c4ae/abbreviations/schwartz_hearst.py#L112-L140 https://github.com/philgooch/abbreviation-extraction/blob/2e334bbe474a4030c07860839c023775bb97c4ae/abbreviations/schwartz_hearst.py#L129 The condition above is unused: whether it evaluates to true or false has no effect on the variable `viable`.

Hi Phil, Thanks so much for this clean and easy to use implementation! I noticed a couple of minor false positives when running it through a long document about space...

![image](https://user-images.githubusercontent.com/27803744/84111262-2f858900-aa44-11ea-8288-dc78b95540ee.png) ![image](https://user-images.githubusercontent.com/27803744/84126913-2acccf00-aa5c-11ea-8ce4-4a0435d1065b.png)

Abbreviation expansions of the form `This Is A Term (TIAT)` are not always present in a document. You may also see glossary lists such as `TIAT This Is A Term`...
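A glossary line of the form `TIAT This Is A Term` could, as one possible approach, be handled by treating the first token as the short form and checking it against the initials of the remaining words. The sketch below is an assumption about how such a feature might look; `parse_glossary_line` is a hypothetical helper and is not part of the library.

```python
def parse_glossary_line(line: str):
    """Split 'TIAT This Is A Term' into ('TIAT', 'This Is A Term'), or return None."""
    parts = line.strip().split(maxsplit=1)
    if len(parts) != 2:
        return None
    short_form, definition = parts
    # Build the initials of the definition words and compare case-insensitively.
    initials = "".join(word[0] for word in definition.split())
    if initials.upper() != short_form.upper():
        return None
    return short_form, definition
```

This strict initials match is deliberately conservative: it rejects ordinary sentences, but it also misses definitions that contain unabbreviated function words such as "and" or "of", so a real implementation would likely need a looser matching rule.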

help wanted

function documentation, specifying input (one sentence per line)
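For context, the issue title suggests the input is expected as one sentence per line. A rough way to bring free-running text into that shape is sketched below; the helper name and the punctuation-based splitting rule are assumptions, and a naive split like this will break on abbreviations such as "e.g." or "Dr.", so a proper sentence tokenizer may be preferable in practice.

```python
import re


def to_one_sentence_per_line(text: str) -> str:
    """Naively split text after '.', '!' or '?' and join the sentences with newlines."""
    # Split on whitespace that directly follows sentence-ending punctuation.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return "\n".join(s for s in sentences if s)
```

The resulting string can be written to a plain-text file, one sentence per line, before being passed on for extraction.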

documentation about file format