abbreviation-extraction
Python3 implementation of the Schwartz-Hearst algorithm for extracting abbreviation-definition pairs
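As a rough illustration of the core matching step in the Schwartz-Hearst algorithm, the sketch below scans a short form and a candidate definition from right to left and returns the shortest tail of the candidate that accounts for every character of the short form. The function name `find_best_long_form` is chosen for this example and is not necessarily what this repository uses. The sketch leaves out the candidate-window sizing and the extra validity checks that a full implementation would need.

```python
def find_best_long_form(short_form, candidate):
    """Return the tail of `candidate` that matches `short_form`, or None.

    Illustrative sketch only: characters are matched right to left, and the
    first character of the short form must start a word in the candidate.
    """
    s_index = len(short_form) - 1
    l_index = len(candidate) - 1

    while s_index >= 0:
        ch = short_form[s_index].lower()
        # Skip punctuation in the short form (e.g. the hyphen in "HSF-1").
        if not ch.isalnum():
            s_index -= 1
            continue
        # Move left until the character matches; for the first short-form
        # character, also require that the match sits at the start of a word.
        while (l_index >= 0 and candidate[l_index].lower() != ch) or (
            s_index == 0 and l_index > 0 and candidate[l_index - 1].isalnum()
        ):
            l_index -= 1
        if l_index < 0:
            return None
        l_index -= 1
        s_index -= 1

    # Extend the match back to the beginning of the word it started in.
    start = candidate.rfind(" ", 0, l_index + 1) + 1
    return candidate[start:]
```

For example, `find_best_long_form("HSF", "we studied heat shock factor")` returns `"heat shock factor"`, while a short form whose characters cannot all be found returns `None`.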
- See https://github.com/philgooch/abbreviation-extraction/issues/25
- Reduces false positives by introducing casing and length constraints on short-form candidates, and by rewinding if the candidate definition starts with an English preposition
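The kind of filtering described above might look like the following sketch. The thresholds, the preposition list, and both function names are assumptions made for illustration, not values or names taken from the actual change. What "rewinding" does once a leading preposition is detected is left to the caller.

```python
# Hypothetical, deliberately small preposition list for illustration.
PREPOSITIONS = frozenset({"of", "in", "on", "for", "to", "by", "with", "at", "from"})


def is_plausible_short_form(candidate, min_len=2, max_len=10):
    """Apply assumed length and casing constraints to a short-form candidate."""
    if not candidate or not (min_len <= len(candidate) <= max_len):
        return False
    # Assumed rule: the candidate starts with a letter or digit...
    if not candidate[0].isalnum():
        return False
    # ...and contains at least one uppercase letter.
    return any(ch.isupper() for ch in candidate)


def starts_with_preposition(definition):
    """Report whether a candidate definition begins with a listed preposition."""
    words = definition.split()
    return bool(words) and words[0].lower() in PREPOSITIONS
```

With these assumed rules, `is_plausible_short_form("see")` is rejected because it has no uppercase letter, and `starts_with_preposition("of the Moon")` flags a definition that a caller may want to re-align.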
https://github.com/philgooch/abbreviation-extraction/blob/2e334bbe474a4030c07860839c023775bb97c4ae/abbreviations/schwartz_hearst.py#L112-L140
https://github.com/philgooch/abbreviation-extraction/blob/2e334bbe474a4030c07860839c023775bb97c4ae/abbreviations/schwartz_hearst.py#L129
The condition on the line linked above is unused: whether it evaluates to true or false has no effect on the variable "viable".
Hi Phil, Thanks so much for this clean and easy to use implementation! I noticed a couple of minor false positives when running it through a long document about space...
Abbreviation expansions of the form `This Is A Term (TIAT)` are not always present in a document. You may also see glossary lists such as `TIAT This Is A Term`...
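One possible way to handle glossary-style lines is sketched below. It assumes the short form is the first whitespace-delimited token, is entirely uppercase, and equals the initials of the words that follow. The function name `parse_glossary_line` and the strict initials rule are assumptions for this example; real glossaries would likely need looser matching.

```python
def parse_glossary_line(line):
    """Split a line such as 'TIAT This Is A Term' into (short_form, definition).

    Returns None when the line does not fit the assumed glossary pattern.
    """
    parts = line.strip().split(None, 1)
    if len(parts) != 2:
        return None
    short_form, definition = parts
    # Assumed rule: glossary short forms are written in uppercase.
    if not short_form.isupper():
        return None
    # Assumed rule: the short form is exactly the initials of the definition.
    initials = "".join(word[0] for word in definition.split()).upper()
    if initials != short_form:
        return None
    return short_form, definition
```

Here `parse_glossary_line("TIAT This Is A Term")` yields `("TIAT", "This Is A Term")`, and an ordinary sentence such as `"Hello world"` yields `None`.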
function documentation, specifying input (one sentence per line)
documentation about file format