yargy
yargy copied to clipboard
Rule-based facts extraction for Russian language
Yargy is an Earley parser similar to Tomita parser. Yargy uses rules and dictionaries to extract structured information from Russian texts.
Install
Yargy supports Python 3.5+, PyPy 3, depends only on Pymorphy2.
$ pip install yargy
Usage
from yargy import Parser, rule, and_, not_
from yargy.interpretation import fact
from yargy.predicates import gram
from yargy.relations import gnc_relation
from yargy.pipelines import morph_pipeline
Name = fact(
'Name',
['first', 'last'],
)
Person = fact(
'Person',
['position', 'name']
)
LAST = and_(
gram('Surn'),
not_(gram('Abbr')),
)
FIRST = and_(
gram('Name'),
not_(gram('Abbr')),
)
POSITION = morph_pipeline([
'управляющий директор',
'вице-мэр'
])
gnc = gnc_relation()
NAME = rule(
FIRST.interpretation(
Name.first
).match(gnc),
LAST.interpretation(
Name.last
).match(gnc)
).interpretation(
Name
)
PERSON = rule(
POSITION.interpretation(
Person.position
).match(gnc),
NAME.interpretation(
Person.name
)
).interpretation(
Person
)
parser = Parser(PERSON)
match = parser.match('управляющий директор Иван Ульянов')
print(match)
Person(
position='управляющий директор',
name=Name(
first='Иван',
last='Ульянов'
)
)
Documentation
All materials are in Russian:
Support
- Chat — https://telegram.me/natural_language_processing
- Issues — https://github.com/natasha/yargy/issues
- Commercial support — https://lab.alexkuk.ru
Development
Test:
make test
Package:
make version
git push
git push --tags
make clean wheel upload