natasha/yargy: Rule-based fact extraction for the Russian language

Yargy is an Earley parser similar to the Tomita parser. It uses rules and dictionaries to extract structured information from Russian texts.
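For readers unfamiliar with Earley parsing, here is a minimal sketch of an Earley recognizer in plain Python. It is not Yargy's implementation: the toy grammar, the `Item` tuple and the `recognize` function are hypothetical names made up for illustration, terminals are matched by plain string equality rather than by morphological predicates, and empty rules are not handled.

```python
from collections import namedtuple

# An Earley item: a rule (head -> body), the number of body symbols
# matched so far (dot), and the input position where the match began.
Item = namedtuple('Item', ['head', 'body', 'dot', 'origin'])

# Toy grammar (hypothetical, for illustration only).
# Keys are nonterminals; any other symbol is a terminal token.
GRAMMAR = {
    'PERSON': [('POSITION', 'NAME')],
    'POSITION': [('director',), ('managing', 'director')],
    'NAME': [('FIRST', 'LAST')],
    'FIRST': [('ivan',)],
    'LAST': [('ulyanov',)],
}


def recognize(tokens, start='PERSON', grammar=GRAMMAR):
    """Return True if the whole token list can be derived from `start`."""
    chart = [[] for _ in range(len(tokens) + 1)]

    def add(position, item):
        if item not in chart[position]:
            chart[position].append(item)

    for body in grammar[start]:
        add(0, Item(start, body, 0, 0))

    for position in range(len(tokens) + 1):
        index = 0
        # chart[position] can grow while it is being processed
        while index < len(chart[position]):
            item = chart[position][index]
            index += 1
            if item.dot < len(item.body):
                symbol = item.body[item.dot]
                if symbol in grammar:
                    # Predict: expand the expected nonterminal here
                    for body in grammar[symbol]:
                        add(position, Item(symbol, body, 0, position))
                elif position < len(tokens) and tokens[position] == symbol:
                    # Scan: the next token matches the expected terminal
                    add(position + 1, item._replace(dot=item.dot + 1))
            else:
                # Complete: advance every item that was waiting for this head
                for waiting in list(chart[item.origin]):
                    if (waiting.dot < len(waiting.body)
                            and waiting.body[waiting.dot] == item.head):
                        add(position, waiting._replace(dot=waiting.dot + 1))

    return any(
        item.head == start
        and item.dot == len(item.body)
        and item.origin == 0
        for item in chart[len(tokens)]
    )


print(recognize(['managing', 'director', 'ivan', 'ulyanov']))  # True
print(recognize(['ivan', 'ulyanov']))  # False
```

The chart keeps every partial match alive at each position, which is what lets an Earley parser handle alternatives such as the two POSITION rules without backtracking.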

Install

Yargy supports Python 3.5+ and PyPy 3, and depends only on Pymorphy2.

$ pip install yargy

Usage

from yargy import Parser, rule, and_, not_
from yargy.interpretation import fact
from yargy.predicates import gram
from yargy.relations import gnc_relation
from yargy.pipelines import morph_pipeline


Name = fact(
    'Name',
    ['first', 'last'],
)
Person = fact(
    'Person',
    ['position', 'name']
)

LAST = and_(
    gram('Surn'),
    not_(gram('Abbr')),
)
FIRST = and_(
    gram('Name'),
    not_(gram('Abbr')),
)

POSITION = morph_pipeline([
    'управляющий директор',
    'вице-мэр'
])

gnc = gnc_relation()
NAME = rule(
    FIRST.interpretation(
        Name.first
    ).match(gnc),
    LAST.interpretation(
        Name.last
    ).match(gnc)
).interpretation(
    Name
)

PERSON = rule(
    POSITION.interpretation(
        Person.position
    ).match(gnc),
    NAME.interpretation(
        Person.name
    )
).interpretation(
    Person
)

parser = Parser(PERSON)

match = parser.match('управляющий директор Иван Ульянов')
print(match.fact)

Person(
    position='управляющий директор',
    name=Name(
        first='Иван',
        last='Ульянов'
    )
)

Documentation

All materials are in Russian.

Support

Chat — https://telegram.me/natural_language_processing
Issues — https://github.com/natasha/yargy/issues
Commercial support — https://lab.alexkuk.ru

Development

Test:

make test

Package:

make version
git push
git push --tags

make clean wheel upload
