language-learning issues

Missing unit test for grammar-tester handling .db dictionaries

1

From @alexei-gl in https://github.com/singnet/language-learning/pull/243: "please make unit test for .db dictionaries in grammar-tester"

glicerico

test

Hybrid sequential-MST parser

9

Implement a "hybrid" parser that blends sequential information and MI, so that the extent of blending is configurable, with "maximum sequential" mode producing "sequential parse" and "maximum MI" mode producing "plain...

akolonin
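A minimal sketch of one way such blending could work, assuming a single weight `alpha` that linearly mixes an MI score (assumed pre-normalized to [0, 1]) with a 0/1 adjacency bonus, and a maximum spanning tree built with Prim's algorithm. The names `blended_score` and `hybrid_parse` are hypothetical, the tree is not constrained to be projective, and this is not the project's MST-Parser. With `alpha = 1.0` only adjacent links score above zero, so the tree is the sequential parse; with `alpha = 0.0` it reduces to a plain MI spanning tree.

```python
from typing import Dict, List, Tuple


def blended_score(mi: float, i: int, j: int, alpha: float) -> float:
    """Hypothetical blend of an MI score with a sequential adjacency bonus.

    alpha = 0.0 -> pure MI; alpha = 1.0 -> pure sequential.
    """
    sequential = 1.0 if abs(i - j) == 1 else 0.0
    return (1.0 - alpha) * mi + alpha * sequential


def hybrid_parse(words: List[str],
                 mi: Dict[Tuple[str, str], float],
                 alpha: float) -> List[Tuple[int, int]]:
    """Maximum spanning tree over word positions (Prim's algorithm).

    mi maps (left word, right word) pairs to normalized MI; missing pairs score 0.
    Returns links as (left index, right index) pairs.
    """
    n = len(words)
    if n < 2:
        return []
    in_tree = {0}
    links: List[Tuple[int, int]] = []
    while len(in_tree) < n:
        best = None
        best_score = float("-inf")
        # Find the highest-scoring link from the tree to a word outside it.
        for i in sorted(in_tree):
            for j in range(n):
                if j in in_tree:
                    continue
                a, b = min(i, j), max(i, j)
                m = mi.get((words[a], words[b]), 0.0)
                s = blended_score(m, a, b, alpha)
                if s > best_score:
                    best_score = s
                    best = (a, b, j)
        a, b, j = best
        links.append((a, b))
        in_tree.add(j)
    return sorted(links)
```

For example, with `mi = {("the", "cat"): 0.9, ("cat", "sat"): 0.2, ("the", "sat"): 0.5}`, `alpha = 0.0` links "the" to both "cat" and "sat", while `alpha = 1.0` yields the sequential chain.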

Fix GT MWC counter and re-evaluate grammars with MWC > 1

6

1. Fix the bug that skips unparsed words in test parses. 2. Re-evaluate all parses in the MWC-Study tab and update the links and numbers in the sheet (keep updating progress for...

akolonin

doing

Unsupervised Parser Challenge for Gutenberg Children corpus

The goal of the challenge is to have an unsupervisedly trained parser create parses that approximate "expected" English parses as closely as possible, using cleaned Gutenberg Children corpus data as...

akolonin

doing

Parse-evaluator "sequential" and "random" test-file bugs

When running the parse-evaluator in sequential or random mode, the parameter -t specifies where the sequential/random parses will be written. There is a bug and a theoretical problem with this:...

glicerico

bug

enhancement

Tokenization is different for LG English and LG ANY - which problems may be raised by this

6

Study why tokenization is different for LG English and LG ANY and which problems may be raised by this and how it could be solved. Examples from Andres - specifically...

akolonin

enhancement

Grammar Learner: Identical Lexical Entries (ILE) algorithm based on multi-germ/single-disjunct entries

3

**Problem:** Currently, the Identical Lexical Entries (ILE) algorithm first builds single-germ/multi-disjunct lexical entries (LE) and then aggregates identical ones based on unique combinations of disjuncts. This means that rarely...

akolonin

enhancement
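A minimal sketch of the current aggregation step as described above, assuming each single-germ lexical entry is represented as a word mapped to its set of disjunct strings; the function name `ile_clusters` and the disjunct notation are hypothetical, not the Grammar Learner's internal format. Words are grouped only when their full disjunct sets match exactly.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, List, Set


def ile_clusters(entries: Dict[str, Set[str]]) -> List[List[str]]:
    """Group words whose disjunct sets are identical (hypothetical sketch)."""
    groups: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for word, disjuncts in entries.items():
        # The unique combination of disjuncts is the grouping key.
        groups[frozenset(disjuncts)].append(word)
    # Sort for deterministic output.
    return sorted(sorted(words) for words in groups.values())
```

For example, `{"cat": {"D- & S+"}, "dog": {"D- & S+"}, "sat": {"S-"}}` yields `[["cat", "dog"], ["sat"]]`; adding one extra disjunct to "dog" would split it from "cat".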

Wrong tagging in iterative grammar learning with "Gutenberg Children Books" corpus

1

Cluster tags and words in tagged grammar .dict and cat_tree files. Is this a tagging issue, an input-parse filtering issue, or an issue in the corpus that prevents correct link extraction? Jupyter notebook -- [Iterative-clustering-ILE-POCE-CDS-2019-02-27.ipynb](https://github.com/singnet/language-learning/blob/master/notebooks/Iterative-clustering-ILE-POCE-CDS-2019-02-27.ipynb)...

OlegBaskov

bug

Using ats (@) and periods (.) for suffixes in Pre-Cleaner, MST-Parser, Grammar Learner and Link Grammar

3

A few problems: 1. During iterative grammar learning, tagging words in the input corpus and input parses may face ambiguity if the words with ats (@) in parses and corpus are translated...

akolonin

Grammar Learner internal formats refactoring

The Grammar Learner internal formats eventually need refactoring, based on the code review by @OlegBaskov: https://docs.google.com/document/d/1yauyi9Y9OD1Cefow197OTnGqm6ZDSqK1T-v6bR5CUHI/edit#heading=h.37kbmfpxjcy0

akolonin

enhancement
