Peter Ljunglöf issues

Results: 21 issues by Peter Ljunglöf

Spelling errors in pattern matches

GF doesn't make any syntactic distinction between how parameters, functions and variables are written. There is a convention that parameters should be capitalised, while functions and variables should be lower-cased, but...

Review process for new languages / big updates

When someone issues a new Pull Request, there should be an open and transparent review process before it's accepted. I suggest it consists of the following: 1. There must be...

Test cases for languages

Can we define how to write test cases for RGL languages? Here's a simple suggestion: every language (starting with new ones) should have a small corpus with positive examples. The...

Rename the "bantu" functor

The `bantu` functor (added in PR #32), together with the three new languages `egekusii`, `kikamba` and `kiswahili`, is a welcome addition to the RGL. But unfortunately the...

Support spaCy (https://spacy.io)

new functionality

Adding Stanza for languages other than Swedish

The Stanford NLP documentation recommends changing to Stanza when working in Python: > We are actively developing a Python package called Stanza, with state-of-the-art NLP performance enabled by deep learning....

Port some modules to Cython

Some of the Sparv internal modules are quite slow (I'm thinking about the Saldo annotations). If they were ported to Cython (https://cython.org) they would probably be 50-80% faster. Apparently this...

enhancement

Merging many small input files into large chunks

Several modules (I've tried segmentation and Hunpos) are much more efficient (tokens/sec) on large texts. So if we have a large corpus consisting of many smaller texts in separate files,...
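The merging idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not Sparv code: `chunk_texts`, `merge_chunk`, the character budget and the blank-line separator are all hypothetical names and choices. Small texts are grouped into chunks up to a size limit, and each merged chunk records the character span of every original text so that annotations could later be mapped back to their source files.

```python
def chunk_texts(texts, max_chars):
    """Group (name, text) pairs into chunks of roughly max_chars characters.

    Hypothetical helper: a text is never split, so a single text longer
    than max_chars ends up alone in its own chunk.
    """
    chunks, current, size = [], [], 0
    for name, text in texts:
        # Start a new chunk if adding this text would exceed the budget.
        if current and size + len(text) > max_chars:
            chunks.append(current)
            current, size = [], 0
        current.append((name, text))
        size += len(text)
    if current:
        chunks.append(current)
    return chunks


def merge_chunk(chunk, separator="\n\n"):
    """Concatenate one chunk and record where each original text lies.

    Returns the merged string and a list of (name, start, end) spans,
    so that merged[start:end] is the original text called name.
    """
    pieces, spans, pos = [], [], 0
    for name, text in chunk:
        if pieces:
            pos += len(separator)  # account for the separator before this text
        spans.append((name, pos, pos + len(text)))
        pieces.append(text)
        pos += len(text)
    return separator.join(pieces), spans
```

A module would then run once per merged chunk instead of once per file, and the recorded spans make it possible to split the results again afterwards.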

Add cleaning of HTML

It would be nice to be able to use HTML files as the corpus. Here is code for converting HTML markup to plain text, which possibly could be transformed into...

new functionality
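For illustration, HTML-to-text conversion can be sketched with Python's standard `html.parser` module. This is not the code referred to in the issue; the class `TextExtractor`, the function `html_to_text` and the chosen set of block-level tags are assumptions made for this sketch.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, skipping <script> and <style> content."""

    SKIP = {"script", "style"}
    # Assumed set of tags that should produce a line break in the output.
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are outside script/style elements.
        if self._skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the visible text of an HTML string, one block per line."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = "".join(parser.parts)
    # Normalise whitespace within each line and drop empty lines.
    lines = [" ".join(line.split()) for line in text.splitlines()]
    return "\n".join(line for line in lines if line)
```

Real web pages usually need more than this (navigation menus, hidden elements, malformed markup), so the sketch only shows the basic shape of the conversion.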

Examples of annotation errors

Here you can add examples where the automatic Sparv annotations are clearly wrong. E.g., errors in POS-tagging, dependency parsing, lemmatisation, foreign words, etc. The examples can also be from Korp...