Peter Ljunglöf
GF doesn't make any syntactic distinction between how parameters, functions and variables are written. There is a convention that parameters should be capitalised, while functions and variables should be lower-cased, but...
When someone issues a new Pull Request, there should be an open and transparent review process before it's accepted. I suggest it consists of the following: 1. There must be...
Can we define how to write test cases for RGL languages? Here's a simple suggestion: every language (starting with new ones) should have a small corpus with positive examples. The...
The `bantu` functor (added in PR #32), together with the three new languages `egekusii`, `kikamba` and `kiswahili`, is a welcome addition to the RGL. But unfortunately the...
The Stanford NLP documentation recommends changing to Stanza when working in Python: > We are actively developing a Python package called Stanza, with state-of-the-art NLP performance enabled by deep learning....
Some of the Sparv internal modules are quite slow (I'm thinking about the Saldo annotations). If they were ported to Cython (https://cython.org) they would probably be 50-80% faster. Apparently this...
Several modules (I've tried segmentation and Hunpos) are much more efficient (tokens/sec) on large texts. So if we have a large corpus consisting of many smaller texts in separate files,...
It would be nice to be able to use HTML files as the corpus. Here is code for converting HTML markup to plain text, which could possibly be transformed into...
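As a rough illustration of this kind of HTML-to-text conversion, here is a minimal sketch that uses only Python's standard `html.parser` module. It is not the code from the issue itself: the names `TextExtractor` and `html_to_text`, and the choice of which tags count as block-level, are assumptions made for the example.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the visible text of an HTML document (hypothetical helper)."""

    # Tags whose content is never shown to the reader.
    SKIPPED = {"script", "style"}
    # Tags assumed to start or end a line of text (an illustrative subset).
    BLOCK = {"p", "div", "br", "li", "tr", "h1", "h2", "h3", "h4", "h5", "h6"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []
        self.skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self.skip_depth += 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self.skip_depth > 0:
            self.skip_depth -= 1
        elif tag in self.BLOCK:
            self.parts.append("\n")

    def handle_data(self, data):
        # Keep text only when we are outside <script> and <style>.
        if self.skip_depth == 0:
            self.parts.append(data)


def html_to_text(html):
    """Return the plain text of `html`, one non-empty line per block."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    raw_lines = "".join(parser.parts).splitlines()
    # Collapse runs of whitespace and drop lines that end up empty.
    lines = (" ".join(line.split()) for line in raw_lines)
    return "\n".join(line for line in lines if line)


if __name__ == "__main__":
    sample = "<h1>Title</h1><p>Hello <b>world</b> &amp; all</p>"
    print(html_to_text(sample))
```

The sketch drops all markup, so any structure that should survive as corpus annotation (headings, paragraphs, links) would need extra handling in the `handle_starttag` and `handle_endtag` methods.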
Here you can add examples where the automatic Sparv annotations are clearly wrong. E.g., errors in POS-tagging, dependency parsing, lemmatisation, foreign words, etc. The examples can also be from Korp...