LEMLAT3
Morphological analyzer and lemmatizer for Latin.
We have tested LEMLAT on a corpus of classical Latin texts from a university reading list. The corpus contains some 23,700 words and 8,538 different word forms: Terence's Adelphoe, Horace's...
Three types of input cause an error ('segmentation fault') in batch mode: * strings containing the backslash character `'\'` * strings containing some non-ASCII characters (further investigation is needed to know exactly which) *...
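Until the crash is fixed, one possible stopgap is to screen an input file before running it in batch mode. The sketch below is not part of LEMLAT; `find_risky_lines` is a hypothetical helper that flags only the two input types named above. It is deliberately conservative: it flags every non-ASCII character, although only some of them are reported to trigger the fault.

```python
def find_risky_lines(lines):
    """Return (line_number, reasons) pairs for lines that may crash batch mode.

    Hypothetical pre-check, not part of LEMLAT. It only covers the two
    input types named in the report: backslashes and non-ASCII characters.
    """
    risky = []
    for lineno, line in enumerate(lines, start=1):
        reasons = []
        if "\\" in line:
            reasons.append("backslash")
        # Assumption: any non-ASCII character is treated as suspect,
        # since the exact set of problematic characters is not yet known.
        if not line.isascii():
            reasons.append("non-ASCII")
        if reasons:
            risky.append((lineno, reasons))
    return risky


if __name__ == "__main__":
    sample = ["rosa", "a\\b", "caf\u00e9"]
    for lineno, reasons in find_risky_lines(sample):
        print(lineno, ", ".join(reasons))
```

Flagged lines can then be inspected or removed by hand before the file is passed to LEMLAT.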
Path length is fixed. Must be 'freed'.
i9917 The form "inpraesentiarum" is not analyzed. The les is "impraesentiarum". The code i04, which handles the inp/imp graphical alternation, is correctly present in a_gra.
I have noticed that punctuation marks apart from the hyphen - are not analyzed by LEMLAT, not even as unknown wordforms in the unk file (where "-" lands). However, when...
virus (u0803) -> the gender is not produced in the output. The problem is as follows. The program has a constraint (on the "lessario") whereby a les with codles='fe' that is used...
Create a filter/function to group identical analyses into a single entry. For example, analyses `18` and `19` of `forma` (Du Cange) are identical: ``` ============================ANALYSIS 18================================== SEGMENTATION: form -a ---------------------morphological...
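A minimal sketch of such a grouping filter, written as a post-processing step in Python rather than inside LEMLAT itself. It assumes each analysis has already been parsed into an `(analysis_number, body_text)` pair, where the body excludes the numbered `ANALYSIS` header line; the function name and this input shape are hypothetical, not LEMLAT's actual data model.

```python
def group_identical_analyses(analyses):
    """Merge analyses whose bodies are identical.

    `analyses` is a list of (number, body) pairs (assumed input shape).
    Returns a list of (numbers, body) pairs in first-seen order, where
    `numbers` lists every analysis number that shares that body.
    """
    groups = {}
    for number, body in analyses:
        # Collapse whitespace so purely cosmetic differences are ignored.
        key = " ".join(body.split())
        # Keep the first body seen for each key and collect its numbers.
        groups.setdefault(key, ([], body))[0].append(number)
    return [(numbers, body) for numbers, body in groups.values()]


if __name__ == "__main__":
    # Bodies are shortened placeholders, not real LEMLAT output.
    parsed = [
        (17, "PLACEHOLDER BODY"),
        (18, "SEGMENTATION: form -a"),
        (19, "SEGMENTATION: form -a"),
    ]
    for numbers, body in group_identical_analyses(parsed):
        print(numbers, body)
```

Comparing whitespace-normalized text is the simplest possible criterion; comparing parsed features field by field would be stricter, but it depends on the exact output format.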
Hi Paolo, Here are some corrections to make to the English in LEMLAT's output. ### Fix 1 The word `LEMMI` should be translated as `LEMMAS`. See the example below: ``` ---------------------morphological feats ----------------------------- LEMMI:...
I noticed that, launching LEMLAT on a file like this: `./lemlat_client -i /input.txt -c output.csv` - the list of unknown forms is saved under the name `input.txt.unk`, and not as...
Add a GitHub Wiki page to explain how this repository's file system works. It's not immediately obvious which folder contains what and what all the files are.