Common words and forms missing from LEMLAT

Open nevenjovanovic opened this issue 7 years ago • 2 comments

We have tested LEMLAT on a corpus of classical Latin texts from a university reading list: Terence's Adelphoe, Horace's Odes Bk. 1, Tibullus Bk. 1, and Seneca's Letters Bk. 1 (all editions from the PerseusDL collection). The corpus contains some 23,700 words and 8,538 different word forms. Besides various forms of personal names (and some typos in our sources), 40 word forms were not recognized by LEMLAT. That is a tiny share of all forms (under 0.5%); the full list is below. Some forms seem to go unrecognized for orthographical reasons: the diaeresis ë (coëmisse, poëta), omitted -p- (emta, demsi), oe in foeneraret, and words joined instead of separated (illiusmodi). Others have to do with meter in comedy: the shortened enclitic -n (from -ne), as in egon and tun, is regularly not recognized by LEMLAT. Some missing forms are fairly common: norimus, nosse.

I propose that the forms from the list below be added to the LEMLAT database.


adteruisse
audistin
coëmisse
demseris
demsi
egon
emta
emtae
emtam
foeneraret
haecine
hancine
hocine
hoscine
illan
illiusmodi
ipsus
lucu
men
norimus
nosse
nossem
nostin
numquidnam
poëta
poëtae
posthaec
propediem
quamobrem
quamprimum
quandoquidem
quorundam
quotannis
refrixerit
sumtuosa
tamdiu
tantummodo
tercentenas
tetigin
tun
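
Until the forms are added, the orthographic patterns described above could be worked around on the user side by generating alternative spellings before lookup. The following is only a rough sketch in Python and is not part of LEMLAT: the function names and the prefix table are made up for illustration, the replacement spellings are my assumption about what a lemmatizer is more likely to know, and the -n to -ne rule overgenerates on purpose (it simply offers an extra candidate for any form ending in -n).

```python
import unicodedata
from typing import List

# Hypothetical table: spelling variants seen in the list above, mapped to
# spellings assumed (not verified) to be closer to dictionary forms.
VARIANT_PREFIXES = {
    "emt": "empt",      # emta -> empta
    "dems": "demps",    # demsi -> dempsi
    "sumt": "sumpt",    # sumtuosa -> sumptuosa
    "foener": "fener",  # foeneraret -> feneraret
}


def strip_diaeresis(form: str) -> str:
    """Drop the combining diaeresis, e.g. 'poëta' -> 'poeta'."""
    decomposed = unicodedata.normalize("NFD", form)
    return unicodedata.normalize("NFC", decomposed.replace("\u0308", ""))


def normalize_prefix(form: str) -> str:
    """Rewrite a known variant prefix; return the form unchanged otherwise."""
    for variant, standard in VARIANT_PREFIXES.items():
        if form.startswith(variant):
            return standard + form[len(variant):]
    return form


def candidates(form: str) -> List[str]:
    """Return spellings to try, most conservative first."""
    base = normalize_prefix(strip_diaeresis(form.lower()))
    result = [base]
    # Shortened enclitic: also try the full -ne spelling (egon -> egone).
    if base.endswith("n"):
        result.append(base + "e")
    return result


if __name__ == "__main__":
    for word in ["poëta", "emtae", "demseris", "egon", "foeneraret"]:
        print(word, candidates(word))
```

Each candidate would then be passed to the analyzer in turn. This does not cover joined words such as illiusmodi or the contracted perfects (norimus, nosse), which would still need to be added to the database.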

nevenjovanovic, Aug 21 '17 09:08