lexicons

Dictionaries of names, surnames, acronyms and their expansions, stop-words, etc., which I gathered for different experiments.

lexica-lists-words

Dictionaries and lists of names, acronyms and their expansions, stop-words, etc., which I gathered for different experiments. Acronyms were automatically extracted with the algorithm described in "A Simple Algorithm for Identifying Abbreviation Definitions in Biomedical Text" by A. S. Schwartz and M. A. Hearst. A Java implementation is available here.
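For readers unfamiliar with the approach, here is a minimal Python sketch of the long-form matching step at the core of the Schwartz and Hearst algorithm. It is not the code that produced these lists; the function name `find_best_long_form` is only illustrative, and the candidate extraction and filtering steps of the full algorithm are left out. The idea is to scan the short form and the candidate text from right to left, requiring every character of the short form to appear in order, with the first character at the start of a word.

```python
def find_best_long_form(short_form, candidate):
    """Return the shortest suffix of `candidate` that matches `short_form`,
    or None if the characters cannot be aligned.

    Illustrative sketch of the right-to-left matching step only.
    """
    s = len(short_form) - 1
    l = len(candidate) - 1
    while s >= 0:
        ch = short_form[s].lower()
        # Punctuation inside the short form (e.g. "U.S.") is ignored.
        if not ch.isalnum():
            s -= 1
            continue
        # Move left until the character matches; the first character of the
        # short form must additionally sit at the beginning of a word.
        while (l >= 0 and candidate[l].lower() != ch) or (
            s == 0 and l > 0 and candidate[l - 1].isalnum()
        ):
            l -= 1
        if l < 0:
            return None
        l -= 1
        s -= 1
    # Extend the match back to the start of the word it begins in.
    start = candidate.rfind(" ", 0, l + 1) + 1
    return candidate[start:]


print(find_best_long_form("HMM", "we use a hidden Markov model"))
```

Given the text preceding a parenthesised short form, the function returns the matched expansion, here `hidden Markov model`, and returns `None` when no alignment exists.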

  • NomesLex-PT a lexicon of person names made up of 2,027 first names and 8,019 surnames; more information here.

  • PT-stopwords.txt a collection of stop-words for Portuguese.

  • geo-net-pt02_terms_frequency_wpt05.zip contains the frequency of occurrence of toponyms from Geo-Net-PT_02 in WPT05, a crawl of the Portuguese Web.

  • names-surnames-NL-UK-IT-PT-ES.zip a list of names and surnames for Dutch, English, Italian, Portuguese and Spanish.

  • publico-cargos.txt a list of Portuguese noun quantifiers, i.e., words that occur before a proper noun, gathered from the on-line newspaper publico.pt.

  • publico-acronyms.txt a list of acronyms and their possible expansions, extracted from a collection of Portuguese news gathered from the on-line newspaper publico.pt.

  • wikipedia-acronyms.txt a list of acronyms and their possible expansions, extracted from the English Wikipedia.
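As a rough illustration of how one of the plain-text lists might be used, the sketch below loads a word list into a set and filters tokens against it. It assumes one entry per line and UTF-8 encoding, neither of which is stated above, so check the actual files first; the helper names `load_word_list` and `remove_stop_words` are hypothetical.

```python
def load_word_list(lines):
    """Build a set of lower-cased entries from an iterable of lines.

    Assumes one entry per line; blank lines are skipped.
    """
    return {line.strip().lower() for line in lines if line.strip()}


def remove_stop_words(tokens, stop_words):
    """Keep only the tokens whose lower-cased form is not a stop-word."""
    return [t for t in tokens if t.lower() not in stop_words]


# With the real file (encoding is an assumption):
# with open("PT-stopwords.txt", encoding="utf-8") as fh:
#     stop_words = load_word_list(fh)

# Small in-memory stand-in so the example runs without the file.
stop_words = load_word_list(["de\n", "a\n", "\n", "que\n"])
print(remove_stop_words(["O", "gato", "que", "gosta", "de", "peixe"], stop_words))
```

Using a set keeps membership checks constant-time, which matters when filtering a large corpus against a long list.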