tokenizer topic

List tokenizer repositories

graphql-query-compress

38 Stars, 5 Forks

Compress GraphQL Query String

xontrib-output-search

36 Stars, 3 Forks

Get identifiers, paths, URLs, and words from the previous command's output and use them in the next command in the xonsh shell.

ilmulti

21 Stars, 4 Forks

Tooling for experimenting with multilingual machine translation for Indian languages.

hebrew_tokenizer

21 Stars, 2 Forks

A field-tested Hebrew tokenizer for dirty texts (Ben-Yehuda Project, Bible, CC100, mC4, opensubs, OSCAR, Twitter), focused on multi-word expression extraction.

nlp-js-tools-french

36 Stars, 8 Forks

POS tagger, lemmatizer, and stemmer for the French language, in JavaScript.

ArabicProcessingCog

25 Stars, 6 Forks

A Python package that does stemming, tokenization, sentence breaking, segmentation, normalization, and POS tagging for the Arabic language.

pascal-interpreter

33 Stars, 1 Fork

A simple interpreter for a large subset of the Pascal language, written for educational purposes.

lex

56 Stars, 5 Forks

Lex is an implementation of the lex tool in Ruby.

mystem-scala

24 Stars, 16 Forks

Wrapper for the `mystem` morphological analyzer (Russian language) for JVM languages.

Hebrew-Tokenizer

25 Stars, 6 Forks

A very simple Python tokenizer for Hebrew text.