low-resource-languages topic

List low-resource-languages repositories

xl-sum

245 Stars, 42 Forks

This repository contains the code, data, and models of the paper titled "XL-Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages" published in Findings of the Association for Compu...

low-resource-languages

374 Stars, 56 Forks

Resources for conservation, development, and documentation of low-resource (human) languages.

banglanmt

145 Stars, 45 Forks

This repository contains the code and data of the paper titled "Not Low-Resource Anymore: Aligner Ensembling, Batch Filtering, and New Datasets for Bengali-English Machine Translation" published in Pr...

africanlp-public-datasets

82 Stars, 19 Forks

A repository for publicly/freely available Natural Language Processing (NLP) datasets for African languages.

CogNet

40 Stars, 9 Forks

CogNet: a large-scale, high-quality cognate database for 338 languages, 1.07M words, and 8.1 million cognates

Indian_ParallelCorpus

28 Stars, 3 Forks

Curated list of publicly available parallel corpora for Indian languages

BiLatticeRNN-Confidence

16 Stars, 4 Forks

Confidence Estimation for Black Box Automatic Speech Recognition Systems Using Lattice Recurrent Neural Networks https://arxiv.org/abs/1910.11933 or https://ieeexplore.ieee.org/document/9053264

Filipino-Text-Benchmarks

57 Stars, 8 Forks

Open-source benchmark datasets and pretrained transformer models in the Filipino language.

Turkish-Text-to-Speech

35 Stars, 5 Forks

Speech synthesis (TTS) in low-resource languages by training from scratch with FastPitch and fine-tuning with HiFi-GAN