language-model topic
AutoenCODE
AutoenCODE is a deep learning infrastructure that encodes source code fragments into vector representations, which can then be used to learn similarities.
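A minimal sketch of the comparison step once fragment embeddings exist: cosine similarity between two vectors. The vectors below are hypothetical placeholders, not AutoenCODE output or its API.

```python
import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two embedding vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings of two code fragments (stand-ins for learned vectors).
frag_a = np.array([0.12, -0.48, 0.33, 0.90])
frag_b = np.array([0.10, -0.51, 0.35, 0.87])

print(f"similarity: {cosine_similarity(frag_a, frag_b):.3f}")
```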
nucliadb
NucliaDB, the AI search database for RAG.
suggest
Top-k Approximate String Matching.
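An illustrative sketch of top-k approximate string matching using Levenshtein edit distance, assuming a small in-memory vocabulary; this is not the suggest library's API, just the underlying idea.

```python
import heapq

def edit_distance(a: str, b: str) -> int:
    """Classic dynamic-programming Levenshtein distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                 # deletion
                            curr[j - 1] + 1,             # insertion
                            prev[j - 1] + (ca != cb)))   # substitution
        prev = curr
    return prev[-1]

def top_k_matches(query: str, vocabulary: list[str], k: int = 3) -> list[tuple[str, int]]:
    """Return the k vocabulary entries closest to the query by edit distance."""
    return heapq.nsmallest(k, ((w, edit_distance(query, w)) for w in vocabulary),
                           key=lambda pair: pair[1])

print(top_k_matches("langauge", ["language", "luggage", "langue", "garage"]))
```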
gdc
Code accompanying our papers on the "Generative Distributional Control" framework
TextRL
Implementation of ChatGPT-style RLHF (Reinforcement Learning from Human Feedback) for any generation model in Hugging Face Transformers (bloomz-176B/BLOOM/GPT/BART/T5/MetaICL)
KenLM-training
Training an n-gram-based language model with the KenLM toolkit for Deep Speech 2
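A minimal sketch of using such a model from Python via the kenlm bindings, assuming an ARPA model has already been built with KenLM's lmplz tool; the corpus and model paths are hypothetical.

```python
import kenlm  # pip install kenlm (or build from https://github.com/kpu/kenlm)

# Assumes a model trained beforehand, e.g.:
#   lmplz -o 5 < corpus.txt > lm.arpa
model = kenlm.Model("lm.arpa")  # hypothetical path

sentence = "the quick brown fox jumps over the lazy dog"
print("log10 prob :", model.score(sentence, bos=True, eos=True))
print("perplexity :", model.perplexity(sentence))
```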
COCO-LM
[NeurIPS 2021] COCO-LM: Correcting and Contrasting Text Sequences for Language Model Pretraining
CoLAKE
COLING'2020: CoLAKE: Contextualized Language and Knowledge Embedding
PhoNLP
PhoNLP: A BERT-based multi-task learning model for part-of-speech tagging, named entity recognition and dependency parsing (NAACL 2021)
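A sketch of annotating Vietnamese text with PhoNLP, following the usage pattern in the project's README as I recall it; the download/load/annotate helper names and arguments are assumptions and may differ by package version.

```python
import phonlp  # pip install phonlp (assumed package name)

# Download and load the pre-trained multi-task model (assumed helpers).
phonlp.download(save_dir="./pretrained_phonlp")
model = phonlp.load(save_dir="./pretrained_phonlp")

# Joint POS tagging, NER, and dependency parsing on a word-segmented sentence.
model.print_out(model.annotate(text="Tôi đang làm_việc tại VinAI ."))
```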
Romanian-Transformers
This repo is the home of Romanian Transformers.
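A sketch of loading a Romanian BERT through Hugging Face Transformers; the model id below is an assumption (the Romanian BERT associated with this project on the Hugging Face Hub), not something stated in this listing.

```python
from transformers import AutoModel, AutoTokenizer

# Assumed Hub id for the Romanian-Transformers BERT model.
MODEL_ID = "dumitrescustefan/bert-base-romanian-cased-v1"

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModel.from_pretrained(MODEL_ID)

inputs = tokenizer("Acesta este un exemplu de propoziție în limba română.",
                   return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, tokens, hidden_size)
```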