data-generation topic
CodeMixed-Text-Generator
This tool automatically generates grammatically valid synthetic code-mixed data using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
ProGen
[EMNLP-2022 Findings] Code for paper “ProGen: Progressive Zero-shot Dataset Generation via In-context Feedback”.
DatabaseBenchmark
A universal database query benchmark tool
GSR-Net
Graph Super-Resolution Network using geometric deep learning.
SymGen
[EMNLP'23] Code for "Generating Data for Symbolic Language with Large Language Models".
Gen4Gen
🏞️ Official implementation of "Gen4Gen: Generative Data Pipeline for Generative Multi-Concept Composition"
seed_factory
A toolkit for test data generation
ICD10Data.com
Data scraping for http://icd10data.com/
SynTable
The official code implementation for SynTable - A Synthetic Data Generation Pipeline for Unseen Object Amodal Instance Segmentation of Cluttered Tabletop Scenes
GraphGen
GraphGen: Enhancing Supervised Fine-Tuning for LLMs with Knowledge-Driven Synthetic Data Generation