artificial-text-detection
Python framework for artificial text detection: NLP approaches to compare natural text against text generated by neural networks.
Artificial Text Detection
Contents
The project description is available in:
Installation steps:
We use Poetry as an enhanced dependency resolver:
make poetry-download
poetry install --no-dev
Datasets for artificial text detection
Before classification, the datasets have to be collected. There are two ways to obtain them:
- Via Data Version Control (DVC). Get in touch with @msaidov in order to get access to the private Google Drive;
- Via dataset generation. One dataset of 20,000 samples was processed with an MT model on a V100 GPU in 30 minutes.
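The reported timing gives a rough throughput figure that can be used to budget other dataset sizes. This is a minimal sketch that assumes generation time scales linearly with the number of samples and that the hardware and MT model match the reported run; the function names are hypothetical.

```python
# Rough throughput estimate from the single reported data point:
# 20,000 samples translated in 30 minutes on a V100 GPU.
REPORTED_SAMPLES = 20_000
REPORTED_MINUTES = 30


def samples_per_second(samples: int = REPORTED_SAMPLES, minutes: float = REPORTED_MINUTES) -> float:
    """Average generation throughput implied by the reported run."""
    return samples / (minutes * 60)


def estimated_minutes(size: int) -> float:
    """Estimated wall time for `size` samples, assuming time grows linearly with size."""
    return size / samples_per_second() / 60


print(f"{samples_per_second():.1f} samples/s")  # prints 11.1 samples/s
print(f"{estimated_minutes(100_000):.0f} min for 100,000 samples")  # prints 150 min for 100,000 samples
```

Real runs will deviate from this estimate (batching, sequence lengths and model loading time all matter), so treat it as an order-of-magnitude figure only.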
Data Version Control usage:
poetry add "dvc[gdrive]"
Then run dvc pull. It downloads the preprocessed translation datasets from Google Drive.
Dataset generation
To generate translations before running the artificial text detection pipeline, install the detection module from the cloned repository or from PyPI (TODO):
pip install -e .
Then run the generation script:
python detection/data/generate.py --dataset_name='tatoeba' --size=20000 --device='cuda:0'
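For reference, the three flags in the command above can be mirrored with a standard argparse parser. This is a hypothetical sketch: we do not know which argument parser detection/data/generate.py actually uses, and the build_parser function and its defaults are assumptions. Note that the shell strips the single quotes in --dataset_name='tatoeba', so the script receives --dataset_name=tatoeba, which argparse accepts in its --option=value form.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the flags of the generation command;
    # the defaults are illustrative and NOT taken from generate.py.
    parser = argparse.ArgumentParser(description="Generate machine translations for detection datasets")
    parser.add_argument("--dataset_name", type=str, default="tatoeba", help="source corpus to translate")
    parser.add_argument("--size", type=int, default=20000, help="number of samples to generate")
    parser.add_argument("--device", type=str, default="cpu", help="device string, e.g. 'cuda:0' or 'cpu'")
    return parser


# Equivalent of the shell command after quote removal by the shell.
args = build_parser().parse_args(["--dataset_name=tatoeba", "--size=20000", "--device=cuda:0"])
print(args.dataset_name, args.size, args.device)  # prints tatoeba 20000 cuda:0
```

Declaring --size with type=int means a value such as --size=20k fails at parse time instead of deep inside the generation loop.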
Simple run:
To run the artificial text detection classifier, execute the pipeline:
python detection/old.py
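The README does not say which classifier detection/old.py implements. To illustrate the underlying task only, binary classification of natural versus machine-generated text, here is a minimal stdlib-only sketch using multinomial naive Bayes over word unigrams. This is a swapped-in technique chosen for brevity: the NaiveBayesDetector class, the 0/1 label convention and the toy sentences are hypothetical and are not part of this project.

```python
from __future__ import annotations

import math
from collections import Counter


class NaiveBayesDetector:
    """Hypothetical multinomial naive Bayes over lowercase word unigrams.

    Label 0 = natural (human-written) text, label 1 = machine-generated text.
    Both labels must appear in the training data.
    """

    def __init__(self, alpha: float = 1.0) -> None:
        self.alpha = alpha  # Laplace (add-alpha) smoothing constant
        self.word_counts: dict[int, Counter] = {0: Counter(), 1: Counter()}
        self.doc_counts: Counter = Counter()
        self.vocab: set[str] = set()

    @staticmethod
    def tokenize(text: str) -> list[str]:
        # Whitespace tokenization; a real system would use a proper tokenizer.
        return text.lower().split()

    def fit(self, texts: list[str], labels: list[int]) -> NaiveBayesDetector:
        for text, label in zip(texts, labels):
            tokens = self.tokenize(text)
            self.word_counts[label].update(tokens)
            self.doc_counts[label] += 1
            self.vocab.update(tokens)
        return self

    def log_score(self, text: str, label: int) -> float:
        # log P(label) + sum of log P(token | label) with add-alpha smoothing.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
        for token in self.tokenize(text):
            if token not in self.vocab:
                continue  # skip words never seen during training
            score += math.log((self.word_counts[label][token] + self.alpha) / denom)
        return score

    def predict(self, text: str) -> int:
        # Pick the label with the highest log posterior (up to a shared constant).
        return max((0, 1), key=lambda label: self.log_score(text, label))


# Toy, made-up training sentences for illustration only.
natural = ["we walked home in the rain", "honestly i had no idea"]
generated = ["it is necessary to note that", "it is important to note that"]
detector = NaiveBayesDetector().fit(natural + generated, [0] * len(natural) + [1] * len(generated))
print(detector.predict("it is important to note"))  # prints 1
print(detector.predict("we walked in the rain"))  # prints 0
```

Naive Bayes appears here only because it fits in a few dozen lines without third-party dependencies; it ignores word order entirely, so treat it as the basic shape of the problem rather than a competitive detector.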