niacin
Enrich your data
A Python library for replacing the missing variation in your text data.
Why should I use this?
Data collected for model training necessarily undersamples the likely variance in the input space. This library is a collection of tools for two tasks: inserting typical kinds of perturbations to better approximate population variance, and creating similar-but-incorrect examples to help reduce the total size of the hypothesis space. These are commonly known as enrichment and negative sampling, respectively.
How do I use this?
Functions in niacin are separated into submodules for specific data types. Functions expose a similar API, with two input arguments: the data to be transformed, and the probability of applying a specific transformation.
enrichment:
from niacin.text import en
data = "This is the song that never ends and it goes on and on my friends"
print(en.add_misspelling(data, p=1.0))
This is teh song tath never ends adn it goes on anbd on my firends
negative sampling:
from niacin.text import en
data = "This is the song that never ends and it goes on and on my friends"
print(en.add_hypernyms(data, p=1.0))
This is the musical composition that never extremity and it exit on and on my person
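Both examples follow the same calling pattern: the data to transform, then a probability `p`. The sketch below illustrates that pattern with a toy transform. It is not niacin's implementation: `swap_adjacent_chars` and its `seed` parameter are hypothetical names, and applying `p` independently to each word is an assumption made for illustration.

```python
import random


def swap_adjacent_chars(text: str, p: float = 0.5, seed=None) -> str:
    """Toy transform: swap two adjacent characters in a word with probability p.

    Hypothetical illustration only; this is not part of niacin.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    rng = random.Random(seed)
    words = []
    for word in text.split(" "):
        # Assumption: each word is perturbed independently with probability p.
        if len(word) > 1 and rng.random() < p:
            i = rng.randrange(len(word) - 1)
            word = word[:i] + word[i + 1] + word[i] + word[i + 2:]
        words.append(word)
    return " ".join(words)


data = "This is the song that never ends"
unchanged = swap_adjacent_chars(data, p=0.0)          # no word is touched
perturbed = swap_adjacent_chars(data, p=1.0, seed=0)  # every word is touched
print(unchanged)
print(perturbed)
```

With `p=0.0` the input comes back unchanged, and with `p=1.0` every word is perturbed; values in between perturb only a fraction of the words.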
How do I install this?
with pip:
pip install niacin
from source:
git clone git@github.com:deniederhut/niacin.git && cd niacin && python setup.py install
If you have installed niacin from source, you can run the test suite to verify that everything is working properly. We use pytest, which you will first need to install:
pip install pytest
Then you can run the library's tests with:
pytest -m 'not slow'
If you would like to see the coverage report, you can generate one with pytest-cov like so:
pip install pytest-cov
pytest -m 'not slow' --cov=niacin && coverage html
How can I install the optional dependencies?
If you want to use the backtranslate functionality, niacin will need PyTorch and some other libraries. These can be installed as extras with:
pip install niacin[backtranslate]
If you are on macOS, this might fail with a warning about your version of gcc:
Your compiler (g++) is not compatible with the compiler Pytorch was
built with for this platform, which is clang++ on darwin.
You can avoid this error by executing the following:
CFLAGS='-stdlib=libc++' pip install niacin[backtranslate]
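To confirm that the optional dependencies are importable before using the backtranslate functionality, a quick check along these lines may help. This is a minimal sketch using only the standard library: `has_module` is a hypothetical helper, and it checks only for `torch` (PyTorch's import name), not for the other libraries the extra installs.

```python
import importlib.util


def has_module(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return importlib.util.find_spec(name) is not None


# Report whether niacin and PyTorch are visible to this interpreter.
for name in ("niacin", "torch"):
    status = "available" if has_module(name) else "missing"
    print(f"{name}: {status}")
```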