NLPiper
NLPiper is a package that aggregates different NLP tools and applies their transformations to the target document.
## 🚀 Feature Add transforms for data augmentation. **Motivation** During training, data augmentation is essential for better performance.
## 🚀 Feature Allow `cleaners.CleanPunctuation` to bypass some symbols. This makes it possible, for instance, to preserve sentence meaning in the document. Otherwise, the current solution removes all the points and...
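To illustrate the requested behaviour, here is a minimal standalone sketch that does not use NLPiper itself. The function `clean_punctuation` and its `keep` parameter are hypothetical names chosen for this example and are not part of the `cleaners.CleanPunctuation` API; the sketch only assumes that "punctuation" means the ASCII characters in Python's `string.punctuation`.

```python
import string


def clean_punctuation(text: str, keep: str = "") -> str:
    """Remove ASCII punctuation from `text`, except the characters in `keep`.

    Hypothetical helper for illustration only; not NLPiper's implementation.
    """
    # Build the set of characters to strip, skipping the ones to bypass.
    to_remove = "".join(ch for ch in string.punctuation if ch not in keep)
    # str.maketrans with a third argument maps those characters to None.
    return text.translate(str.maketrans("", "", to_remove))


# Default behaviour: every punctuation character is removed.
print(clean_punctuation("Hello, world! It works."))
# Bypassing the period keeps the sentence boundary visible.
print(clean_punctuation("Hello, world! It works.", keep="."))
```

Passing the symbols to keep as a plain string keeps the call site short; a real implementation might prefer a set or a configurable default.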
## 🚀 Feature Add `Hugging Face` integration to use contextual embeddings like `Bert`.
## 🚀 Feature Calculate statistics on the processed data. **Motivation** Knowing global statistics for the processed document could be of great interest, such as the number of chars, tokens, processed...
## 🚀 Feature Allow the user to add a custom Tokenizer with minimal implementation effort. **Motivation** It is impossible to integrate every tool, but that does not mean that it shouldn't...
If we have a pipeline that runs transforms A, B, and C and use it to process a doc, and we then create another pipeline that also adds D, it should be possible to just...
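The request above is cut off, so the following is only one possible reading: reuse the results of the shared steps A, B, C and run just D. The sketch below is standalone Python, not NLPiper code. The class `PrefixCache`, its `run` method, and the `(name, function)` step format are all hypothetical, and the sketch assumes each transform is a deterministic function from string to string identified by a unique name.

```python
from typing import Callable, Dict, List, Tuple

Transform = Callable[[str], str]
Step = Tuple[str, Transform]  # (unique step name, function); hypothetical format


class PrefixCache:
    """Caches the output of every prefix of steps applied to a document."""

    def __init__(self) -> None:
        self._cache: Dict[Tuple[str, Tuple[str, ...]], str] = {}
        self.calls = 0  # number of transforms actually executed

    def run(self, doc: str, steps: List[Step]) -> str:
        names = tuple(name for name, _ in steps)
        start, result = 0, doc
        # Look for the longest prefix of steps already computed for this doc.
        for i in range(len(steps), 0, -1):
            key = (doc, names[:i])
            if key in self._cache:
                start, result = i, self._cache[key]
                break
        # Execute only the remaining steps, caching each intermediate result.
        for i in range(start, len(steps)):
            result = steps[i][1](result)
            self.calls += 1
            self._cache[(doc, names[: i + 1])] = result
        return result


cache = PrefixCache()
abc: List[Step] = [
    ("A", str.lower),
    ("B", str.strip),
    ("C", lambda s: s.replace(",", "")),
]
abcd: List[Step] = abc + [("D", str.upper)]

first = cache.run("  Hello, World ", abc)    # runs A, B, C
second = cache.run("  Hello, World ", abcd)  # reuses A, B, C and runs only D
print(first, second, cache.calls)
```

Keying the cache on step names rather than function objects is a design shortcut for the example; it is only safe if a name always refers to the same transform with the same configuration.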