Aflah

Results 125 comments of Aflah

I can work on this if no one else is taking this up!

Hey @blackhat-coder, are you still working on this?

This is also partly inspired by the ideas mentioned in the [GSoC document](https://docs.google.com/document/d/1fLDLwIhnwDUz3uUV8RyUZiOlmTN9Uzy5ZuvI8iDDFf8/edit#)

@mattdangerw and the rest of the Keras team, it would be great to hear your thoughts on this

As a starting point I've implemented EDA, while also fixing some of the bugs present in the original EDA code, such as not excluding stop words in some...

Now that I've implemented it, it seems I'm getting roughly the 3% gains mentioned in the paper: https://github.com/aflah02/Easy-Data-Augmentation-Implementation/blob/main/EDA.ipynb What should the next step be? @mattdangerw, or anyone else from the Keras team
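For anyone skimming, the core of EDA's synonym-replacement op looks roughly like this. This is a toy sketch: the real implementation draws synonyms from WordNet, so the hand-coded `SYNONYMS` table here is purely an assumption for illustration.

```python
import random

# Toy stand-in synonym table -- the actual EDA paper draws synonyms
# from WordNet synsets, so treat this lookup as an assumption.
SYNONYMS = {
    "quick": ["fast", "speedy"],
    "happy": ["glad", "joyful"],
}
STOP_WORDS = {"the", "a", "an", "is", "and", "of"}

def synonym_replacement(words, n, seed=0):
    """EDA's SR op: replace up to n non-stop-word tokens with a synonym."""
    rng = random.Random(seed)
    out = list(words)
    # Skip stop words -- the bug mentioned above was that the original
    # code sometimes failed to exclude them.
    candidates = [i for i, w in enumerate(out)
                  if w not in STOP_WORDS and w in SYNONYMS]
    rng.shuffle(candidates)
    for i in candidates[:n]:
        out[i] = rng.choice(SYNONYMS[out[i]])
    return out
```

The other three EDA ops (random insertion, swap, deletion) follow the same pattern of sampling positions and editing the token list.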

@mattdangerw Sure! I had quite a bit of fun implementing this too. While you all figure out the approach you'd prefer, I'll try implementing other techniques as well

Backtranslation on a smaller sample size also seems to give pretty good results: https://github.com/aflah02/BackTranslation-Based-Data-Augmentation Maybe we could parallelize the translation calls to make it faster for large datasets,...
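The round-trip plus parallel fan-out could be sketched like this. Note the `translate` function is a placeholder (an assumption), standing in for a real NMT model or translation API:

```python
from concurrent.futures import ThreadPoolExecutor

def translate(text, src, tgt):
    # Placeholder for a real NMT call (e.g. a MarianMT checkpoint or a
    # translation API) -- assumed here; we just tag the text so the
    # round-trip structure is visible.
    return f"[{src}->{tgt}] {text}"

def back_translate(text, pivot="fr"):
    """Round-trip en -> pivot -> en; paraphrases come from translation drift."""
    return translate(translate(text, "en", pivot), pivot, "en")

def back_translate_corpus(texts, workers=4):
    # Translation dominates the runtime, so fan out across a pool;
    # with a real model you'd batch inputs instead of mapping one-by-one.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(back_translate, texts))
```

Whether threads, processes, or model-side batching wins would depend on where the translation actually runs.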

I recently came across the paper [SSMBA: Self-Supervised Manifold Based Data Augmentation for Improving Out-of-Domain Robustness](https://aclanthology.org/2020.emnlp-main.97.pdf), which uses a corruption and reconstruction function to generate new samples from the real...
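The corrupt-then-reconstruct loop could be sketched as below. The corruption step just masks tokens; the reconstruction step would be a masked LM (e.g. a BERT fill-mask head), which is abstracted here as a `fill_fn` callback, so that part is an assumption:

```python
import random

MASK = "<mask>"

def corrupt(tokens, p=0.15, seed=0):
    """Corruption step: mask each token independently with probability p."""
    rng = random.Random(seed)
    return [MASK if rng.random() < p else t for t in tokens]

def reconstruct(tokens, fill_fn):
    """Reconstruction step: a masked LM proposes a fill for each mask.
    fill_fn stands in for a real fill-mask model (an assumption here)."""
    return [fill_fn(i, tokens) if t == MASK else t
            for i, t in enumerate(tokens)]
```

Sampling from the reconstruction model rather than taking its argmax is what moves the new examples around on the data manifold.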

Another great paper is [Synthetic and Natural Noise Both Break Neural Machine Translation](https://arxiv.org/pdf/1711.02173.pdf), which aims to make NMT models more robust to typos and other corruptions that humans can easily overcome....
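One of the synthetic corruptions from that paper, swapping adjacent internal characters, is simple to sketch (my paraphrase of the idea, not their code):

```python
import random

def swap_noise(word, rng):
    """'Swap' corruption: exchange one random pair of adjacent internal
    characters, keeping the first and last characters fixed."""
    if len(word) <= 3:
        return word  # no two internal neighbours to swap
    i = rng.randrange(1, len(word) - 2)
    return word[:i] + word[i + 1] + word[i] + word[i + 2:]

def noisy_sentence(sentence, seed=0):
    rng = random.Random(seed)
    return " ".join(swap_noise(w, rng) for w in sentence.split())
```

Training on a mix of clean and swap-noised text is one of the robustness recipes the paper evaluates, alongside keyboard typos and fully scrambled words.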