Neural-Machine-Translation-English-Hindi-for-domain-data
NLP Application Project
Neural-Machine-Translation
2.2.3 Build a neural machine translation (NMT) system for a domain in which parallel training data (aligned sentences in the source and target languages) is available but small in size. Machine learning is to be used in such a way that the small in-domain data can be combined with the large amount of general data.
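One common way to combine a small domain corpus with a large general corpus is to oversample the domain pairs before training, so the model sees them more often than their raw count would allow. The sketch below illustrates that idea only; it is not necessarily the approach taken in this project. The names `mix_corpora` and `domain_share` are hypothetical and chosen for illustration.

```python
import random
from typing import List, Sequence, Tuple

# One parallel example: (English source sentence, Hindi target sentence).
SentencePair = Tuple[str, str]


def mix_corpora(
    general: Sequence[SentencePair],
    domain: Sequence[SentencePair],
    domain_share: float = 0.3,
    seed: int = 0,
) -> List[SentencePair]:
    """Return a shuffled training list in which repeated domain pairs
    make up roughly `domain_share` of all examples (hypothetical helper)."""
    if not 0.0 < domain_share < 1.0:
        raise ValueError("domain_share must be strictly between 0 and 1")
    if not general or not domain:
        raise ValueError("both corpora must be non-empty")

    general = list(general)
    domain = list(domain)

    # Solve d / (d + g) = domain_share for d, the number of domain examples.
    target = round(domain_share * len(general) / (1.0 - domain_share))
    # Never use fewer domain examples than are actually available.
    target = max(target, len(domain))

    rng = random.Random(seed)
    repeats, remainder = divmod(target, len(domain))
    # Whole copies of the domain corpus plus a random partial copy.
    oversampled = domain * repeats + rng.sample(domain, remainder)

    mixed = general + oversampled
    rng.shuffle(mixed)
    return mixed


if __name__ == "__main__":
    general_pairs = [(f"general sentence {i}", f"सामान्य वाक्य {i}") for i in range(8)]
    domain_pairs = [("domain sentence 0", "डोमेन वाक्य 0"),
                    ("domain sentence 1", "डोमेन वाक्य 1")]
    training = mix_corpora(general_pairs, domain_pairs, domain_share=0.5)
    print(len(training), "training pairs")
```

An alternative with the same goal is to train on the general data first and then continue training (fine-tune) on the domain data alone; the oversampling ratio here and the fine-tuning schedule there would both need tuning on a held-out domain set.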
Contributors:
- Arushi Singhal 201516178
- Simran Singhal 201516190
Presentation: https://docs.google.com/presentation/d/1UgQXnST6rxZpctD8Atuaus7-2tdmhHMxCvMiZXemXck/edit?usp=sharing
Interim Report: https://docs.google.com/document/d/1n1o2qPxLaCnB0E83i_ZiPZCA_8fN_uMCrQ-CQCzlql4/edit?usp=sharing
Report: https://docs.google.com/document/d/10rAypGzTKjiJOw9Xe0qi9jYFTlNq8AQitohGlploK44/edit?usp=sharing
References
- https://pytorch.org/tutorials/intermediate/seq2seq_translation_tutorial.html (main)
- https://arxiv.org/abs/1409.3215 (Research Paper)
- http://www.manythings.org/anki/
- https://machinelearningmastery.com/encoder-decoder-recurrent-neural-network-models-neural-machine-translation/
- https://machinelearningmastery.com/encoder-decoder-long-short-term-memory-networks/
- https://machinelearningmastery.com/develop-neural-machine-translation-system-keras/
- http://jalammar.github.io/visualizing-neural-machine-translation-mechanics-of-seq2seq-models-with-attention/
- https://towardsdatascience.com/nlp-sequence-to-sequence-networks-part-1-processing-text-data-d141a5643b72
- https://towardsdatascience.com/nlp-sequence-to-sequence-networks-part-2-seq2seq-model-encoderdecoder-model-6c22e29fd7e1
- https://nlp.stanford.edu/~johnhew/public/14-seq2seq.pdf
- https://www.analyticsvidhya.com/blog/2018/03/essentials-of-deep-learning-sequence-to-sequence-modelling-with-attention-part-i/
- https://blog.keras.io/a-ten-minute-introduction-to-sequence-to-sequence-learning-in-keras.html
- https://www.coursera.org/learn/nlp-sequence-models/lecture/ftkzt/recurrent-neural-network-model
- https://machinelearningmastery.com/encoder-decoder-attention-sequence-to-sequence-prediction-keras/ (important)
- https://github.com/bentrevett/pytorch-seq2seq/blob/master/1%20-%20Sequence%20to%20Sequence%20Learning%20with%20Neural%20Networks.ipynb
- https://towardsdatascience.com/word-level-english-to-marathi-neural-machine-translation-using-seq2seq-encoder-decoder-lstm-model-1a913f2dc4a7
- https://discuss.pytorch.org/t/are-the-outputs-of-bidirectional-gru-concatenated/15103
- https://towardsdatascience.com/attention-seq2seq-with-pytorch-learning-to-invert-a-sequence-34faf4133e53
- https://github.com/spro/practical-pytorch/blob/master/seq2seq-translation/seq2seq-translation-batched.ipynb
- https://towardsdatascience.com/understanding-bidirectional-rnn-in-pytorch-5bd25a5dd66
- https://discuss.pytorch.org/t/cuda-changes-expected-lstm-hidden-dimensions/10765/6
- https://github.com/A-Jacobson/minimal-nmt/blob/master/nmt_tutorial.ipynb (Important)
- https://medium.com/@martinpella/how-to-use-pre-trained-word-embeddings-in-pytorch-71ca59249f76 (GloVe in PyTorch)
Hindi text Normalization
- http://talukdar.net/papers/KBCS04_HPL-1.pdf
- https://medium.com/lingvo-masino/do-you-know-about-text-normalization-a19fe3090694
The IIT Bombay English-Hindi Parallel Corpus
https://www.cse.iitb.ac.in/~pb/papers/lrec18-iitbparallel.pdf
Document listing the errors found in the dataset
https://docs.google.com/document/d/1zz67TTlVi0YuH7zUjD3up4O_7qKd8lCtElhxcH1bMWk/edit
Data Generator
https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
PyTorch neural network Colab notebook (link obtained through the flow group)
https://colab.research.google.com/drive/1DgkVmi6GksWOByhYVQpyUB4Rk3PUq0Cp?fbclid=IwAR076PTAKeD99mN-htpMxCY4FaJNadF_OfCNry02rBwwixadJ-n1rygnW7I#scrollTo=6Q1AhoIB-pkp
Anaconda installation
https://www.digitalocean.com/community/tutorials/how-to-install-anaconda-on-ubuntu-18-04-quickstart
Multiple GPUs
https://www.pyimagesearch.com/2017/10/30/how-to-multi-gpu-training-with-keras-python-and-deep-learning/
- https://github.com/ZhenYangIACAS/NMT
- https://github.com/tuzhaopeng/nmt
- https://paperswithcode.com/paper/modeling-coverage-for-neural-machine#code
Thesis on Hindi-to-English translation using a dataset of almost the same size
https://www.google.com/url?sa=t&rct=j&q=&esrc=s&source=web&cd=4&cad=rja&uact=8&ved=2ahUKEwj72dn33fXhAhWNbn0KHYNnDNUQFjADegQIBBAC&url=http%3A%2F%2Fweb2py.iiit.ac.in%2Fresearch_centres%2Fpublications%2Fdownload%2Fmastersthesis.pdf.af2224b7bc18088c.4b756e616c2d5468657369732d46696e616c2e706466.pdf&usg=AOvVaw2PZO-pochZDvz7x-4t49pa
ResearchGate paper on Hindi-to-English machine translation
https://www.researchgate.net/publication/228783817_Machine_translation_of_bi-lingual_hindi-english_hinglish_text