machine-learning-for-nlp-guide
Guide for engineers interested in NLP machine learning
Path
- Understand possibilities and form business applications
  - Everyone: AI for Everyone
- Either level up through:
  - Gaining theoretical foundation of Deep Learning for NLP
    - Stanford Course Materials http://web.stanford.edu/class/cs224n/
    - Natural Language Processing with Deep Learning https://www.youtube.com/watch?v=8rXD5-xhemo&list=PLoROMvodv4rOhcuXMZkNm7j3fVwBBY42z
    - Stanford CS224U: Natural Language Understanding https://www.youtube.com/watch?v=tZ_Jrc_nRJY&list=PLoROMvodv4rObpMCir6rNNUlFAn56Js20
  - Getting "Practical" Knowledge of Deep Learning for NLP
    - Gaining theoretical foundation of Deep Learning for NLP
- Learn how to do Deep Learning
  - Nuts and Bolts of Applying Deep Learning
  - "Everyday" engineers: fast.ai
  - Research engineers: DeepLearning.AI
- Learn about all the stuff "they don't teach"
  - Learn Production-Level Deep Learning: https://fullstackdeeplearning.com/
  - Resources: https://github.com/full-stack-deep-learning/fsdl-text-recognizer-project
- Base Models to Use
  - spaCy for general NLP tasks
  - Hugging Face Transformers
- Profit
State of the Art Methods
Resources
- Syntactic Search over Wikipedia: https://spike.wikipedia.apps.allenai.org/search/wikipedia
- Odinson: Rapidly query a natural language knowledge base https://github.com/lum-ai/odinson
- CheckList: Behavioral Testing NLP https://github.com/marcotcr/checklist
- Data project checklist https://www.fast.ai/2020/01/07/data-questionnaire
- BERT, ELMo, & GPT-2: How Contextual are Contextualized Word Representations? http://ai.stanford.edu/blog/contextual/
- BERT commit log https://amitness.com/2020/05/git-log-of-bert/
- Full stack deep learning github repo: https://github.com/full-stack-deep-learning/fsdl-text-recognizer-project
- Expand labeled data using unlabeled data
  - Blog: https://ai.googleblog.com/2019/03/harnessing-organizational-knowledge-for.html
  - Detailed Article: https://towardsdatascience.com/a-look-into-snorkel-drybell-8e9e781dc250
- Explain Predictions
  - Python Library: https://github.com/jphall663/awesome-machine-learning-interpretability
- Deploy models to production
  - Tutorial: https://hackernoon.com/enterprise-af-solution-for-text-classification-using-bert-9fe2b7234c46
- Learn how to implement new models
  - Deep Learning from the Foundations: https://www.fast.ai/2019/06/28/course-p2v3/
- More Learning Resources:
  - nlp-library curated list of papers
- Machine Learning System Best Practice and Design:
  - The ML Test Score: A Rubric for ML Production Readiness and Technical Debt Reduction: https://ai.google/research/pubs/pub46555
  - Machine Learning: The High Interest Credit Card of Technical Debt: https://ai.google/research/pubs/pub43146
- An Interactive Visualization to Explore NLP Papers
- How Big Should My Language Model Be?
- Accelerate your NLP pipelines using Hugging Face Transformers and ONNX Runtime
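
The Snorkel DryBell links above describe weak supervision: writing small heuristic "labeling functions" and combining their votes to label otherwise unlabeled text. Here is a minimal sketch of that idea in plain Python. It does not use Snorkel's API; the function names, the toy spam task, and the simple majority vote are all assumptions made for illustration (Snorkel itself combines the votes with a learned label model rather than a plain majority vote).

```python
from collections import Counter

ABSTAIN = None  # a labeling function returns this when it has no opinion

# Hypothetical labeling functions for a toy spam task (1 = spam, 0 = not spam).
def lf_contains_free(text):
    return 1 if "free" in text.lower() else ABSTAIN

def lf_contains_unsubscribe(text):
    return 1 if "unsubscribe" in text.lower() else ABSTAIN

def lf_greeting(text):
    return 0 if text.lower().startswith(("hi ", "hello ")) else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_free, lf_contains_unsubscribe, lf_greeting]

def weak_label(text, lfs=LABELING_FUNCTIONS):
    """Majority vote over the labeling functions that did not abstain.

    Returns None when every function abstains or when the vote is tied.
    """
    votes = [v for v in (lf(text) for lf in lfs) if v is not ABSTAIN]
    if not votes:
        return None
    counts = Counter(votes).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # tie between labels: leave the example unlabeled
    return counts[0][0]

unlabeled = [
    "Get your FREE prize now, click to unsubscribe",
    "Hello team, the meeting moved to 3pm",
    "Quarterly report attached",
]
labels = [weak_label(t) for t in unlabeled]
print(labels)
```

Examples that come back as `None` stay unlabeled; the rest can be used as noisy training data for a downstream model.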
Tools
- https://prodi.gy/buy
  - Text and image annotation
- https://github.com/chakki-works/doccano
  - Open source text annotation tool
- https://www.media.mit.edu/projects/dive/overview/
  - DIVE is a web-based data exploration system that lets non-technical users create stories from their data without writing code. DIVE combines semantic data ingestion, recommendation-based visualization and analysis, and dynamic story sharing into a unified workflow.
Infrastructure
- Seldon
  - https://www.youtube.com/watch?time_continue=2&v=cDtzu4WBzWA
  - https://github.com/kubeflow/example-seldon
  - https://docs.seldon.io/projects/seldon-core/en/latest/examples/nvidia_mnist.html
- Kubeflow
  - https://www.kubeflow.org/docs/started/getting-started/
- TFX
  - https://www.tensorflow.org/tfx

Research Interest
- Text Atlas
- Feature Visualization https://distill.pub/2017/feature-visualization/
- Activation Atlas https://distill.pub/2019/activation-atlas/
Newsletters to Follow
- NLP News http://newsletter.ruder.io
- The Batch https://www.deeplearning.ai/thebatch/
Podcasts to Listen To
- NLP Highlights https://soundcloud.com/nlp-highlights
Blogs to Follow
- Google Data Analytics https://cloud.google.com/blog/products/data-analytics/
- AWS Big Data Blog https://aws.amazon.com/blogs/big-data/
- fast.ai http://www.fast.ai/
- FastML http://fastml.com/
- The Unofficial Google Data Science Blog http://www.unofficialgoogledatascience.com/
- DeepMind https://deepmind.com/blog/
- The Official Google Blog https://www.blog.google/
- Distill https://distill.pub
- DataCamp Community https://www.datacamp.com/community
- AI Applications https://vaultanalytics.com/marketinganalytics
- Google AI Blog http://ai.googleblog.com/
- Google Developers Blog http://developers.googleblog.com/
- the morning paper https://blog.acolyer.org
- Machine Learning @ Berkeley https://medium.com/@ml.at.berkeley?source=rss-a34a9c1d8009------2
- All - naacl.org http://naacl-org.github.com
- Facebook Research https://research.fb.com
- OpenAI https://blog.openai.com
- Y Combinator http://www.ycombinator.com
- The Berkeley Artificial Intelligence Research Blog http://bair.berkeley.edu/blog/
- No Free Hunch http://blog.kaggle.com
- Off the convex path http://offconvex.github.io/
Datasets
- A unified platform for sharing, training and evaluating dialogue models across many tasks. https://parl.ai/
You can also follow me on Twitter: https://twitter.com/LeoApolonio