deeplearning-papernotes

Summaries and notes on Deep Learning research papers

2018-02

  • The Matrix Calculus You Need For Deep Learning [arXiv]
  • Regularized Evolution for Image Classifier Architecture Search [arXiv]
  • Online Learning: A Comprehensive Survey [arXiv]
  • Visual Interpretability for Deep Learning: a Survey [arXiv]
  • Behavior is Everything – Towards Representing Concepts with Sensorimotor Contingencies [paper] [article] [code]
  • IMPALA: Scalable Distributed Deep-RL with Importance Weighted Actor-Learner Architectures [arXiv] [article] [code]
  • DeepType: Multilingual Entity Linking by Neural Type System Evolution [arXiv] [article] [code]
  • DensePose: Dense Human Pose Estimation In The Wild [arXiv] [article]

2018-01

  • Nested LSTMs [arXiv]
  • Generating Wikipedia by Summarizing Long Sequences [arXiv]
  • Scalable and accurate deep learning for electronic health records [arXiv]
  • Kernel Feature Selection via Conditional Covariance Minimization [NIPS paper] [article] [code]
  • Psychlab: A Psychology Laboratory for Deep Reinforcement Learning Agents [arXiv] [article] [code]
  • Fine-tuned Language Models for Text Classification [arXiv] [code] (soon)
  • Deep Learning: An Introduction for Applied Mathematicians [arXiv]
  • Innateness, AlphaZero, and Artificial Intelligence [arXiv]
  • Can Computers Create Art? [arXiv]
  • eCommerceGAN : A Generative Adversarial Network for E-commerce [arXiv]
  • Expected Policy Gradients for Reinforcement Learning [arXiv]
  • DroNet: Learning to Fly by Driving [UZH docs] [article] [code]
  • Symmetric Decomposition of Asymmetric Games [Scientific Reports] [article]
  • Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor [arXiv] [code]
  • SBNet: Sparse Blocks Network for Fast Inference [arXiv] [article] [code]
  • DeepMind Control Suite [arXiv] [code]
  • Deep Learning: A Critical Appraisal [arXiv]

2017-12

  • Adversarial Patch [arXiv]
  • CNN Is All You Need [arXiv]
  • Learning Robot Objectives from Physical Human Interaction [paper] [article]
  • The NarrativeQA Reading Comprehension Challenge [arXiv] [dataset]
  • Objects that Sound [arXiv]
  • Natural TTS Synthesis by Conditioning WaveNet on Mel Spectrogram Predictions [arXiv] [article] [article2]
  • Deep Neuroevolution: Genetic Algorithms Are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning [arXiv] [article] [code]
  • Improving Exploration in Evolution Strategies for Deep Reinforcement Learning via a Population of Novelty-Seeking Agents [arXiv] [article] [code]
  • Superhuman AI for heads-up no-limit poker: Libratus beats top professionals [Science]
  • Mathematics of Deep Learning [arXiv]
  • State-of-the-art Speech Recognition With Sequence-to-Sequence Models [arXiv] [article]
  • Peephole: Predicting Network Performance Before Training [arXiv]
  • Deliberation Network: Pushing the frontiers of neural machine translation [Research at Microsoft] [article]
  • GPU Kernels for Block-Sparse Weights [Research at OpenAI] [article] [code]
  • Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm [arXiv]
  • Deep Learning Scaling is Predictable, Empirically [arXiv] [article]

2017-11

  • High-Resolution Image Synthesis and Semantic Manipulation with Conditional GANs [arXiv] [article] [code]
  • StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation [arXiv] [code]
  • Population Based Training of Neural Networks [arXiv] [article]
  • Distilling a Neural Network Into a Soft Decision Tree [arXiv]
  • Neural Text Generation: A Practical Guide [arXiv]
  • Parallel WaveNet: Fast High-Fidelity Speech Synthesis [DeepMind documents] [article]
  • CheXNet: Radiologist-Level Pneumonia Detection on Chest X-Rays with Deep Learning [arXiv] [article]
  • Non-local Neural Networks [arXiv]
  • Deep Image Prior [paper] [article] [code]
  • Online Deep Learning: Learning Deep Neural Networks on the Fly [arXiv]
  • Learning Explanatory Rules from Noisy Data [arXiv]
  • Improving Palliative Care with Deep Learning [arXiv] [article]
  • VoxelNet: End-to-End Learning for Point Cloud Based 3D Object Detection [arXiv]
  • Weighted Transformer Network for Machine Translation [arXiv] [article]
  • Non-Autoregressive Neural Machine Translation [arXiv] [article]
  • Block-Sparse Recurrent Neural Networks [arXiv]
  • A Unified Game-Theoretic Approach to Multiagent Reinforcement Learning [arXiv]
  • Neural Discrete Representation Learning [arXiv] [article]
  • Don't Decay the Learning Rate, Increase the Batch Size [arXiv]
  • Hierarchical Representations for Efficient Architecture Search [arXiv]

2017-10

  • Unsupervised Machine Translation Using Monolingual Corpora Only [arXiv]
  • Dynamic Routing Between Capsules [arXiv]
  • A generative vision model that trains with high data efficiency and breaks text-based CAPTCHAs [Science] [article] [code]
  • Understanding Grounded Language Learning Agents [arXiv]
  • Planning, Fast and Slow: A Framework for Adaptive Real-Time Safe Trajectory Planning [arXiv] [article] [code] (soon)
  • Malware Detection by Eating a Whole EXE [arXiv] [article]
  • Progressive Growing of GANs for Improved Quality, Stability, and Variation [Research at Nvidia] [article] [code]
  • Meta Learning Shared Hierarchies [arXiv] [article] [code]
  • Deep Voice 3: 2000-Speaker Neural Text-to-Speech [arXiv] [article]
  • AVA: A Video Dataset of Spatio-temporally Localized Atomic Visual Actions [arXiv] [article] [dataset]
  • Mastering the game of Go without Human Knowledge [Nature] [article]
  • Sim-to-Real Transfer of Robotic Control with Dynamics Randomization [arXiv] [article]
  • Asymmetric Actor Critic for Image-Based Robot Learning [arXiv] [article]
  • A systematic study of the class imbalance problem in convolutional neural networks [arXiv]
  • Generalization in Deep Learning [arXiv]
  • Swish: a Self-Gated Activation Function [arXiv]
  • Emergent Translation in Multi-Agent Communication [arXiv]
  • SLING: A framework for frame semantic parsing [arXiv] [article] [code]
  • Meta-Learning for Wrestling [arXiv] [article] [code]
  • Mixed Precision Training [arXiv] [article] [article2] [code/docs]
  • Generative Adversarial Networks: An Overview [arXiv]
  • Emergent Complexity via Multi-Agent Competition [arXiv] [article] [code]
  • Deep Lattice Networks and Partial Monotonic Functions [Research at Google] [article] [code]
  • The IIT Bombay English-Hindi Parallel Corpus [arXiv] [article]
  • Rainbow: Combining Improvements in Deep Reinforcement Learning [arXiv]
  • Lifelong Learning With Dynamically Expandable Networks [arXiv]
  • Variational Inference & Deep Learning: A New Synthesis (Thesis) [dropbox]
  • Neural Task Programming: Learning to Generalize Across Hierarchical Tasks [arXiv]
  • Neural Color Transfer between Images [arXiv]
  • The hippocampus as a predictive map [bioRxiv] [article]

2017-09

  • Variational Memory Addressing in Generative Models [arXiv]
  • Overcoming Exploration in Reinforcement Learning with Demonstrations [arXiv]
  • A Hybrid DSP/Deep Learning Approach to Real-Time Full-Band Speech Enhancement [arXiv] [article] [code]
  • ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases [CVF] [article] [dataset]
  • NIMA: Neural Image Assessment [arXiv] [article]
  • Generating Sentences by Editing Prototypes [arXiv] [code]
  • The Consciousness Prior [arXiv]
  • StarSpace: Embed All The Things! [arXiv] [code]
  • Neural Optimizer Search with Reinforcement Learning [arXiv]
  • Dynamic Evaluation of Neural Sequence Models [arXiv]
  • Neural Machine Translation [arXiv]
  • Matterport3D: Learning from RGB-D Data in Indoor Environments [arXiv] [article] [article2] [code]
  • Deep Reinforcement Learning that Matters [arXiv] [code]
  • The Uncertainty Bellman Equation and Exploration [arXiv]
  • WESPE: Weakly Supervised Photo Enhancer for Digital Cameras [arXiv] [article]
  • Globally Normalized Reader [arXiv] [article] [code]
  • A Brief Introduction to Machine Learning for Engineers [arXiv]
  • Learning with Opponent-Learning Awareness [arXiv] [article]
  • A Deep Reinforcement Learning Chatbot [arXiv]
  • Squeeze-and-Excitation Networks [arXiv]
  • Efficient Methods and Hardware for Deep Learning (Thesis) [Stanford Digital Repository]

2017-08

  • Design and Analysis of the NIPS 2016 Review Process [arXiv]
  • Fast Automated Analysis of Strong Gravitational Lenses with Convolutional Neural Networks [arXiv] [article]
  • TensorFlow Agents: Efficient Batched Reinforcement Learning in TensorFlow [white paper] [code]
  • Automated Crowdturfing Attacks and Defenses in Online Review Systems [arXiv]
  • Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning [arXiv] [article] [code]
  • Deep Learning for Video Game Playing [arXiv]
  • Deep & Cross Network for Ad Click Predictions [arXiv]
  • Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms [arXiv] [code]
  • Multi-task Self-Supervised Visual Learning [arXiv]
  • Learning a Multi-View Stereo Machine [arXiv] [article] [code] (soon)
  • Twin Networks: Using the Future as a Regularizer [arXiv]
  • A Brief Survey of Deep Reinforcement Learning [arXiv]
  • Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation [arXiv] [code]
  • On the Effectiveness of Visible Watermarks [CVPR] [article]
  • Practical Network Blocks Design with Q-Learning [arXiv]
  • On Ensuring that Intelligent Machines Are Well-Behaved [arXiv]
  • Reproducibility of Benchmarked Deep Reinforcement Learning Tasks for Continuous Control [arXiv] [code]
  • Training Deep AutoEncoders for Collaborative Filtering [arXiv] [code]
  • Learning to Perform a Perched Landing on the Ground Using Deep Reinforcement Learning [nature]
  • Revisiting the Effectiveness of Off-the-shelf Temporal Modeling Approaches for Large-scale Video Classification [arXiv] [article]
  • Intrinsically Motivated Goal Exploration Processes with Automatic Curriculum Learning [arXiv]
  • Neural Expectation Maximization [arXiv] [code]
  • Google Vizier: A Service for Black-Box Optimization [Research at Google]
  • STARDATA: A StarCraft AI Research Dataset [arXiv] [code]
  • Using millions of emoji occurrences to learn any-domain representations for detecting sentiment, emotion and sarcasm [arXiv] [code] [article]
  • Natural Language Processing with Small Feed-Forward Networks [arXiv]

2017-07

  • Photographic Image Synthesis with Cascaded Refinement Networks [arXiv] [code]
  • StarCraft II: A New Challenge for Reinforcement Learning [DeepMind Documents] [code] [article]
  • Leveraging Demonstrations for Deep Reinforcement Learning on Robotics Problems with Sparse Rewards [arXiv]
  • Reinforcement Learning with Deep Energy-Based Policies [arXiv] [article] [code]
  • DARLA: Improving Zero-Shot Transfer in Reinforcement Learning [arXiv]
  • Synthesizing Robust Adversarial Examples [arXiv] [article] [code] (Soon)
  • Voice Synthesis for in-the-Wild Speakers via a Phonological Loop [arXiv] [code] [article]
  • Eyemotion: Classifying facial expressions in VR using eye-tracking cameras [arXiv] [article]
  • A Distributional Perspective on Reinforcement Learning [arXiv] [article] [video]
  • On the State of the Art of Evaluation in Neural Language Models [arXiv]
  • Optimizing the Latent Space of Generative Networks [arXiv]
  • Neuroscience-Inspired Artificial Intelligence [Neuron] [article]
  • Learning Transferable Architectures for Scalable Image Recognition [arXiv]
  • Reverse Curriculum Generation for Reinforcement Learning [arXiv]
  • Imagination-Augmented Agents for Deep Reinforcement Learning [arXiv] [article]
  • Learning model-based planning from scratch [arXiv] [article]
  • Proximal Policy Optimization Algorithms [AWSS3] [code]
  • Automatic Recognition of Deceptive Facial Expressions of Emotion [arXiv]
  • Distral: Robust Multitask Reinforcement Learning [arXiv]
  • Creatism: A deep-learning photographer capable of creating professional work [arXiv] [article]
  • SCAN: Learning Abstract Hierarchical Compositional Visual Concepts [arXiv] [article]
  • Revisiting Unreasonable Effectiveness of Data in Deep Learning Era [arXiv] [article]
  • The Intentional Unintentional Agent: Learning to Solve Many Continuous Control Tasks Simultaneously [arXiv]
  • Deep Bilateral Learning for Real-Time Image Enhancement [arXiv] [code] [article]
  • Emergence of Locomotion Behaviours in Rich Environments [arXiv] [article]
  • Learning human behaviors from motion capture by adversarial imitation [arXiv] [article]
  • Robust Imitation of Diverse Behaviors [arXiv] [article]
  • Hindsight Experience Replay [arXiv]
  • Cardiologist-Level Arrhythmia Detection with Convolutional Neural Networks [arXiv] [article]
  • End-to-End Learning of Semantic Grasping [arXiv]
  • ELF: An Extensive, Lightweight and Flexible Research Platform for Real-time Strategy Games [arXiv] [code] [article]

2017-06

  • Noisy Networks for Exploration [arXiv]
  • Do GANs actually learn the distribution? An empirical study [arXiv]
  • Gradient Episodic Memory for Continuum Learning [arXiv]
  • Natural Language Does Not Emerge 'Naturally' in Multi-Agent Dialog [arXiv] [code]
  • Deep Interest Network for Click-Through Rate Prediction [arXiv]
  • Cognitive Psychology for Deep Neural Networks: A Shape Bias Case Study [arXiv] [article]
  • Structure Learning in Motor Control: A Deep Reinforcement Learning Model [arXiv]
  • Programmable Agents [arXiv]
  • Grounded Language Learning in a Simulated 3D World [arXiv]
  • Schema Networks: Zero-shot Transfer with a Generative Causal Model of Intuitive Physics [arXiv]
  • SVCCA: Singular Vector Canonical Correlation Analysis for Deep Learning Dynamics and Interpretability [arXiv] [article] [code]
  • One Model To Learn Them All [arXiv] [code] [article]
  • Hybrid Reward Architecture for Reinforcement Learning [arXiv]
  • Expected Policy Gradients [arXiv]
  • Variational Approaches for Auto-Encoding Generative Adversarial Networks [arXiv]
  • Deal or No Deal? End-to-End Learning for Negotiation Dialogues [S3AWS] [code] [article]
  • Attention Is All You Need [arXiv] [code] [article]
  • Sobolev Training for Neural Networks [arXiv]
  • YellowFin and the Art of Momentum Tuning [arXiv] [code] [article]
  • Forward Thinking: Building and Training Neural Networks One Layer at a Time [arXiv]
  • Depthwise Separable Convolutions for Neural Machine Translation [arXiv] [code]
  • Parameter Space Noise for Exploration [arXiv] [code] [article]
  • Deep Reinforcement Learning from human preferences [arXiv] [article]
  • Multi-Agent Actor-Critic for Mixed Cooperative-Competitive Environments [arXiv] [code]
  • Self-Normalizing Neural Networks [arXiv] [code]
  • Accurate, Large Minibatch SGD: Training ImageNet in 1 Hour [arXiv]
  • A simple neural network module for relational reasoning [arXiv] [article]
  • Visual Interaction Networks [arXiv] [article]

2017-05

  • Supervised Learning of Universal Sentence Representations from Natural Language Inference Data [arXiv] [code]
  • pix2code: Generating Code from a Graphical User Interface Screenshot [arXiv] [article] [code]
  • The Cramer Distance as a Solution to Biased Wasserstein Gradients [arXiv]
  • Reinforcement Learning with a Corrupted Reward Channel [arXiv]
  • Dilated Residual Networks [arXiv] [code]
  • Bayesian GAN [arXiv] [code]
  • Gradient Descent Can Take Exponential Time to Escape Saddle Points [arXiv] [article]
  • Thinking Fast and Slow with Deep Learning and Tree Search [arXiv]
  • ParlAI: A Dialog Research Software Platform [arXiv] [code] [article]
  • Semantically Decomposing the Latent Spaces of Generative Adversarial Networks [arXiv] [article]
  • Look, Listen and Learn [arXiv]
  • Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset [arXiv] [code]
  • Convolutional Sequence to Sequence Learning [arXiv] [code] [code2] [article]
  • The Kinetics Human Action Video Dataset [arXiv] [article]
  • Safe and Nested Subgame Solving for Imperfect-Information Games [arXiv]
  • Discrete Sequential Prediction of Continuous Actions for Deep RL [arXiv]
  • Metacontrol for Adaptive Imagination-Based Optimization [arXiv]
  • Efficient Parallel Methods for Deep Reinforcement Learning [arXiv]
  • Real-Time Adaptive Image Compression [arXiv]

2017-04

  • General Video Game AI: Learning from Screen Capture [arXiv]
  • Learning to Skim Text [arXiv]
  • Get To The Point: Summarization with Pointer-Generator Networks [arXiv] [code] [article]
  • Adversarial Neural Machine Translation [arXiv]
  • Deep Q-learning from Demonstrations [arXiv]
  • Learning from Demonstrations for Real World Reinforcement Learning [arXiv]
  • DSLR-Quality Photos on Mobile Devices with Deep Convolutional Networks [arXiv] [article] [code]
  • A Neural Representation of Sketch Drawings [arXiv] [code] [article]
  • Automated Curriculum Learning for Neural Networks [arXiv]
  • Hierarchical Surface Prediction for 3D Object Reconstruction [arXiv] [article]
  • Neural Message Passing for Quantum Chemistry [arXiv]
  • Learning to Generate Reviews and Discovering Sentiment [arXiv] [code]
  • Best Practices for Applying Deep Learning to Novel Applications [arXiv]

2017-03

  • Improved Training of Wasserstein GANs [arXiv]
  • Evolution Strategies as a Scalable Alternative to Reinforcement Learning [arXiv]
  • Controllable Text Generation [arXiv]
  • Neural Episodic Control [arXiv]
  • A Structured Self-attentive Sentence Embedding [arXiv]
  • Multi-step Reinforcement Learning: A Unifying Algorithm [arXiv]
  • Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG [arXiv]
  • FaSTrack: a Modular Framework for Fast and Guaranteed Safe Motion Planning [arXiv] [article] [article2]
  • Massive Exploration of Neural Machine Translation Architectures [arXiv] [code]
  • Large Pose 3D Face Reconstruction from a Single Image via Direct Volumetric CNN Regression [arXiv] [article] [code]
  • Minimax Regret Bounds for Reinforcement Learning [arXiv]
  • Sharp Minima Can Generalize For Deep Nets [arXiv]
  • Parallel Multiscale Autoregressive Density Estimation [arXiv]
  • Neural Machine Translation and Sequence-to-sequence Models: A Tutorial [arXiv]
  • Large-Scale Evolution of Image Classifiers [arXiv]
  • FeUdal Networks for Hierarchical Reinforcement Learning [arXiv]
  • Evolving Deep Neural Networks [arXiv]
  • How to Escape Saddle Points Efficiently [arXiv] [article]
  • Opening the Black Box of Deep Neural Networks via Information [arXiv] [video]
  • Understanding Synthetic Gradients and Decoupled Neural Interfaces [arXiv]
  • Learning to Optimize Neural Nets [arXiv] [article]

2017-02

  • The Shattered Gradients Problem: If resnets are the answer, then what is the question? [arXiv]
  • Neural Map: Structured Memory for Deep Reinforcement Learning [arXiv]
  • Bridging the Gap Between Value and Policy Based Reinforcement Learning [arXiv]
  • Deep Voice: Real-time Neural Text-to-Speech [arXiv]
  • Beating the World's Best at Super Smash Bros. with Deep Reinforcement Learning [arXiv]
  • The Game Imitation: Deep Supervised Convolutional Networks for Quick Video Game AI [arXiv]
  • Learning to Parse and Translate Improves Neural Machine Translation [arXiv]
  • All-but-the-Top: Simple and Effective Postprocessing for Word Representations [arXiv]
  • Deep Learning with Dynamic Computation Graphs [arXiv]
  • Skip Connections as Effective Symmetry-Breaking [arXiv]
  • Semi-Supervised QA with Generative Domain-Adaptive Nets [arXiv]

2017-01

  • Wasserstein GAN [arXiv]
  • Deep Reinforcement Learning: An Overview [arXiv]
  • DyNet: The Dynamic Neural Network Toolkit [arXiv]
  • DeepStack: Expert-Level Artificial Intelligence in No-Limit Poker [arXiv]
  • NIPS 2016 Tutorial: Generative Adversarial Networks [arXiv]

2016-12

  • A recurrent neural network without Chaos [arXiv]
  • Language Modeling with Gated Convolutional Networks [arXiv]
  • EnhanceNet: Single Image Super-Resolution Through Automated Texture Synthesis [arXiv] [article]
  • Learning from Simulated and Unsupervised Images through Adversarial Training [arXiv]
  • How Grammatical is Character-level Neural Machine Translation? Assessing MT Quality with Contrastive Translation Pairs [arXiv]
  • Improving Neural Language Models with a Continuous Cache [arXiv]
  • DeepMind Lab [arXiv] [code]
  • Deep Learning of Robotic Tasks without a Simulator using Strong and Weak Human Supervision [arXiv]
  • Knowing When to Look: Adaptive Attention via A Visual Sentinel for Image Captioning [arXiv]
  • Overcoming catastrophic forgetting in neural networks [arXiv]

2016-11 (ICLR Edition)

  • Image-to-Image Translation with Conditional Adversarial Networks [arXiv]
  • Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer [OpenReview]
  • Learning to reinforcement learn [arXiv]
  • A Way out of the Odyssey: Analyzing and Combining Recent Insights for LSTMs [arXiv]
  • Adversarial Training Methods for Semi-Supervised Text Classification [arXiv]
  • Importance Sampling with Unequal Support [arXiv]
  • Quasi-Recurrent Neural Networks [arXiv]
  • Capacity and Learnability in Recurrent Neural Networks [OpenReview]
  • Unrolled Generative Adversarial Networks [OpenReview]
  • Deep Information Propagation [OpenReview]
  • Structured Attention Networks [OpenReview]
  • Incremental Sequence Learning [arXiv]
  • Delving into Transferable Adversarial Examples and Black-box Attacks [arXiv] [code]
  • b-GAN: Unified Framework of Generative Adversarial Networks [OpenReview]
  • A Joint Many-Task Model: Growing a Neural Network for Multiple NLP Tasks [OpenReview]
  • Categorical Reparameterization with Gumbel-Softmax [arXiv]
  • Lip Reading Sentences in the Wild [arXiv]

Reinforcement Learning

  • A Connection between Generative Adversarial Networks, Inverse Reinforcement Learning, and Energy-Based Models [arXiv]
  • The Predictron: End-To-End Learning and Planning [OpenReview]
  • Third-Person Imitation Learning [OpenReview]
  • Generalizing Skills with Semi-Supervised Reinforcement Learning [OpenReview]
  • Sample Efficient Actor-Critic with Experience Replay [OpenReview]
  • Reinforcement Learning with Unsupervised Auxiliary Tasks [arXiv]
  • Neural Architecture Search with Reinforcement Learning [OpenReview]
  • Towards Information-Seeking Agents [OpenReview]
  • Multi-Agent Cooperation and the Emergence of (Natural) Language [OpenReview]
  • Improving Policy Gradient by Exploring Under-appreciated Rewards [OpenReview]
  • Stochastic Neural Networks for Hierarchical Reinforcement Learning [OpenReview]
  • Tuning Recurrent Neural Networks with Reinforcement Learning [OpenReview]
  • RL^2: Fast Reinforcement Learning via Slow Reinforcement Learning [arXiv]
  • Learning Invariant Feature Spaces to Transfer Skills with Reinforcement Learning [OpenReview]
  • Learning to Perform Physics Experiments via Deep Reinforcement Learning [OpenReview]
  • Reinforcement Learning through Asynchronous Advantage Actor-Critic on a GPU [OpenReview]
  • Deep Reinforcement Learning for Accelerating the Convergence Rate [OpenReview]
  • #Exploration: A Study of Count-Based Exploration for Deep Reinforcement Learning [arXiv]
  • Learning to Compose Words into Sentences with Reinforcement Learning [OpenReview]
  • Learning to Navigate in Complex Environments [arXiv]
  • Unsupervised Perceptual Rewards for Imitation Learning [OpenReview]
  • Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic [OpenReview]

Machine Translation & Dialog

  • Google's Multilingual Neural Machine Translation System: Enabling Zero-Shot Translation [arXiv]
  • Neural Machine Translation with Reconstruction [arXiv]
  • Iterative Refinement for Machine Translation [OpenReview]
  • A Convolutional Encoder Model for Neural Machine Translation [arXiv]
  • Improving Neural Language Models with a Continuous Cache [OpenReview]
  • Vocabulary Selection Strategies for Neural Machine Translation [OpenReview]
  • Towards an automatic Turing test: Learning to evaluate dialogue responses [OpenReview]
  • Dialogue Learning With Human-in-the-Loop [OpenReview]
  • Batch Policy Gradient Methods for Improving Neural Conversation Models [OpenReview]
  • Learning through Dialogue Interactions [OpenReview]
  • Dual Learning for Machine Translation [arXiv]
  • Unsupervised Pretraining for Sequence to Sequence Learning [arXiv]

2016-10

  • Hybrid computing using a neural network with dynamic external memory [Nature] [code]
  • Quantum Machine Learning [arXiv]
  • Understanding deep learning requires rethinking generalization [arXiv]
  • Universal adversarial perturbations [arXiv] [code]
  • Neural Machine Translation in Linear Time [arXiv] [code]
  • Professor Forcing: A New Algorithm for Training Recurrent Networks [arXiv]
  • Learning to Protect Communications with Adversarial Neural Cryptography [arXiv]
  • Can Active Memory Replace Attention? [arXiv]
  • Using Fast Weights to Attend to the Recent Past [arXiv]
  • Fully Character-Level Neural Machine Translation without Explicit Segmentation [arXiv]
  • Diverse Beam Search: Decoding Diverse Solutions from Neural Sequence Models [arXiv]
  • Video Pixel Networks [arXiv]
  • Connecting Generative Adversarial Networks and Actor-Critic Methods [arXiv]
  • Learning to Translate in Real-time with Neural Machine Translation [arXiv]
  • Xception: Deep Learning with Depthwise Separable Convolutions [arXiv]
  • Collective Robot Reinforcement Learning with Distributed Asynchronous Guided Policy Search [arXiv]
  • Pointer Sentinel Mixture Models [arXiv]

2016-09

  • Towards Deep Symbolic Reinforcement Learning [arXiv]
  • HyperNetworks [arXiv]
  • Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation [arXiv]
  • Safe and Efficient Off-Policy Reinforcement Learning [arXiv]
  • Playing FPS Games with Deep Reinforcement Learning [arXiv]
  • SeqGAN: Sequence Generative Adversarial Nets with Policy Gradient [arXiv]
  • Episodic Exploration for Deep Deterministic Policies: An Application to StarCraft Micromanagement Tasks [arXiv]
  • Energy-based Generative Adversarial Network [arXiv]
  • Stealing Machine Learning Models via Prediction APIs [arXiv]
  • Semi-Supervised Classification with Graph Convolutional Networks [arXiv]
  • WaveNet: A Generative Model For Raw Audio [arXiv]
  • Hierarchical Multiscale Recurrent Neural Networks [arXiv]
  • End-to-End Reinforcement Learning of Dialogue Agents for Information Access [arXiv]
  • Deep Neural Networks for YouTube Recommendations [paper]

2016-08

  • Semantics derived automatically from language corpora contain human-like biases [arXiv]
  • Why does deep and cheap learning work so well? [arXiv]
  • Machine Comprehension Using Match-LSTM and Answer Pointer [arXiv]
  • Stacked Approximated Regression Machine: A Simple Deep Learning Approach [arXiv]
  • Decoupled Neural Interfaces using Synthetic Gradients [arXiv]
  • WikiReading: A Novel Large-scale Language Understanding Task over Wikipedia [arXiv]
  • Temporal Attention Model for Neural Machine Translation [arXiv]
  • Residual Networks of Residual Networks: Multilevel Residual Networks [arXiv]
  • Learning Online Alignments with Continuous Rewards Policy Gradient [arXiv]

2016-07

  • An Actor-Critic Algorithm for Sequence Prediction [arXiv]
  • Cognitive Science in the era of Artificial Intelligence: A roadmap for reverse-engineering the infant language-learner [arXiv]
  • Recurrent Neural Machine Translation [arXiv]
  • MS-Celeb-1M: A Dataset and Benchmark for Large-Scale Face Recognition [arXiv]
  • Layer Normalization [arXiv]
  • Neural Machine Translation with Recurrent Attention Modeling [arXiv]
  • Neural Semantic Encoders [arXiv]
  • Attention-over-Attention Neural Networks for Reading Comprehension [arXiv]
  • sk_p: a neural program corrector for MOOCs [arXiv]
  • Recurrent Highway Networks [arXiv]
  • Bag of Tricks for Efficient Text Classification [arXiv]
  • Context-Dependent Word Representation for Neural Machine Translation [arXiv]
  • Dynamic Neural Turing Machine with Soft and Hard Addressing Schemes [arXiv]

2016-06

  • Sequence-to-Sequence Learning as Beam-Search Optimization [arXiv]
  • Sequence-Level Knowledge Distillation [arXiv]
  • Policy Networks with Two-Stage Training for Dialogue Systems [arXiv]
  • Towards an integration of deep learning and neuroscience [arXiv]
  • On Multiplicative Integration with Recurrent Neural Networks [arXiv]
  • Wide & Deep Learning for Recommender Systems [arXiv]
  • Online and Offline Handwritten Chinese Character Recognition [arXiv]
  • Tutorial on Variational Autoencoders [arXiv]
  • Concrete Problems in AI Safety [arXiv]
  • Deep Reinforcement Learning Discovers Internal Models [arXiv]
  • SQuAD: 100,000+ Questions for Machine Comprehension of Text [arXiv]
  • Conditional Image Generation with PixelCNN Decoders [arXiv]
  • Model-Free Episodic Control [arXiv]
  • Progressive Neural Networks [arXiv]
  • Improved Techniques for Training GANs [arXiv] [code]
  • Memory-Efficient Backpropagation Through Time [arXiv]
  • InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets [arXiv]
  • Zero-Resource Translation with Multi-Lingual Neural Machine Translation [arXiv]
  • Key-Value Memory Networks for Directly Reading Documents [arXiv]
  • Deep Recurrent Models with Fast-Forward Connections for Neural Machine Translation [arXiv]
  • Learning to learn by gradient descent by gradient descent [arXiv]
  • Learning Language Games through Interaction [arXiv]
  • Zoneout: Regularizing RNNs by Randomly Preserving Hidden Activations [arXiv]
  • Smart Reply: Automated Response Suggestion for Email [arXiv]
  • Virtual Adversarial Training for Semi-Supervised Text Classification [arXiv]
  • Deep Reinforcement Learning for Dialogue Generation [arXiv]
  • Very Deep Convolutional Networks for Natural Language Processing [arXiv]
  • Neural Net Models for Open-Domain Discourse Coherence [arXiv]
  • Neural Architectures for Fine-grained Entity Type Classification [arXiv]
  • Matching Networks for One Shot Learning [arXiv]
  • Cooperative Inverse Reinforcement Learning [arXiv] [article]
  • Gated-Attention Readers for Text Comprehension [arXiv]
  • End-to-end LSTM-based dialog control optimized with supervised and reinforcement learning [arXiv]
  • Iterative Alternating Neural Attention for Machine Reading [arXiv]
  • Memory-enhanced Decoder for Neural Machine Translation [arXiv]
  • Multiresolution Recurrent Neural Networks: An Application to Dialogue Response Generation [arXiv]
  • Learning to Optimize [arXiv] [article]
  • Natural Language Comprehension with the EpiReader [arXiv]
  • Conversational Contextual Cues: The Case of Personalization and History for Response Ranking [arXiv]
  • Adversarially Learned Inference [arXiv]
  • OpenAI Gym [arXiv] [code]
  • Neural Network Translation Models for Grammatical Error Correction [arXiv]

2016-05

  • Hierarchical Memory Networks [arXiv]
  • Deep API Learning [arXiv]
  • Wide Residual Networks [arXiv]
  • TensorFlow: A system for large-scale machine learning [arXiv]
  • Learning Natural Language Inference using Bidirectional LSTM model and Inner-Attention [arXiv]
  • Aspect Level Sentiment Classification with Deep Memory Network [arXiv]
  • FractalNet: Ultra-Deep Neural Networks without Residuals [arXiv]
  • Learning End-to-End Goal-Oriented Dialog [arXiv]
  • One-shot Learning with Memory-Augmented Neural Networks [arXiv]
  • Deep Learning without Poor Local Minima [arXiv]
  • AVEC 2016 - Depression, Mood, and Emotion Recognition Workshop and Challenge [arXiv]
  • Data Programming: Creating Large Training Sets, Quickly [arXiv]
  • Deeply-Fused Nets [arXiv]
  • Deep Portfolio Theory [arXiv]
  • Unsupervised Learning for Physical Interaction through Video Prediction [arXiv]
  • Movie Description [arXiv]

2016-04

  • Higher Order Recurrent Neural Networks [arXiv]
  • Joint Line Segmentation and Transcription for End-to-End Handwritten Paragraph Recognition [arXiv]
  • Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation [arXiv]
  • The IBM 2016 English Conversational Telephone Speech Recognition System [arXiv]
  • Dialog-based Language Learning [arXiv]
  • Multilingual Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Models and Auxiliary Loss [arXiv]
  • Sentence-Level Grammatical Error Identification as Sequence-to-Sequence Correction [arXiv]
  • A Network-based End-to-End Trainable Task-oriented Dialogue System [arXiv]
  • Visual Storytelling [arXiv]
  • Improving the Robustness of Deep Neural Networks via Stability Training [arXiv]
  • Bridging the Gaps Between Residual Learning, Recurrent Neural Networks and Visual Cortex [arXiv]
  • Scan, Attend and Read: End-to-End Handwritten Paragraph Recognition with MDLSTM Attention [arXiv]
  • Sentence Level Recurrent Topic Model: Letting Topics Speak for Themselves [arXiv]
  • Achieving Open Vocabulary Neural Machine Translation with Hybrid Word-Character Models [arXiv]
  • Building Machines That Learn and Think Like People [arXiv]
  • A Semisupervised Approach for Language Identification based on Ladder Networks [arXiv]
  • Deep Networks with Stochastic Depth [arXiv]
  • PHOCNet: A Deep Convolutional Neural Network for Word Spotting in Handwritten Documents [arXiv]

2016-03

  • Improving Information Extraction by Acquiring External Evidence with Reinforcement Learning [arXiv]
  • A Fast Unified Model for Parsing and Sentence Understanding [arXiv]
  • Latent Predictor Networks for Code Generation [arXiv]
  • Attend, Infer, Repeat: Fast Scene Understanding with Generative Models [arXiv]
  • Recurrent Batch Normalization [arXiv]
  • Neural Language Correction with Character-Based Attention [arXiv]
  • Incorporating Copying Mechanism in Sequence-to-Sequence Learning [arXiv]
  • How NOT To Evaluate Your Dialogue System [arXiv]
  • Adaptive Computation Time for Recurrent Neural Networks [arXiv]
  • A guide to convolution arithmetic for deep learning [arXiv]
  • Colorful Image Colorization [arXiv]
  • Unsupervised Learning of Visual Representations by Solving Jigsaw Puzzles [arXiv]
  • Generating Factoid Questions With Recurrent Neural Networks: The 30M Factoid Question-Answer Corpus [arXiv]
  • A Persona-Based Neural Conversation Model [arXiv]
  • A Character-level Decoder without Explicit Segmentation for Neural Machine Translation [arXiv]
  • Multi-Task Cross-Lingual Sequence Tagging from Scratch [arXiv]
  • Neural Variational Inference for Text Processing [arXiv]
  • Recurrent Dropout without Memory Loss [arXiv]
  • One-Shot Generalization in Deep Generative Models [arXiv]
  • Recursive Recurrent Nets with Attention Modeling for OCR in the Wild [arXiv]
  • A New Method to Visualize Deep Neural Networks [arXiv]
  • Neural Architectures for Named Entity Recognition [arXiv]
  • End-to-end Sequence Labeling via Bi-directional LSTM-CNNs-CRF [arXiv]
  • Character-based Neural Machine Translation [arXiv]
  • Learning Word Segmentation Representations to Improve Named Entity Recognition for Chinese Social Media [arXiv]

2016-02

  • Architectural Complexity Measures of Recurrent Neural Networks [arXiv]
  • Weight Normalization: A Simple Reparameterization to Accelerate Training of Deep Neural Networks [arXiv]
  • Recurrent Neural Network Grammars [arXiv]
  • Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations [arXiv]
  • Contextual LSTM (CLSTM) models for Large scale NLP tasks [arXiv]
  • Sequence-to-Sequence RNNs for Text Summarization [arXiv]
  • Extraction of Salient Sentences from Labelled Documents [arXiv]
  • Learning Distributed Representations of Sentences from Unlabelled Data [arXiv]
  • Benefits of depth in neural networks [arXiv]
  • Associative Long Short-Term Memory [arXiv]
  • "Why Should I Trust You?": Explaining the Predictions of Any Classifier [arXiv] [code]
  • Generating images with recurrent adversarial networks [arXiv]
  • Exploring the Limits of Language Modeling [arXiv]
  • Swivel: Improving Embeddings by Noticing What’s Missing [arXiv]
  • WebNav: A New Large-Scale Task for Natural Language based Sequential Decision Making [arXiv]
  • Efficient Character-level Document Classification by Combining Convolution and Recurrent Layers [arXiv]
  • Gradient Descent Converges to Minimizers [arXiv] [article]
  • BinaryNet: Training Deep Neural Networks with Weights and Activations Constrained to +1 or -1 [arXiv]
  • Learning Discriminative Features via Label Consistent Neural Network [arXiv]

2016-01

  • What’s your ML test score? A rubric for ML production systems [Research at Google]
  • Pixel Recurrent Neural Networks [arXiv]
  • Bitwise Neural Networks [arXiv]
  • Long Short-Term Memory-Networks for Machine Reading [arXiv]
  • Coverage-based Neural Machine Translation [arXiv]
  • Understanding Deep Convolutional Networks [arXiv]
  • Training Recurrent Neural Networks by Diffusion [arXiv]
  • Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures [arXiv]
  • Multi-Way, Multilingual Neural Machine Translation with a Shared Attention Mechanism [arXiv]
  • Recurrent Memory Network for Language Modeling [arXiv]
  • Language to Logical Form with Neural Attention [arXiv]
  • Learning to Compose Neural Networks for Question Answering [arXiv]
  • The Inevitability of Probability: Probabilistic Inference in Generic Neural Networks Trained with Non-Probabilistic Feedback [arXiv]
  • COCO-Text: Dataset and Benchmark for Text Detection and Recognition in Natural Images [arXiv]
  • Survey on the attention based RNN model and its applications in computer vision [arXiv]

2015-12

NLP

  • Strategies for Training Large Vocabulary Neural Language Models [arXiv]
  • Multilingual Language Processing From Bytes [arXiv]
  • Learning Document Embeddings by Predicting N-grams for Sentiment Classification of Long Movie Reviews [arXiv]
  • Target-Dependent Sentiment Classification with Long Short Term Memory [arXiv]
  • Reading Text in the Wild with Convolutional Neural Networks [arXiv]

Vision

  • Deep Residual Learning for Image Recognition [arXiv]
  • Rethinking the Inception Architecture for Computer Vision [arXiv]
  • Inside-Outside Net: Detecting Objects in Context with Skip Pooling and Recurrent Neural Networks [arXiv]
  • Deep Speech 2: End-to-End Speech Recognition in English and Mandarin [arXiv]

2015-11

NLP

  • Deep Reinforcement Learning with a Natural Language Action Space [arXiv]
  • Sequence Level Training with Recurrent Neural Networks [arXiv]
  • Teaching Machines to Read and Comprehend [arXiv]
  • Semi-supervised Sequence Learning [arXiv]
  • Multi-task Sequence to Sequence Learning [arXiv]
  • Alternative structures for character-level RNNs [arXiv]
  • Larger-Context Language Modeling [arXiv]
  • A Unified Tagging Solution: Bidirectional LSTM Recurrent Neural Network with Word Embedding [arXiv]
  • Towards Universal Paraphrastic Sentence Embeddings [arXiv]
  • BlackOut: Speeding up Recurrent Neural Network Language Models With Very Large Vocabularies [arXiv]
  • Natural Language Understanding with Distributed Representation [arXiv]
  • sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings [arXiv]
  • LSTM-based Deep Learning Models for non-factoid answer selection [arXiv]

Programs

  • Neural Random-Access Machines [arXiv]
  • Neural Programmer: Inducing Latent Programs with Gradient Descent [arXiv]
  • Neural Programmer-Interpreters [arXiv]
  • Learning Simple Algorithms from Examples [arXiv]
  • Neural GPUs Learn Algorithms [arXiv] [code]
  • On Learning to Think: Algorithmic Information Theory for Novel Combinations of Reinforcement Learning Controllers and Recurrent Neural World Models [arXiv]

Vision

  • ReSeg: A Recurrent Neural Network for Object Segmentation [arXiv]
  • Deconstructing the Ladder Network Architecture [arXiv]
  • Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks [arXiv]
  • Multi-Scale Context Aggregation by Dilated Convolutions [arXiv] [code]

General

  • Towards Principled Unsupervised Learning [arXiv]
  • Dynamic Capacity Networks [arXiv]
  • Generating Sentences from a Continuous Space [arXiv]
  • Net2Net: Accelerating Learning via Knowledge Transfer [arXiv]
  • A Roadmap towards Machine Intelligence [arXiv]
  • Session-based Recommendations with Recurrent Neural Networks [arXiv]
  • Regularizing RNNs by Stabilizing Activations [arXiv]

2015-10

  • A Sensitivity Analysis of (and Practitioners' Guide to) Convolutional Neural Networks for Sentence Classification [arXiv]
  • Attention with Intention for a Neural Network Conversation Model [arXiv]
  • Part-of-Speech Tagging with Bidirectional Long Short-Term Memory Recurrent Neural Network [arXiv]
  • A Survey: Time Travel in Deep Learning Space: An Introduction to Deep Learning Models and How Deep Learning Models Evolved from the Initial Ideas [arXiv]
  • A Primer on Neural Network Models for Natural Language Processing [arXiv]
  • A Diversity-Promoting Objective Function for Neural Conversation Models [arXiv]

2015-09

  • Character-level Convolutional Networks for Text Classification [arXiv]
  • A Neural Attention Model for Abstractive Sentence Summarization [arXiv]
  • Poker-CNN: A Pattern Learning Strategy for Making Draws and Bets in Poker Games [arXiv]

2015-08

  • Neural Machine Translation of Rare Words with Subword Units [arXiv] [code]
  • Listen, Attend and Spell [arXiv]
  • Character-Aware Neural Language Models [arXiv]
  • Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs [arXiv]
  • Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation [arXiv]
  • Effective Approaches to Attention-based Neural Machine Translation [arXiv]

2015-07

  • Building End-To-End Dialogue Systems Using Generative Hierarchical Neural Network Models [arXiv]
  • Semi-Supervised Learning with Ladder Networks [arXiv]
  • Document Embedding with Paragraph Vectors [arXiv]
  • Training Very Deep Networks [arXiv]

2015-06

  • Dropout as a Bayesian Approximation: Representing Model Uncertainty in Deep Learning [arXiv]
  • A Neural Network Approach to Context-Sensitive Generation of Conversational Responses [arXiv]
  • A Neural Conversational Model [arXiv]
  • Skip-Thought Vectors [arXiv]
  • Pointer Networks [arXiv]
  • Spatial Transformer Networks [arXiv]
  • Tree-structured composition in neural networks without tree-structured architectures [arXiv]
  • Visualizing and Understanding Neural Models in NLP [arXiv]
  • Learning to Transduce with Unbounded Memory [arXiv]
  • Ask Me Anything: Dynamic Memory Networks for Natural Language Processing [arXiv]
  • Deep Knowledge Tracing [arXiv]

2015-05

  • ReNet: A Recurrent Neural Network Based Alternative to Convolutional Networks [arXiv]
  • Reinforcement Learning Neural Turing Machines [arXiv]

2015-04

  • Correlational Neural Networks [arXiv]

2015-03

  • Distilling the Knowledge in a Neural Network [arXiv]
  • End-To-End Memory Networks [arXiv]
  • Neural Responding Machine for Short-Text Conversation [arXiv]
  • Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift [arXiv]
  • Escaping From Saddle Points --- Online Stochastic Gradient for Tensor Decomposition [arXiv] [article]

2015-02

  • Human-level control through deep reinforcement learning [Nature] [code]
  • Text Understanding from Scratch [arXiv]
  • Show, Attend and Tell: Neural Image Caption Generation with Visual Attention [arXiv]

2015-01

  • Hidden Technical Debt in Machine Learning Systems [NIPS]

2014-12

  • Learning Longer Memory in Recurrent Neural Networks [arXiv]
  • Neural Turing Machines [arXiv]
  • Grammar as a Foreign Language [arXiv]
  • On Using Very Large Target Vocabulary for Neural Machine Translation [arXiv]
  • Effective Use of Word Order for Text Categorization with Convolutional Neural Networks [arXiv]
  • Multiple Object Recognition with Visual Attention [arXiv]

2014-11

  • The Loss Surfaces of Multilayer Networks [arXiv]

2014-10

  • Learning to Execute [arXiv]

2014-09

  • Sequence to Sequence Learning with Neural Networks [arXiv]
  • Neural Machine Translation by Jointly Learning to Align and Translate [arXiv]
  • On the Properties of Neural Machine Translation: Encoder-Decoder Approaches [arXiv]
  • Recurrent Neural Network Regularization [arXiv]
  • Very Deep Convolutional Networks for Large-Scale Image Recognition [arXiv]
  • Going Deeper with Convolutions [arXiv]

2014-08

  • Convolutional Neural Networks for Sentence Classification [arXiv]

2014-07

2014-06

  • Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation [arXiv]
  • Recurrent Models of Visual Attention [arXiv]
  • Generative Adversarial Networks [arXiv]

2014-05

  • Distributed Representations of Sentences and Documents [arXiv]

2014-04

  • A Convolutional Neural Network for Modelling Sentences [arXiv]

2014-03

2014-02

2014-01

2013

  • Visualizing and Understanding Convolutional Networks [arXiv]
  • DeViSE: A Deep Visual-Semantic Embedding Model [pub]
  • Maxout Networks [arXiv]
  • Exploiting Similarities among Languages for Machine Translation [arXiv]
  • Efficient Estimation of Word Representations in Vector Space [arXiv]

2011

  • Natural Language Processing (almost) from Scratch [arXiv]