Yi-Chen (Howard) Lo

Results: 89 comments by Yi-Chen (Howard) Lo

### How to Tune Hyperparameters Gamma and C? (Response by Christopher P. Burgess) Gamma sets the strength of the penalty for deviating from the target KL, C. Here you want...
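To make the tuning target concrete, here is a minimal PyTorch sketch (not Burgess's code; `gamma`, `c_max`, and the annealing schedule are illustrative placeholders) of a capacity-constrained objective in which the KL term is penalized for deviating from a target capacity C:

```python
# Minimal sketch of a capacity-constrained beta-VAE objective:
# the KL term is pushed toward a target capacity C, and gamma scales
# the penalty for deviating from it. Values below are illustrative.
import torch

def capacity_vae_loss(recon_loss, kl, step, gamma=1000.0,
                      c_max=25.0, c_anneal_steps=100_000):
    """recon_loss, kl: scalar tensors averaged over the batch.
    C is linearly annealed from 0 to c_max over c_anneal_steps."""
    c = min(c_max, c_max * step / c_anneal_steps)  # current target KL (nats)
    return recon_loss + gamma * torch.abs(kl - c)

# A large gamma keeps the KL close to the current capacity target.
loss = capacity_vae_loss(torch.tensor(120.0), torch.tensor(3.2), step=10_000)
```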

### Summary This paper purposes to train neural machine translation (NMT) model **from scratch** on both bilingual and target-side monolingual data in a multi-task setting by modifying the decoder in...

### Summary Neural text generation models are typically auto-regressive, trained by maximizing likelihood (i.e., minimizing perplexity) and evaluated with validation perplexity. They claim such training and evaluation could result in poor...
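For reference, the standard recipe the summary refers to is teacher-forced maximum likelihood with perplexity reported as exp(mean token NLL); the sketch below is a generic illustration with made-up variable names, not code from the paper:

```python
# Teacher-forced cross-entropy (maximum likelihood) and
# validation perplexity = exp(mean negative log-likelihood per token).
import math
import torch
import torch.nn.functional as F

def token_nll(logits, targets, pad_id=0):
    """logits: (batch, seq_len, vocab); targets: (batch, seq_len)."""
    return F.cross_entropy(
        logits.reshape(-1, logits.size(-1)),
        targets.reshape(-1),
        ignore_index=pad_id,  # do not count padding tokens
    )

def perplexity(mean_nll):
    return math.exp(mean_nll)

# e.g. a mean NLL of 3.0 nats/token corresponds to a perplexity of ~20.1
print(perplexity(3.0))
```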

### Summary This work can be viewed as a follow-up to Rei (2017) #3, which proposes a semi-supervised framework for multi-task learning, integrating language modeling as an additional objective,...
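As a rough sketch of the joint objective (my paraphrase of the setup, not the authors' code; `lam` and the tensor shapes are assumptions), the supervised tagging loss is combined with auxiliary language-modeling losses that predict the next and previous tokens from the forward and backward RNN states:

```python
# Supervised tagging loss plus forward/backward language-modeling losses,
# weighted by an auxiliary coefficient lam (illustrative value).
import torch
import torch.nn.functional as F

def joint_loss(tag_logits, tags, fwd_lm_logits, next_tokens,
               bwd_lm_logits, prev_tokens, lam=0.1):
    """All logits: (batch, seq_len, num_classes); all targets: (batch, seq_len)."""
    tagging = F.cross_entropy(tag_logits.transpose(1, 2), tags)
    lm_fwd = F.cross_entropy(fwd_lm_logits.transpose(1, 2), next_tokens)
    lm_bwd = F.cross_entropy(bwd_lm_logits.transpose(1, 2), prev_tokens)
    return tagging + lam * (lm_fwd + lm_bwd)
```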

### Summary This paper proposes a novel recurrent, attention- and memory-based neural architecture, the "Memory, Attention and Composition (MAC) cell", for VQA, featuring multi-step reasoning. Three operations in the MAC cell:...

### TL;DR This paper identifies several key obstacles in machine learning research and offers some inspiration for machine learning research that matters, aiming to address the gap between research and...

### Summary This paper is the first to successfully employ a fully convolutional Seq2Seq (Conv Seq2Seq) model (Gehring et al., 2017) for grammatical error correction (GEC), with some improvements:...

### Prior Approaches on Knowledge Distillation Basically, knowledge distillation aims to obtain a smaller student model from a (typically larger) teacher model by matching the information hidden in the models. The...
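As a reference point, the classic logit-matching form of this idea (Hinton et al., 2015) looks like the sketch below; this is a generic illustration of "matching information", not the specific method reviewed here, and `T`/`alpha` are illustrative hyperparameters:

```python
# Blend the hard-label cross-entropy with a KL term that matches the
# student's softened distribution to the teacher's (scaled by T^2).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    hard = F.cross_entropy(student_logits, labels)
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    return alpha * hard + (1.0 - alpha) * soft
```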

### Metadata: Prime-Aware Adaptive Distillation - Authors: Youcai Zhang, Zhonghao Lan, +4, Yichen Wei - Organization: Megvii Inc. & University of Science and Technology of China & Tongji University -...

### Highlights ![](https://cdn-images-1.medium.com/max/1600/1*yd2GPCLG-Ithrj-glB5R-g.png) * This paper incorporates data uncertainty to re-weight samples for distillation. * They conjecture that previous hard-mining (i.e., focusing more on hard samples instead of easier ones)...
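A hedged sketch of the uncertainty-weighted feature-distillation idea described above, following the common aleatoric-uncertainty formulation (the function and variable names are mine, not the paper's released code):

```python
# Each sample gets a predicted variance; the feature-matching error is
# down-weighted for high-uncertainty samples, and a log-variance term
# prevents the trivial solution of predicting infinite uncertainty.
import torch

def uncertainty_weighted_distill(student_feat, teacher_feat, log_var):
    """student_feat, teacher_feat: (batch, dim); log_var: (batch, 1)
    per-sample log variance predicted by a small head on the student."""
    sq_err = (student_feat - teacher_feat).pow(2).sum(dim=1, keepdim=True)
    # Samples with low predicted uncertainty ("prime" samples) get larger weight.
    return (torch.exp(-log_var) * sq_err + log_var).mean()
```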