Yi-Chen (Howard) Lo

Results: 89 comments by Yi-Chen (Howard) Lo

### Further Readings

- [Circumventing Outliers of AutoAugment with Knowledge Distillation](https://arxiv.org/abs/2003.11342). Wei et al. ECCV 2020.
- [Task-Oriented Feature Distillation](https://papers.nips.cc/paper/2020/file/a96b65a721e561e1e3de768ac819ffbb-Paper.pdf). Zhang et al. NeurIPS 2020.
- [Residual Distillation: Towards Portable...

### TL;DR

![](https://github.com/howardyclo/papernotes/blob/master/images/UBBR/1.png?raw=true)

- Propose a class-agnostic (transferable to **unseen classes**) and anchor-free box regressor, the *Universal Bounding-Box Regressor* (UBBR).
- UBBR takes an image and arbitrary bounding boxes, and...

### Summary

This paper proposes to augment the negative sampling process in contrastive learning with an *adversarially* learned conditional distribution, resulting in a negative sampler that adapts to the data...
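To make the role of negatives concrete, here is a minimal InfoNCE sketch (illustrative names and toy vectors, not the paper's code): the negatives entering the denominator are drawn at random here, whereas the paper's contribution is to draw them from a *learned* adversarial distribution instead.

```python
import numpy as np

rng = np.random.default_rng(0)

def info_nce(query, positive, negatives, temperature=0.1):
    # InfoNCE: -log( exp(q.k+/t) / (exp(q.k+/t) + sum_i exp(q.k-_i/t)) ).
    logits = np.concatenate(([query @ positive], negatives @ query)) / temperature
    logits = logits - logits.max()  # numerical stability
    return -np.log(np.exp(logits[0]) / np.exp(logits).sum())

# Unit-norm toy embeddings; the positive is a slight perturbation of the query.
q = rng.normal(size=16); q /= np.linalg.norm(q)
pos = q + 0.1 * rng.normal(size=16); pos /= np.linalg.norm(pos)
negs = rng.normal(size=(8, 16)); negs /= np.linalg.norm(negs, axis=1, keepdims=True)
loss = info_nce(q, pos, negs)
```

Harder (more query-like) negatives inflate the denominator and thus the loss, which is exactly the pressure an adversarial sampler exploits.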

### Notes on Noise Contrastive Estimation (NCE)

#### Idea

In neural language modeling, computing the softmax normalization term over the full vocabulary is expensive. Thus, NCE and its simplified variants, *negative sampling*...
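A minimal sketch of the negative-sampling objective (toy sizes and names are mine, not from the note): instead of normalizing over the whole vocabulary, each update touches only the true word and a handful of noise words.

```python
import numpy as np

rng = np.random.default_rng(0)

def negative_sampling_loss(positive_score, negative_scores):
    # Skip-gram-style objective: push the true pair's score up and the
    # scores of k sampled noise words down, avoiding the full-vocab softmax.
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    return -(np.log(sigmoid(positive_score))
             + np.sum(np.log(sigmoid(-np.asarray(negative_scores)))))

# Toy setup: scores are dot products between a context vector and word vectors.
dim, vocab, k = 8, 100, 5
context = rng.normal(size=dim)
embeddings = rng.normal(size=(vocab, dim))
target_id = 7
noise_ids = rng.choice(vocab, size=k, replace=False)  # k negatives from a noise distribution

loss = negative_sampling_loss(embeddings[target_id] @ context,
                              embeddings[noise_ids] @ context)
```

The cost per update is O(k) rather than O(|V|), which is the whole point of NCE-style estimators.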

### Summary

Incorporate linguistic features, such as lemmas, subword tags, morphological features, part-of-speech (POS) tags, and syntactic dependency labels, to improve neural machine translation (NMT) performance.

---

### Hypotheses

- ...

### Preliminary

![](https://i.imgur.com/OdF058Q.png)

See ["Cyclical Learning Rates for Training Neural Networks" by Smith (WACV 2017)](https://arxiv.org/abs/1506.01186).

### TL;DR

- Present an unknown phenomenon called "super-convergence": we can train DNNs an order...
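The triangular schedule from the cited CLR paper can be sketched in a few lines (parameter names are mine; `step_size`, `base_lr`, and `max_lr` follow the paper's terminology):

```python
def triangular_lr(iteration, step_size, base_lr, max_lr):
    # Triangular cyclical learning rate (Smith, WACV 2017): the LR ramps
    # linearly from base_lr to max_lr and back over 2 * step_size iterations.
    cycle = iteration // (2 * step_size)
    x = abs(iteration / step_size - 2 * cycle - 1)  # position in cycle, in [0, 1]
    return base_lr + (max_lr - base_lr) * max(0.0, 1.0 - x)

# One full cycle with step_size=4: LR rises for 4 steps, then falls for 4.
lrs = [triangular_lr(t, step_size=4, base_lr=0.001, max_lr=0.1) for t in range(8)]
```

Super-convergence uses a single such cycle with an unusually large `max_lr`.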

![](https://ppt.cc/[email protected])

### Summary

- **Motivation**: Unify classification and regression via a hyperspherical output space with class prototypes defined *a priori*. The number of outputs is no longer constrained to a fixed output size...
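The prototype idea can be sketched as follows (an illustrative 2-D toy, not the paper's prototype construction): classification reduces to picking the fixed prototype with the highest cosine similarity to the embedding.

```python
import numpy as np

def prototype_predict(embedding, prototypes):
    # Predict the class whose fixed hyperspherical prototype has the
    # highest cosine similarity with the embedding.
    e = embedding / np.linalg.norm(embedding)
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    return int(np.argmax(p @ e))

# Three prototypes placed a priori, maximally separated on the unit circle.
angles = np.array([0.0, 2 * np.pi / 3, 4 * np.pi / 3])
prototypes = np.stack([np.cos(angles), np.sin(angles)], axis=1)

pred = prototype_predict(np.array([0.9, 0.1]), prototypes)
```

Because the prototypes are fixed rather than learned output weights, the same mechanism handles any number of classes and, by interpolating between prototypes, regression targets as well.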

### Summary

This paper presents a systematic comparison of strategies for representing and training large vocabularies, including classical *softmax*, *hierarchical softmax*, *target sampling*, *noise contrastive estimation*, and *self-normalization* (*infrequent...
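To contrast the cost of two of these strategies, here is a toy sketch (uniform sampling and the function names are illustrative simplifications, not the papers' exact estimators):

```python
import numpy as np

rng = np.random.default_rng(0)

def full_softmax_nll(logits, target):
    # Classical softmax: the partition function sums over the whole vocabulary.
    z = logits - logits.max()
    return -(z[target] - np.log(np.exp(z).sum()))

def target_sampling_nll(logits, target, num_samples=50):
    # Target sampling (sketch): normalize over the target plus a small random
    # subset of the vocabulary, so each update costs O(num_samples), not O(|V|).
    subset = np.unique(np.append(rng.choice(len(logits), num_samples, replace=False),
                                 target))
    z = logits[subset] - logits[subset].max()
    return -(logits[target] - logits[subset].max() - np.log(np.exp(z).sum()))

vocab_logits = rng.normal(size=100_000)
exact = full_softmax_nll(vocab_logits, target=42)
approx = target_sampling_nll(vocab_logits, target=42)
```

The sampled estimate underestimates the partition function (its sum runs over a subset), which is one of the biases the paper's comparison quantifies.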

### Highlights

- **Motivation**. ImageNet labels are noisy: an image may contain multiple objects but is annotated with a single image-level class label.
- **Intuition**. A model trained with the single-label...