Donghyun Kim issues

Results: 102 issues of Donghyun Kim

[20] Patches Are All You Need? (ConvMixer)

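ICLR22 submission paper. The idea is simple. It has not been accepted yet, but I picked it up because the paper is interesting. Does ViT work well because the transformer is good, or because the patch input is good? The authors argue that it is the patch-based input structure itself that is good, and...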

BackBone

[21] Focal Self-attention for Local-Global Interactions in Vision Transformers (Focal Transformer)

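Launched with ambition, but its performance turned out to be roughly on par with Swin, so the paper never took off. [paper](https://arxiv.org/pdf/2107.00641.pdf) [code](https://github.com/microsoft/Focal-Transformer) Looking at DeiT's per-layer heatmaps, you can see the attention gradually widening out from local context. This kind of...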

Microsoft

NeurIPS21

[19] ResNet strikes back: An improved training procedure in timm

[paper](https://arxiv.org/pdf/2110.00476.pdf) A paper by Ross Wightman, the father of the timm package. (Individual Researcher. Seriously cool.) By changing the training schedule, a vanilla ResNet-50 with 224x224 input hits 80.4% top-1 (the original was around 76.5, I think). Experiments...

BackBone

FAIR

[18] Rethinking Semantic Segmentation from a Sequence-to-Sequence Perspective with Transformers (SETR)

[paper](https://arxiv.org/pdf/2012.15840.pdf) [code](https://github.com/fudan-zvg/SETR) Let's do segmentation with a Transformer! ## SETR ![image](https://user-images.githubusercontent.com/16400591/139812888-6e9e07df-4119-4b4c-ad2a-ac35f836384e.png) It is simple. Take a 480x480x3 input and split it into a 16x16 grid, so each patch has size (30, 30, 3). These go through a linear projection and the encoder...

CVPR21

Semantic Segmentation
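The patch arithmetic in the SETR note above can be checked with a few lines of plain Python. This is only a sketch of the bookkeeping as the note states it (a 480x480x3 input split into a 16x16 grid); the function name `grid_patch_shape` and its return layout are hypothetical and do not come from the SETR code base.

```python
def grid_patch_shape(height, width, channels, grid):
    """Return (patch_shape, num_patches, flat_len) for a grid x grid split.

    Hypothetical helper for illustration: it only does the size arithmetic,
    it does not touch any pixel data.
    """
    if height % grid or width % grid:
        raise ValueError("image size must be divisible by the grid size")
    patch_h, patch_w = height // grid, width // grid
    patch_shape = (patch_h, patch_w, channels)
    num_patches = grid * grid                  # sequence length fed to the encoder
    flat_len = patch_h * patch_w * channels    # values per patch before projection
    return patch_shape, num_patches, flat_len


shape, count, flat_len = grid_patch_shape(480, 480, 3, grid=16)
print(shape, count, flat_len)  # (30, 30, 3) 256 2700
```

Under these assumptions the linear projection would map each flattened patch of 2700 values to the encoder's embedding width, and the encoder would see a sequence of 256 tokens.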

[17] Faster Improvement Rate Population Based Training (FIRE PBT)

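A paper that improves on the original PBT. The first author of PBT took part as corresponding author. (Looks like they moved from NVIDIA to DeepMind?) ## Abstract PBT had two limitations. 1. It is slow. 2. Its long-term performance is often actually worse than hand-designed ones. Fire PBT...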

DeepMind

ParameterSearch

[13] Multi-Task Deep Neural Networks for Natural Language Understanding (MT-DNN)

[paper](https://arxiv.org/pdf/1901.11504.pdf) [code](https://github.com/namisan/mt-dnn) [youtube - TMax AI](https://www.youtube.com/watch?v=rG8A1VFSENM) [blog - Korean review](https://y-rok.github.io/nlp/2019/05/20/mt-dnn.html) ## MT-DNN Beat BERT by training several tasks together with multi-task learning. ![image](https://user-images.githubusercontent.com/16400591/137855673-d31759dc-a624-4439-b6bb-7e002953f020.png) #### Classification ![image](https://user-images.githubusercontent.com/16400591/137879757-323ef1ce-cfb8-41e0-b668-f2bb4e9742a5.png) CLS token...

Microsoft

MultiTask

ACL19

[11] Zero-Shot Text-to-Image Generation (DALL-E)

The famous, famous DALL-E. Its dVAE was also used as a patch encoder in #37. It generates images auto-regressively from text. [paper](https://arxiv.org/pdf/2102.12092.pdf) [official code](https://github.com/openai/DALL-E) [non-official code - with training](https://github.com/lucidrains/DALLE-pytorch) [official blog](https://openai.com/blog/dall-e) [yannic kilcher]()...

OpenAI

zero-shot

AutoEncoder

[1] End-to-End Object Detection with Transformers (DETR)

The famous DETR. Let's build a well-performing detector using a Transformer. ## Contribution - Removes hand-designed components such as NMS and anchor generation. - Meaningful detection results using a Transformer. - Bipartite matching -...

ECCV20

FAIR

DETR
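The bipartite matching mentioned in the DETR note pairs each prediction with at most one ground-truth object so that the total matching cost is minimal. The sketch below illustrates that idea only: the paper computes the assignment with the Hungarian algorithm, whereas this example swaps in a brute-force search over permutations, which is practical only for tiny inputs. The function name `match_predictions` and the cost values are made up for illustration.

```python
from itertools import permutations


def match_predictions(cost):
    """Brute-force minimum-cost one-to-one assignment (illustration only).

    cost[i][j] is a hypothetical cost of assigning prediction i to
    ground-truth object j; the matrix is assumed to be square.
    """
    n = len(cost)
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        total = sum(cost[i][perm[i]] for i in range(n))
        if total < best_total:
            best_total, best_perm = total, perm
    # Return (prediction index, ground-truth index) pairs and the total cost.
    return list(enumerate(best_perm)), best_total


# Made-up 3x3 cost matrix: rows are predictions, columns are ground truths.
cost = [
    [0.9, 0.1, 0.8],
    [0.2, 0.7, 0.9],
    [0.6, 0.8, 0.3],
]
pairs, total = match_predictions(cost)
print(pairs)  # [(0, 1), (1, 0), (2, 2)]
```

Because every ground-truth object is claimed by exactly one prediction, duplicate detections are penalized during training, which is what lets the note's first bullet (no NMS) hold.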

[4] Fast Convergence of DETR with Spatially Modulated Co-Attention (SMCA)

[paper](https://arxiv.org/abs/2108.02404) [code](https://github.com/gaopengcuhk/SMCA-DETR) Proposes SMCA (Spatially Modulated Co-Attention) for faster convergence of DETR. It reaches 45.6 AP on coco17 val at 108 epochs; considering that deformable DETR reached 44.9, SMCA is a good...

DETR

ICCV21

[9] Learning Transferable Visual Models From Natural Language Supervision (CLIP)

CLIP !! Reaches ResNet50-level performance on ImageNet zero-shot. [paper](https://arxiv.org/pdf/2103.00020.pdf) [code](https://github.com/OpenAI/CLIP) [Korean blog](https://simonezz.tistory.com/88) [yannic-youtube](https://www.youtube.com/watch?v=T9XSU0pKX2E) ## CLIP ![image](https://user-images.githubusercontent.com/16400591/137414580-3dd38a31-15be-46fc-9a89-3f4248fb6233.png) #### 1) contrastive pretraining The structure is simple. Collect 400 million image-text pairs, pass them through an Image Encoder and a Text Encoder...

ICML21

OpenAI

Pretraining

zero-shot
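The contrastive pretraining step in the CLIP note can be sketched as a symmetric cross-entropy over an image-text similarity matrix, where the i-th image and the i-th text form the matching pair. This is a minimal pure-Python illustration, not the official implementation: the function names are hypothetical, the embeddings are toy vectors standing in for encoder outputs, and the temperature of 0.07 is an assumed fixed value.

```python
import math


def _cross_entropy(logits_row, target):
    """Cross-entropy of one row of logits against an integer target index."""
    m = max(logits_row)  # subtract the max for numerical stability
    log_sum = m + math.log(sum(math.exp(x - m) for x in logits_row))
    return log_sum - logits_row[target]


def contrastive_loss(image_emb, text_emb, temperature=0.07):
    """Symmetric contrastive loss; pair i of images and texts is the positive."""
    def normalize(v):
        norm = math.sqrt(sum(x * x for x in v))
        return [x / norm for x in v]

    images = [normalize(v) for v in image_emb]
    texts = [normalize(v) for v in text_emb]
    n = len(images)
    # logits[i][j] = scaled cosine similarity between image i and text j
    logits = [[sum(a * b for a, b in zip(img, txt)) / temperature
               for txt in texts] for img in images]
    # Image-to-text direction: each row should pick its own column.
    loss_i = sum(_cross_entropy(logits[i], i) for i in range(n)) / n
    # Text-to-image direction: each column should pick its own row.
    loss_t = sum(_cross_entropy([logits[i][j] for i in range(n)], j)
                 for j in range(n)) / n
    return (loss_i + loss_t) / 2


aligned = [[1.0, 0.0], [0.0, 1.0]]
print(contrastive_loss(aligned, aligned) < contrastive_loss(aligned, aligned[::-1]))  # True
```

Matching pairs lie on the diagonal of the similarity matrix, so the loss is low when each image embedding is closest to its own text embedding and high when the pairs are shuffled.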