Donghyun Kim issues

Results: 102 issues of Donghyun Kim

[28] Emerging Properties in Self-Supervised Vision Transformers (DINO)

[paper](https://arxiv.org/pdf/2104.14294.pdf) [code](https://github.com/facebookresearch/dino) [yannic youtube](https://www.youtube.com/watch?v=h3ij3F3cPIk&t=925s) When self-supervised training is done well, extracting the last-layer self-attention of the [CLS] token turns out to give unsupervised segmentation. ![image](https://user-images.githubusercontent.com/16400591/141235560-49359b75-78f9-41ab-a77a-2aefd388e018.png) Similar to BYOL, but proposes a new self-supervised learning method. # DINO The method is really, really...

FAIR

ICCV21

Self-Supervised

[27] SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers

[paper](https://arxiv.org/pdf/2105.15203.pdf) [code](https://github.com/NVlabs/SegFormer) 1. Proposes an encoder that needs no positional encoding and produces multi-scale outputs. 2. Combines an MLP decoder with (local attention, global attention) effectively. # SegFormer ![image](https://user-images.githubusercontent.com/16400591/141227667-ffd8f8da-63da-427a-9320-14c1b125ff0a.png) Given an (H×W×3) input,...

Semantic Segmentation

NeurIPS21

NVIDIA

[25] Efficiently Modeling Long Sequences with Structured State Spaces (S4)

[paper](https://arxiv.org/pdf/2111.00396.pdf) [code](https://github.com/HazyResearch/state-spaces/blob/main/src/models/sequence/ss/s4.py) It was submitted to OpenReview as S3, but judging from both the code and arXiv, S4 appears to be the name the authors are pushing. #51 LSSL is slow because it simulates the state space model (SSM), and...

Stanford

State Space Model

[24] Combining Recurrent, Convolutional, and Continuous-time Models with Linear State-Space Layers (LSSL)

[paper](https://arxiv.org/pdf/2110.13985.pdf) [code](https://github.com/hazyresearch/state-spaces) ## Abstract Proposes a new sequence model (an architecture well suited to time-series data). Linear State-Space Layers (LSSL) perform sequence mapping in a simple way. linear continuous-time state-space representation...

NeurIPS21

Stanford

State Space Model

[Linear Algebra] State Space Representation

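# State Space Representation Related papers keep coming out, but I have zero background knowledge, so I am writing up some notes. A State Space Model expresses a set of state variables, inputs, and outputs in the form of first-order differential equations. Over time...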
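
A minimal sketch of the definition above, assuming the standard linear time-invariant form x'(t) = A x(t) + B u(t), y(t) = C x(t) + D u(t) (the excerpt does not spell out the matrices). It steps such a system with forward Euler in plain Python. The names `simulate_ssm`, `_matvec`, `_add`, and the step size `dt` are illustrative choices, not taken from the original note.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def _matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def _add(u: Vector, v: Vector) -> Vector:
    """Element-wise sum of two vectors of equal length."""
    return [a + b for a, b in zip(u, v)]


def simulate_ssm(A: Matrix, B: Matrix, C: Matrix, D: Matrix,
                 inputs: List[Vector], x0: Vector, dt: float) -> List[Vector]:
    """Simulate x' = A x + B u, y = C x + D u with forward Euler.

    Assumption: forward Euler is used only because it is the simplest
    discretization; it is not claimed to be what any cited paper uses.
    """
    x = list(x0)
    outputs = []
    for u in inputs:
        # Output equation: read y from the current state and input.
        outputs.append(_add(_matvec(C, x), _matvec(D, u)))
        # State equation: first-order derivative of the state.
        dx = _add(_matvec(A, x), _matvec(B, u))
        # Euler step: x_{k+1} = x_k + dt * x'_k.
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
    return outputs


if __name__ == "__main__":
    # Hypothetical 1-D example: x' = -x + u with a constant input u = 1.
    ys = simulate_ssm([[-1.0]], [[1.0]], [[1.0]], [[0.0]],
                      inputs=[[1.0]] * 1000, x0=[0.0], dt=0.01)
    print(ys[-1][0])
```

In this toy system the state relaxes toward the constant input, which shows how the state variable carries information forward in time while the output is read off from it at each step.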

[23] Long-Short Transformer: Efficient Transformers for Language and Vision

[paper](https://arxiv.org/pdf/2107.02192.pdf) [code](https://github.com/NVIDIA/transformer-ls) A paper that proposes handling long-range attention and short attention separately. ## INTRO - Many efficient attention variants have appeared, but none performed well on both language and vision. - Hard-coding a pattern...

NeurIPS21

Light Attention

NVIDIA

[22] Perceiver IO: A General Architecture for Structured Inputs & Outputs

Make the Perceiver's output arbitrarily sized so that diverse tasks can be solved within a single framework. ![image](https://user-images.githubusercontent.com/16400591/140241122-bd120e6c-5a4c-48b0-810a-10707503ce85.png) [paper](https://arxiv.org/abs/2107.14795) [code](https://github.com/deepmind/deepmind-research/tree/master/perceiver) [youtube review - The AI Epiphany](https://www.youtube.com/watch?v=WJWBq4NZfvY&t=53s) ## RECAP A quick recap first, starting from [perceiver](https://arxiv.org/pdf/2103.03206.pdf)! ![image](https://user-images.githubusercontent.com/16400591/140237926-510a4fd6-e36f-4855-86ea-73b2f976f47e.png)...

DeepMind

Light Attention

MultiModal

[15] An Empirical Study on Few-shot Knowledge Probing for Pretrained Language Models (TREx-2p data)

Professor Kyunghyun Cho ...! [paper](https://arxiv.org/pdf/2109.02772.pdf) [code](https://github.com/cloudygoose/fewshot_lama) The existing LAMA dataset only measured **1-hop (subject, relation, object)** performance. [LAMA dataset paper review blog - KR](https://jeonsworld.github.io/NLP/lama/) The image below is an example from LAMA. Can an LM really also...

Dataset

MIT

NYU

few-shot

Prompt

[2] Deformable DETR: Deformable Transformers for End-to-End Object Detection

Back after two days of vaccination leave (21.10.05 - 21.10.06). If this post is hard to follow, read #28 first. ## Deformable DETR [paper](https://arxiv.org/pdf/2010.04159.pdf) [code](https://github.com/fundamentalvision/Deformable-DETR) DETR removed hand-designed features and delivered good performance...

Oral

DETR

ICLR21

SenseTime

Light Attention

[16] MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding

The next stop for the LayoutLM authors. [paper](https://arxiv.org/pdf/2110.08518.pdf) [code](https://github.com/microsoft/unilm/tree/master/markuplm) ![Screenshot, 2021-10-25 11-16-26](https://user-images.githubusercontent.com/16400591/138629119-4ef461fa-a6c2-419c-b199-5bec0500ab91.png) ## XPath Embedding XPath is used in order to pretrain on HTML pages. Reading the `1.3.3 predicates` section of this [XPath blog post](https://loco-motive.tistory.com/26) helps in understanding...

MSRA

Pretraining

Document Understand