Donghyun Kim issues

Results: 102 issues of Donghyun Kim

[39] Language Modelling at Scale: Gopher, Ethical considerations, and Retrieval

[blog](https://deepmind.com/research/publications/2021/scaling-language-models-methods-analysis-insights-from-training-gopher) [paper](https://kstatic.googleusercontent.com/files/b068c6c0e64d6f933068f7de30ea722359ef87c6c14d3065856b86d44fbdf2dea3ff373ed9eb751514f242d20df9d6a468622fad093f962563545e7d0cdb9dba) Paper on Gopher, a 280B model - section 3 - Curation of MassiveText - Architecture - Optimization - Infrastructure - section 4 - benchmarks on 152 sets - section...

DeepMind

LLM

[42] Panoptic SegFormer: Delving Deeper into Panoptic Segmentation with Transformers

[paper](https://arxiv.org/pdf/2109.03814.pdf) [code](https://github.com/zhiqi-li/Panoptic-SegFormer) - Pyramid Vision Transformer (PVT) v2 backbone (#69) - multiscale features are combined by a deformable attention encoder - a location decoder finds the location information of each instance, then this information goes to the mask decoder as a query...

Panoptic Segmentation

[41] PVTv2: Improved Baselines with Pyramid Vision Transformer

Follow-up to PVT (#68). A short, 6-page paper. [paper](https://arxiv.org/pdf/2106.13797.pdf) [code](https://github.com/whai362/PVT) Three additions: 1. overlapping patch embedding 2. convolutional feedforward networks 3. linear complexity attention layers. Straight to the main points. # Improved Pyramid...

BackBone
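Of the three additions listed in the PVTv2 entry above, overlapping patch embedding is the easiest to illustrate with arithmetic. The sketch below compares a non-overlapping patch embedding (kernel equal to stride) with an overlapping one. The specific setting of kernel `2S-1`, stride `S`, padding `S-1` is an assumption made for illustration and is not stated in the entry; the helper names are hypothetical.

```python
def conv_output_size(size, kernel, stride, padding):
    """Standard output length of a strided convolution along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


def windows_covering(pixel, size, kernel, stride, padding):
    """Count how many patch windows include the given input index."""
    n_out = conv_output_size(size, kernel, stride, padding)
    count = 0
    for i in range(n_out):
        start = i * stride - padding  # leftmost input index of window i
        if start <= pixel < start + kernel:
            count += 1
    return count


S = 4  # assumed stride of the first stage

# Non-overlapping: kernel == stride, no padding.
plain = dict(kernel=S, stride=S, padding=0)
# Overlapping (assumed setting): kernel 2S-1, stride S, padding S-1.
overlap = dict(kernel=2 * S - 1, stride=S, padding=S - 1)

# Both settings produce the same number of tokens per axis...
print(conv_output_size(224, **plain), conv_output_size(224, **overlap))
# ...but with overlap, most pixels contribute to two neighbouring patches.
print(windows_covering(10, 224, **plain), windows_covering(10, 224, **overlap))
```

Under these assumed values the token grid keeps the same resolution, so overlap changes what each token sees without changing the sequence length.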

[40] Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions

[paper](https://arxiv.org/abs/2102.12122) [code](https://github.com/whai362/PVT) ![image](https://user-images.githubusercontent.com/16400591/146115550-8d8e63af-6341-4ba1-ba5c-a1a9b6bd06e1.png) TF-E == Transformer Encoder ## Contributions 1. Proposes PVT, which can be applied to a variety of vision tasks 2. For multi-scale, high-resolution features: progressive shrinking pyramid, spatial-reduction attention...

BackBone

ICCV21

[38] Object-Aware Cropping for Self-Supervised Learning

[paper](https://arxiv.org/pdf/2112.00319v1.pdf) [code](https://github.com/shlokk/object-cropping-ssl) For SSL, do the random crop smartly to improve performance. Unlike well-curated data such as ImageNet, sets like OpenImages or COCO often have no object at the exact center. Simple random...

Pretraining

Contrastive Learning

[36] ViDT: An Efficient and Effective Fully Transformer-based Object Detector

[paper](https://arxiv.org/pdf/2110.03921.pdf) [code](https://github.com/naver-ai/vidt) Let's build a cost-effective detector! # INTRO ![image](https://user-images.githubusercontent.com/16400591/144395578-bab3b278-ca61-43a4-ab3a-af7cc110474a.png) As the figure shows, this paper proposes a detector that works well at small sizes. DETR (ViT) has two limitations: 1. ViT...

Detection

DETR

[33] Dynamic DETR: End-to-End Object Detection with Dynamic Attention

The DETR variant that Microsoft is pushing: Dynamic DETR. [paper](https://openaccess.thecvf.com/content/ICCV2021/papers/Dai_Dynamic_DETR_End-to-End_Object_Detection_With_Dynamic_Attention_ICCV_2021_paper.pdf) ![image](https://user-images.githubusercontent.com/16400591/143860229-ba384656-0cd9-4eb6-8830-10715b416f9d.png) # Dynamic DETR overview. Looking back at the overview figure from time to time while reading the explanation below should make it easier to follow. ![image](https://user-images.githubusercontent.com/16400591/143862056-7231ecdc-ff0a-440c-bc20-f3c0d4fb01e5.png) ## Revisit DETR...

ICCV21

Microsoft

[32] Florence: A New Foundation Model for Computer Vision

[paper](https://arxiv.org/abs/2111.11432) Vision-language contrastive pretraining, followed by transfer to downstream tasks. Sets SOTA on all sorts of benchmarks. # Abstract Three points are emphasized: 1. from coarse (scene) to fine (object) 2. from static (images) to dynamic (videos)...

Microsoft

Pretraining

[31] Swin Transformer V2: Scaling Up Capacity and Resolution

[paper](https://arxiv.org/pdf/2111.09883.pdf) [code](https://github.com/microsoft/Swin-Transformer) Scales the model up to 3B. # Abstract ![image](https://user-images.githubusercontent.com/16400591/142803143-3c85e961-e52e-4beb-ac03-b20fe543b2a7.png) 1. pre norm ==> post norm. (However, LN is not applied after the skip add; it is applied before the skip add, and the result is then added.) 2. dot prod...

MSRA
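The normalization change noted in the Swin Transformer V2 entry above (LN applied to the sublayer output before the skip add) can be sketched as follows. This is a minimal pure-Python illustration, not the repository's code: `toy_sublayer` is a hypothetical stand-in for attention or an MLP, and the LayerNorm here has no learnable scale or bias.

```python
import math


def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and unit variance (no learnable params)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def toy_sublayer(x):
    """Hypothetical stand-in for an attention or MLP sublayer."""
    return [3.0 * v + 1.0 for v in x]


def pre_norm_block(x, sublayer):
    """Pre-norm: x + f(LN(x)). The residual branch output is unnormalized."""
    return [a + b for a, b in zip(x, sublayer(layer_norm(x)))]


def res_post_norm_block(x, sublayer):
    """LN before the skip add: x + LN(f(x)). The residual branch is normalized."""
    return [a + b for a, b in zip(x, layer_norm(sublayer(x)))]


x = [1.0, 2.0, 3.0, 4.0]
print(pre_norm_block(x, toy_sublayer))
print(res_post_norm_block(x, toy_sublayer))
```

In the second variant, whatever is added to the skip path has already been normalized, so its scale does not depend on how large the sublayer's raw output is.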

[30] Masked Autoencoders Are Scalable Vision Learners (MAE)

[paper](https://arxiv.org/abs/2111.06377) [PR12 review by Jinwon](https://www.youtube.com/watch?v=mtUa3AAxPNQ) A new paper from Kaiming He....! # Intro Why does masked autoencoding behave differently in NLP and in vision? (BERT succeeded, but in vision, not so much..?) To answer this question, from three perspectives...

FAIR

Pretraining