papers issues

NeW CRFs: Neural Window Fully-connected CRFs for Monocular Depth Estimation

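## Paper summary

State of the art in monocular depth estimation. The input image is split into windows and fully-connected Conditional Random Fields (CRFs) are optimized within each window, which cuts the computational cost and makes fully-connected CRFs practical. The paper reports large improvements on all of KITTI, NYUv2, and MatterPort3D.

![bib_20221004 00](https://user-images.githubusercontent.com/16313809/197314795-74f7f67a-cf54-49d7-9b25-7a9123b8814a.jpg)

https://weihaosky.github.io/newcrfs/

## Code

https://github.com/aliyun/NeWCRFs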

tkuri

Conference: CVPR

Application: SIDE

Year: 2022
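
The cost saving described in the NeW CRFs summary above (fully-connected CRFs restricted to windows) can be illustrated with a small sketch. It splits a feature map into non-overlapping windows and counts the pairwise terms of one global graph versus one graph per window. The names `window_partition` and `pairwise_terms`, the 4×4 window size, and the toy 8×8 map are assumptions made for illustration; none of this is taken from the authors' code.

```python
import numpy as np

def window_partition(feat, win):
    """Split an (H, W, C) map into non-overlapping (win, win, C) windows."""
    h, w, c = feat.shape
    assert h % win == 0 and w % win == 0, "H and W must be multiples of win"
    x = feat.reshape(h // win, win, w // win, win, c)
    # Bring the two window-grid axes together, then flatten them.
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win, win, c)

def pairwise_terms(h, w, win=None):
    """Number of ordered node pairs in the fully-connected graph(s)."""
    if win is None:                      # one graph over the whole image
        return (h * w) ** 2
    n_windows = (h // win) * (w // win)  # one graph per window
    return n_windows * (win * win) ** 2

# Toy 8x8 feature map with 2 channels (hypothetical sizes).
feat = np.arange(8 * 8 * 2, dtype=np.float32).reshape(8, 8, 2)
windows = window_partition(feat, win=4)
print(windows.shape)                # (4, 4, 4, 2)
print(pairwise_terms(8, 8))         # 4096
print(pairwise_terms(8, 8, win=4))  # 1024
```

For a fixed window size the number of pairwise terms grows linearly with the number of pixels rather than quadratically.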

PANDORA: Polarization-Aided Neural Decomposition Of Radiance

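## Paper summary

A NeRF that takes multi-view images from a polarization camera as input. Polarization depends strongly on the surface normal and behaves differently for diffuse and specular reflection, so the paper suggests it is a useful cue for NeRF-based inverse rendering. The method can accurately model, reconstruct, and separate stronger specularities than prior work.

![bib_20220927 00](https://user-images.githubusercontent.com/16313809/197314659-d25238f8-f42b-4fc6-8314-549b56939873.jpg)

https://akshatdave.github.io/pandora/

## Code

https://github.com/akshatdave/pandora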

tkuri

Subject: Dataset

Conference: ICCV/ECCV

Application: Inverse Rendering

Year: 2022

Subject: NeRF
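
To make the polarization cue in the PANDORA summary above more concrete, here is a minimal sketch of the standard linear Stokes computation. It assumes the camera records intensities behind linear polarizers at 0°, 45°, 90°, and 135°; the summary does not state the sensor layout, so that is an assumption, as are the function names and the synthetic test pixel. This is not PANDORA's rendering pipeline, only the usual preprocessing that produces degree and angle of linear polarization.

```python
import numpy as np

def linear_stokes(i0, i45, i90, i135):
    """Linear Stokes parameters from four polarizer-angle intensities."""
    s0 = 0.5 * (i0 + i45 + i90 + i135)  # total intensity
    s1 = i0 - i90
    s2 = i45 - i135
    return s0, s1, s2

def dolp_aolp(s0, s1, s2, eps=1e-8):
    """Degree and angle (radians) of linear polarization."""
    dolp = np.sqrt(s1 ** 2 + s2 ** 2) / (s0 + eps)
    aolp = 0.5 * np.arctan2(s2, s1)
    return dolp, aolp

def measure(theta, s0=2.0, rho=0.5, phi=np.deg2rad(30.0)):
    """Synthetic pixel: intensity behind an ideal polarizer at angle theta."""
    return 0.5 * s0 * (1.0 + rho * np.cos(2.0 * theta - 2.0 * phi))

angles = np.deg2rad([0.0, 45.0, 90.0, 135.0])
s0, s1, s2 = linear_stokes(*[measure(t) for t in angles])
dolp, aolp = dolp_aolp(s0, s1, s2)
print(dolp, np.rad2deg(aolp))  # recovers rho and phi of the synthetic pixel
```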

It's About Time: Analog Clock Reading in the Wild

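## Paper summary

Reads the time on analog clocks in in-the-wild images and videos. The task is quite hard: clock faces vary widely in appearance, the apparent shape and the positions of the numerals depend on the camera viewpoint, and shadows and specular reflections cause confusion. The paper proposes a synthetic dataset generator and a two-stage framework consisting of detection and recognition.

![bib_20220930 00](https://user-images.githubusercontent.com/16313809/197314738-057313a7-bbb3-4fb4-89d7-47d3d67806e1.jpg)

https://charigyang.github.io/abouttime/

## Code

https://github.com/charigyang/itsabouttime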

tkuri

Subject: Dataset

Conference: CVPR

Year: 2022
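
Any synthetic clock generator like the one mentioned in the summary above needs a mapping between a time label and the hand geometry. The sketch below shows only that mapping for an undistorted, front-facing clock. The function names are hypothetical, and the paper's generator (rendering, augmentation, viewpoint changes) is not reproduced here.

```python
def hand_angles(hour, minute):
    """Clockwise angles in degrees from 12 o'clock: (hour hand, minute hand)."""
    minute_angle = 6.0 * minute                     # 360 / 60 degrees per minute
    hour_angle = 30.0 * (hour % 12) + 0.5 * minute  # 360 / 12, plus drift
    return hour_angle, minute_angle

def angles_to_time(hour_angle, minute_angle):
    """Inverse mapping; 12 o'clock is returned as hour 0."""
    minute = int(round(minute_angle / 6.0)) % 60
    # Remove the minute-driven drift before reading the hour.
    hour = int(round((hour_angle - 0.5 * minute) / 30.0)) % 12
    return hour, minute

print(hand_angles(3, 30))            # (105.0, 180.0)
print(angles_to_time(105.0, 180.0))  # (3, 30)
```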

Swin Transformer V2: Scaling Up Capacity and Resolution

## Paper summary

Scales Swin Transformer up to 3 billion parameters and makes training on images at 1,536×1,536 resolution possible. State of the art on a range of benchmarks. The model is modified to resolve training instability (reordering Layer Norm, introducing cosine attention, and so on). The paper also proposes implementation techniques that greatly reduce GPU memory consumption.

![bib_20220908 00](https://user-images.githubusercontent.com/16313809/197314135-ddb68b82-91c8-4d67-9174-6cf6189fdcf2.jpg)

https://openaccess.thecvf.com/content/CVPR2022/html/Liu_Swin_Transformer_V2_Scaling_Up_Capacity_and_Resolution_CVPR_2022_paper.html

## Code

https://github.com/microsoft/Swin-Transformer

tkuri

Conference: CVPR

Year: 2022

Subject: Backbone
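
The cosine attention mentioned in the Swin Transformer V2 summary above can be sketched for a single head in NumPy. Attention logits are computed from the cosine similarity of queries and keys divided by a temperature, so their magnitude is bounded no matter how large the activations grow. In this sketch the temperature `tau` is a fixed constant and the bias term is optional; both choices, along with the function names and toy shapes, are assumptions for illustration and do not reproduce the official implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cosine_attention(q, k, v, tau=0.1, bias=None, eps=1e-12):
    """Single-head attention with cosine-similarity logits.

    q, k, v: (N, D) arrays. tau is a fixed temperature here (an assumption).
    """
    qn = q / (np.linalg.norm(q, axis=-1, keepdims=True) + eps)
    kn = k / (np.linalg.norm(k, axis=-1, keepdims=True) + eps)
    logits = qn @ kn.T / tau  # each entry lies in [-1/tau, 1/tau]
    if bias is not None:
        logits = logits + bias
    return softmax(logits) @ v, logits

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(5, 8)) for _ in range(3))
out, logits = cosine_attention(q, k, v)
# Inflating the activations does not change the result.
out_big, logits_big = cosine_attention(1000.0 * q, 1000.0 * k, v)
print(np.abs(logits).max() <= 10.0 + 1e-6, np.allclose(out, out_big))
```

With plain dot-product attention the same 1000× scaling would multiply every logit by a factor of one million and saturate the softmax.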

Hyperbolic Image Segmentation

## Paper summary

Pixel-level optimization and inference in Euclidean space is the current standard for image segmentation. This paper shows that hyperbolic manifolds offer an alternative. The benefits include confidence (uncertainty) estimation and boundary information obtained for free, as well as zero-label generalization.

![bib_20220907 00](https://user-images.githubusercontent.com/16313809/197313441-4b1ef6d6-5f65-46f5-9f29-521c759ce79c.jpg)

https://openaccess.thecvf.com/content/CVPR2022/html/Atigh_Hyperbolic_Image_Segmentation_CVPR_2022_paper.html

## Code

https://github.com/MinaGhadimiAtigh/HyperbolicImageSegmentation

tkuri

Conference: CVPR

Year: 2022

Application: Segmentation
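
As background for the Hyperbolic Image Segmentation summary above, the sketch below computes geodesic distance in the unit Poincaré ball, one common model of hyperbolic space. It only illustrates how the geometry differs from the Euclidean case; the paper's classifier is not reproduced. The curvature of -1 and the sample points are assumptions.

```python
import numpy as np

def poincare_distance(x, y):
    """Geodesic distance in the unit Poincare ball (curvature -1)."""
    sq = np.sum((x - y) ** 2)
    denom = (1.0 - np.sum(x ** 2)) * (1.0 - np.sum(y ** 2))
    return np.arccosh(1.0 + 2.0 * sq / denom)

origin = np.array([0.0, 0.0])
p = np.array([0.5, 0.0])
print(poincare_distance(origin, p))  # equals 2 * artanh(0.5) = ln(3)

# The same Euclidean gap of 0.1 is much longer near the boundary of the ball.
near = poincare_distance(np.array([0.0, 0.0]), np.array([0.1, 0.0]))
far = poincare_distance(np.array([0.8, 0.0]), np.array([0.9, 0.0]))
print(near < far)
```

Distances blow up toward the boundary, so the norm of an embedding carries information of its own. That is one intuition for why a per-pixel confidence signal can come essentially for free.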

ICON: Implicit Clothed Humans Obtained From Normals

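## Paper summary

Estimates detailed 3D surfaces from a set of captured images and combines them to produce an animatable avatar. Prior methods rely on global feature encoders that are sensitive to global pose, so they are not robust to the variety of human poses. The paper therefore proposes a method based on local features.

![bib_20220906 00](https://user-images.githubusercontent.com/16313809/197312660-4d72483f-4731-40bf-966a-a677e078cefe.jpg)

https://icon.is.tue.mpg.de/

## Code

https://github.com/YuliangXiu/ICON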

tkuri

Conference: CVPR

Year: 2022

360MonoDepth: High-Resolution 360° Monocular Depth Estimation

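## Paper summary

A general and flexible framework for monocular depth estimation on high-resolution 360° images. The input 360° image is projected onto a set of perspective tangent images (for example, using the faces of an icosahedron), depth is estimated on each tangent image, and the results are then globally aligned and blended in the gradient domain.

![bib_20220905 00](https://user-images.githubusercontent.com/16313809/188530978-4775bc31-e491-4f01-a865-41bfc3e64ec9.jpg)

https://openaccess.thecvf.com/content/CVPR2022/html/Rey-Area_360MonoDepth_High-Resolution_360deg_Monocular_Depth_Estimation_CVPR_2022_paper.html

## Code & Dataset

https://manurare.github.io/360monodepth/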

tkuri

Conference: CVPR

Input: Panorama

Application: SIDE

Year: 2022
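
The first step in the 360MonoDepth summary above, projecting the sphere onto perspective tangent images, is commonly done with the gnomonic projection. The sketch below maps longitude/latitude on the sphere to coordinates on a plane tangent at a chosen point. It is a generic formula, not the authors' code: the function name is hypothetical, and the choice of tangent points (for example, icosahedron face centers), the alignment step, and the blending step are all omitted.

```python
import numpy as np

def gnomonic_project(lon, lat, lon0, lat0):
    """Project sphere points (radians) onto the plane tangent at (lon0, lat0)."""
    dlon = lon - lon0
    cos_c = (np.sin(lat0) * np.sin(lat)
             + np.cos(lat0) * np.cos(lat) * np.cos(dlon))
    # Points with cos_c <= 0 lie on the far hemisphere and have no projection.
    valid = cos_c > 0
    safe = np.where(valid, cos_c, 1.0)  # avoid dividing by zero or negatives
    x = np.where(valid, np.cos(lat) * np.sin(dlon) / safe, np.nan)
    y = np.where(valid,
                 (np.cos(lat0) * np.sin(lat)
                  - np.sin(lat0) * np.cos(lat) * np.cos(dlon)) / safe,
                 np.nan)
    return x, y

# Tangent point on the equator at longitude 0.
print(gnomonic_project(0.0, 0.0, 0.0, 0.0))        # tangent point maps to the origin
print(gnomonic_project(np.pi / 4, 0.0, 0.0, 0.0))  # 45 degrees east: x = tan(45 deg)
print(gnomonic_project(np.pi, 0.0, 0.0, 0.0))      # antipode: no projection (nan)
```

Each tangent image can only cover less than a hemisphere, which is why a set of overlapping tangent images is needed to cover the full 360° input.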

It’s Time for Artistic Correspondence in Music and Video

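## Paper summary

Recommends music that suits a given video, or video that suits a given piece of music. The paper proposes a self-supervised approach that uses a Transformer network for each modality to model long-term temporal context in both video and music, and learns the correspondence directly from data without human annotation.

![bib_20220902 00](https://user-images.githubusercontent.com/16313809/188529966-2ea0360c-d8e3-4cdb-beb3-1a831b77d782.jpg)

https://musicforvideo.cs.columbia.edu/

## Code

Not confirmed.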

tkuri

Conference: CVPR

Year: 2022

The DEVIL Is in the Details: A Diagnostic Evaluation Benchmark for Video Inpainting

## Paper summary

A dataset designed specifically for video inpainting. The paper analyzes the strengths and weaknesses of seven recent algorithms. Methods that model time and optical flow perform well, but the results suggest that relative rankings are highly sensitive to the content of the source video and masks, not just to the method itself.

![bib_20220829 00](https://user-images.githubusercontent.com/16313809/188529454-2f8a0273-208d-4ba6-aaec-977f81c78232.jpg)

https://openaccess.thecvf.com/content/CVPR2022/html/Szeto_The_DEVIL_Is_in_the_Details_A_Diagnostic_Evaluation_Benchmark_CVPR_2022_paper.html

## Code & Dataset

https://github.com/MichiganCOG/devil

tkuri

Subject: Dataset

Conference: CVPR

Year: 2022

Application: Inpainting

Mobile-Former: Bridging MobileNet and Transformer

## Paper summary

Proposes a network that runs MobileNet and a Transformer in parallel with a two-way bridge between them. It combines the strengths of both, using MobileNet's efficiency for local feature extraction and the Transformer's power for modeling global interactions, and outperforms MobileNetV3 while staying lightweight.

![bib_20220822 00](https://user-images.githubusercontent.com/16313809/188529238-70b535c5-d90d-4977-b80e-793ac4156b08.jpg)

https://openaccess.thecvf.com/content/CVPR2022/html/Zhang_Wavelet_Knowledge_Distillation_Towards_Efficient_Image-to-Image_Translation_CVPR_2022_paper.html

## Code

Not confirmed.

tkuri

Conference: CVPR

Year: 2022
