Videos Publications Collection

Humans are born to see and to adapt to the visual world. Once visual signals stimulate our neurons, we learn concepts and associate one thing with another: seeing a waterfall, we think of the galaxy. We imagine, and finally we create, reshaping the visual world. Some of us are now trying to give this ability to intelligent agents, which has set off an unprecedented scientific trend.

This is a collection of video publications I have recently read, covering Action Recognition, Video Generation, Video Self-supervised Learning, and some classic papers.

This repo will be updated as my research continues.

Video Generation

DVDGAN

SV2P

SAVP

SVG-LP

Vid2Vid

Seg2Vid

TGAN

Generating Videos with Scene Dynamics

Generating the Futures with Adversarial Transformers

Video Disentanglement

MoCoGAN

TwoStreamVAN

RecycleGAN

Deep Visual Analogy-Making

Unsupervised Learning of Disentangled Representations from Video

Future Prediction

Hierarchical Long-term Video Prediction without Supervision

Compositional Video Prediction

An Uncertain Future: Forecasting from Static Images using Variational Autoencoders

Visual Dynamics: Probabilistic Future Frame Synthesis via Cross Convolutional Networks

Memory In Memory: A Predictive Neural Network for Learning Higher-Order Non-Stationarity from Spatiotemporal Dynamics

Video Self-supervised Learning

Learning Correspondence from the Cycle-consistency of Time

Learning and Using the Arrow of Time

Self-supervised Learning for Video Correspondence Flow

Temporal Cycle-Consistency Learning

Tracking Emerges by Colorizing Videos

Video Representation Learning by Dense Predictive Coding

Shuffle and Learn

Odd-One-Out

Action Recognition & Representation Learning

Two-Stream Fusion Network

Delving Deeper into Convolutional Networks for Learning Video Representations

Architecture

Spatio-temporal Video Autoencoder with Differentiable Memory

Temporal Consistency

Blind Video Temporal Consistency via Deep Video Prior

Blind Video Temporal Consistency

Learning Blind Video Temporal Consistency

Occlusion-aware Video Temporal Consistency

Video Inpainting

Copy-and-Paste

Deep Video Inpainting

Deep Flow-Guided Video Inpainting

Onion-Peel Network

Free-Form Video Inpainting with 3D Gated Convolution and Temporal PatchGAN

Learnable Gated Temporal Shift Module for Video Inpainting

Video Inpainting by Jointly Learning Temporal Structure and Spatial Details

Deep Blind Video Decaptioning by Temporal Aggregation and Recurrence

Learning Joint Spatial-Temporal Transformations for Video Inpainting

Spatio-Temporal Reasoning

Temporal Relational Reasoning in Videos

Videos as Space-Time Region Graphs

Structural-RNN: Deep Learning on Spatio-Temporal Graphs

Relational Action Forecasting

Learning Human-Object Interactions by Graph Parsing Neural Networks

Optical Flow

FlowNet

PWC-Net

MirrorFlow

UnFlow

SfM

SfM-Net

Unsupervised Learning of Depth and Ego-Motion from Video

Video Interpolation

Super SloMo

All at Once: Temporally Adaptive Multi-Frame Interpolation with Advanced Motion Modeling

Deep Slow Motion Video Reconstruction with Hybrid Imaging System

MEMC-Net

Depth-Aware Video Frame Interpolation

Temporal Coherence

Slow and Steady Feature Analysis: Higher Order Temporal Coherence in Video

Learning Blind Video Temporal Consistency

Multi-modalities

Aligning Books and Movies

SoundNet

The Sound of Pixels

The Sound of Motions

Learning to Learn Words from Visual Scenes

VideoBERT

Self-supervised Moving Vehicle Tracking with Stereo Sound

Music Gesture for Visual Sound Separation

Self-supervised Audio-visual Co-segmentation

Labelling Unlabelled Videos from Scratch With Multi-modal Self-supervision

Listen to Look: Action Recognition by Previewing Audio

Cooperative Learning of Audio and Video Models from Self-Supervised Synchronization

Self-Supervised Learning by Cross-Modal Audio-Video Clustering

Sound2Sight: Generating Visual Dynamics from Sound and Context

Multimodal Speech Separation

Looking to Listen at the Cocktail Party

Blind Audio-Visual Source Separation based on Sparse Redundant Representations

Audio-Visual Speech Enhancement Using Multimodal Deep Convolutional Neural Networks

Video Object Segmentation

Zero-Shot Video Object Segmentation via Attentive Graph Neural Networks

Visual Dialog & Visual Question Answering

Reasoning Visual Dialogs with Structural and Partial Observations

Key-Point & Skeleton

Convolutional Sequence Generation for Skeleton-Based Action Synthesis

Unsupervised Keypoint Learning for Guiding Class-Conditional Video Prediction

Classic

Video Textures

Others

What Makes a Video a Video
