ComputerVisionSummarization
A summary of computer vision
- 1. CVPR Paper
- 1.1. All CVPR2018 Paper
- 1.1.1. Tracking
- 2. Video collection
- 2.1. Link
- 2.2. video rank
- 3. Detection
- 4. Segmentation
- 4.1. mask rcnn
- 5. Lip language
- 5.1. Learning Lip Sync from Audio
- 6. Some Interesting
- 6.1. Aging photo prediction
- 6.2. D panorama
- 6.3. PanoCatcher
- 7. Tracking
- 7.1. MOT
- 7.2. Correlation Filter
- 7.3. End-to-end representation learning for correlation filter based tracking
- 7.4. Attentional Correlation Filter Network for Adaptive Visual Tracking
- 7.5. Context-Aware Correlation Filter Tracking
- 8. Action Recognition
- 9. Reconstruction
- 10. Detection
- 11. Segmentation
- 12. Action Recognition
- 13. Point Cloud Representation
- 14. Summary
- 15. Reference
1. CVPR Paper
All the papers are available on the official website.
An offline list of the papers is available at this link.
1.1. All CVPR2018 Paper
1.1.1. Tracking
| Paper ID | Type | Title |
|---|---|---|
| 122 | Poster | Detect-and-Track: Efficient Pose Estimation in Videos |
| 255 | Poster | Multi-Cue Correlation Filters for Robust Visual Tracking |
| 281 | Spotlight | Tracking Multiple Objects Outside the Line of Sight using Speckle Imaging |
| 281 | Poster | Tracking Multiple Objects Outside the Line of Sight using Speckle Imaging |
| 369 | Oral | Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies |
| 369 | Poster | Total Capture: A 3D Deformation Model for Tracking Faces, Hands, and Bodies |
| 423 | Spotlight | Fast and Accurate Online Video Object Segmentation via Tracking Parts |
| 423 | Poster | Fast and Accurate Online Video Object Segmentation via Tracking Parts |
| 678 | Poster | Learning Attentions: Residual Attentional Siamese Network for High Performance Online Visual Tracking |
| 736 | Spotlight | GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB |
| 736 | Poster | GANerated Hands for Real-Time 3D Hand Tracking from Monocular RGB |
| 890 | Poster | CarFusion: Combining Point Tracking and Part Detection for Dynamic 3D Reconstruction of Vehicles |
| 892 | Poster | Context-aware Deep Feature Compression for High-speed Visual Tracking |
| 1022 | Poster | A Benchmark for Articulated Human Pose Estimation and Tracking |
| 1194 | Poster | Hyperparameter Optimization for Tracking with Continuous Deep Q-Learning |
| 1264 | Poster | End-to-end Flow Correlation Tracking with Spatial-temporal Attention |
| 1280 | Spotlight | VITAL: VIsual Tracking via Adversarial Learning |
| 1280 | Poster | VITAL: VIsual Tracking via Adversarial Learning |
| 1304 | Poster | SINT++: Robust Visual Tracking via Adversarial Hard Positive Generation |
| 1353 | Poster | Learning Spatial-Temporal Regularized Correlation Filters for Visual Tracking |
| 1439 | Poster | Efficient Diverse Ensemble for Discriminative Co-Tracking |
| 1494 | Poster | Correlation Tracking via Joint Discrimination and Reliability Learning |
| 1676 | Spotlight | Learning Spatial-Aware Regressions for Visual Tracking |
| 1676 | Poster | Learning Spatial-Aware Regressions for Visual Tracking |
| 1679 | Poster | Fusing Crowd Density Maps and Visual Object Trackers for People Tracking in Crowd Scenes |
| 1949 | Poster | Rolling Shutter and Radial Distortion are Features for High Frame Rate Multi-camera Tracking |
| 2129 | Poster | High-speed Tracking with Multi-kernel Correlation Filters |
| 2628 | Poster | A Causal And-Or Graph Model for Visibility Fluent Reasoning in Tracking Interacting Objects |
| 2951 | Spotlight | High Performance Visual Tracking with Siamese Region Proposal Network |
| 2951 | Poster | High Performance Visual Tracking with Siamese Region Proposal Network |
| 3013 | Oral | Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net |
| 3013 | Poster | Fast and Furious: Real Time End-to-End 3D Detection, Tracking and Motion Forecasting with a Single Convolutional Net |
| 3292 | Spotlight | MX-LSTM: mixing tracklets and vislets to jointly forecast trajectories and head poses |
| 3292 | Poster | MX-LSTM: mixing tracklets and vislets to jointly forecast trajectories and head poses |
| 3502 | Poster | A Prior-Less Method for Multi-Face Tracking in Unconstrained Videos |
| 3583 | Poster | Towards dense object tracking in a 2D honeybee hive |
| 3817 | Spotlight | Good Appearance Features for Multi-Target Multi-Camera Tracking |
| 3817 | Poster | Good Appearance Features for Multi-Target Multi-Camera Tracking |
| 3980 | Poster | A Twofold Siamese Network for Real-Time Object Tracking |
2. Video collection
2.1. Link
https://pan.baidu.com/s/1eSIVG90
2.2. video rank
- holoportation_ virtual 3D teleportation in real-time (Microsoft Research).mp4
- Realtime Multi-Person 2D Human Pose Estimation using Part Affinity Fields, CVPR 2017 Oral
- Full-Resolution Residual Networks (FRRNs) for Semantic Image Segmentation in Street Scenes
- YOLO v2
- DeepGlint CVPR2016
3. Detection
4. Segmentation
4.1. mask rcnn
Mask R-CNN was proposed by Kaiming He et al., and an implementation is available in a GitHub repository.
- Mask R-CNN extends Faster R-CNN by adding a branch for predicting an object mask in parallel with the existing branch for bounding box recognition.
- Output:
  - a class label
  - a bounding-box offset
  - an object mask
- It can run at 5 fps, and training on COCO takes one to two days on a single 8-GPU machine.
- It has several applications: instance segmentation, bounding-box object detection, person keypoint detection (human pose estimation), and camera calibration.
  - By viewing each keypoint as a one-hot binary mask, it can estimate human pose.
- It belongs to the instance segmentation field.
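The one-hot keypoint idea mentioned above can be sketched roughly as follows. This is only an illustration, not the Mask R-CNN implementation: the function names and the 56x56 mask size are assumptions chosen for the example.

```python
import numpy as np


def keypoint_to_mask(x, y, size=56):
    """Encode one keypoint as a one-hot binary mask (hypothetical helper).

    Exactly one pixel, the keypoint location, is set to 1.
    `size` is an assumed mask resolution.
    """
    mask = np.zeros((size, size), dtype=np.uint8)
    mask[y, x] = 1  # row index is y, column index is x
    return mask


def mask_to_keypoint(scores):
    """Decode a keypoint from a per-pixel score map by taking the arg-max."""
    row, col = np.unravel_index(np.argmax(scores), scores.shape)
    return int(col), int(row)  # back to (x, y) order
```

With K keypoint types, one such mask per type would be predicted; decoding each mask with `mask_to_keypoint` gives the K joint locations of a pose.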
I have tested Mask R-CNN on a bus scene.

It performs well. These are all the results.
5. Lip language
It is amazing that a model can be trained to recognize lip language.
5.1. Learning Lip Sync from Audio
- Given the audio of President Barack Obama, the method synthesizes a high-quality video of him speaking with accurate lip sync. See video.
6. Some Interesting
6.1. Aging photo prediction
- Takes a single photograph of a child as input and automatically produces a series of age-progressed outputs between 1 and 80 years of age, accounting for pose, expression, and illumination. See video.
6.2. D panorama
6.3. PanoCatcher
7. Tracking
7.1. MOT
| Method | Title | Venue | Authors | Status |
|---|---|---|---|---|
| CDA_DDALv2 | Confidence-Based Data Association and Discriminative Deep Appearance Learning for Robust Online Multi-Object Tracking | TPAMI | Bae, Seung-Hwan, and Kuk-Jin Yoon. | reading now |
| FWT | Fusion of Head and Full-Body Detectors for Multi-Object Tracking | CVPR18 | Roberto Henschel, Laura Leal-Taixe, Daniel Cremers, Bodo Rosenhahn | reading now |
| LMP | Multiple people tracking by lifted multicut and person re-identification | CVPR17 | Tang, Siyu, et al. | reading now |
| NLLMPa | Joint graph decomposition & node labeling: Problem, algorithms, applications. | CVPR17 | Levinkov, Evgeny, et al. | reading now |
| QuadMOT16 | Multi-Object Tracking with Quadruplet Convolutional Neural Networks | CVPR17 | Son, Jeany, et al. | reading now |
| EDMT | Enhancing Detection Model for Multiple Hypothesis Tracking | CVPR17w | Chen, Jiahui, et al. | reading now |
| AMIR | Tracking the untrackable: Learning to track multiple cues with long-term dependencies | ICCV17 | Sadeghian, Amir, Alexandre Alahi, and Silvio Savarese. | reading now |
| STAM16 | Online Multi-Object Tracking Using CNN-based Single Object Tracker with Spatial-Temporal Attention Mechanism. | ICCV17 | Chu, Qi, et al. | reading now |
| LINF1 | Improving Multi-Frame Data Association with Sparse Representations for Robust Near-Online Multi-Object Tracking | ECCV16 | L. Fagot-Bouquet, R. Audigier, Y. Dhome, F. Lerasle | reading now |
| EAMTT | Multi-target tracking with strong and weak detections | ECCV16w | R. Sanchez-Matilla, F. Poiesi, A. Cavallaro | reading now |
| LTTSC-CRF | Long-Term Time-Sensitive Costs for CRF-Based Tracking by Detection | ECCV16w | Le, Nam, Alexander Heili, and Jean-Marc Odobez. | reading now |
Dataset
PETS2009 : An old dataset.
KITTI-Tracking : Multi-person or multi-car tracking dataset.
MOT dataset : A dataset for multi-person detection and tracking, the most widely used.
UA-DETRAC : A dataset for multi-car detection and tracking.
AVSS2018 Challenge : The AVSS2018 Challenge, based on UA-DETRAC, is open!
DukeMTMC : A dataset for multi-camera multi-person tracking.
PoseTrack: A dataset for multi-person pose tracking.
NVIDIA AI CITY Challenge: Challenges including "Traffic Flow Analysis", "Anomaly Detection" and "Multi-sensor Vehicle Detection and Reidentification". You may find some interesting code in their GitHub repos.
Vis Drone: Tracking videos captured by drone-mounted cameras.
JTA Dataset: A huge dataset for pedestrian pose estimation and tracking in urban scenarios created by exploiting the highly photorealistic video game Grand Theft Auto V developed by Rockstar North.
Review
P. Emami, P. M. Pardalos, L. Elefteriadou, S. Ranka, "Machine Learning Methods for Solving Assignment Problems in Multi-Target Tracking" [paper]
Wenhan Luo, Junliang Xing, Anton Milan, Xiaoqin Zhang, Wei Liu, Xiaowei Zhao and Tae-Kyun Kim, "Multiple Object Tracking: A Literature Review" [paper]
Evaluation Metric
CLEAR MOT : Bernardin, K. & Stiefelhagen, R. "Evaluating Multiple Object Tracking Performance: The CLEAR MOT Metrics" [paper]
IDF1 : Ristani, E., Solera, F., Zou, R., Cucchiara, R. & Tomasi, C. "Performance Measures and a Data Set for Multi-Target, Multi-Camera Tracking" [paper]
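As a rough illustration of the two metrics above, the sketch below computes MOTA (the main CLEAR MOT score) and IDF1 from counts that have already been accumulated over a sequence. The function and argument names are hypothetical, and the matching step that produces these counts is not shown.

```python
def mota(false_negatives, false_positives, id_switches, num_gt):
    """MOTA = 1 - (FN + FP + IDSW) / GT, with all counts summed over frames."""
    if num_gt <= 0:
        raise ValueError("num_gt must be positive")
    errors = false_negatives + false_positives + id_switches
    return 1.0 - errors / num_gt


def idf1(idtp, idfp, idfn):
    """IDF1 = 2 * IDTP / (2 * IDTP + IDFP + IDFN)."""
    denom = 2 * idtp + idfp + idfn
    if denom == 0:
        raise ValueError("at least one count must be non-zero")
    return 2 * idtp / denom
```

Note that MOTA can be negative when the number of errors exceeds the number of ground-truth objects, while IDF1 always lies between 0 and 1.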
Researcher
Anton Milan [webpage and his source code]
Laura Leal-Taixé [webpage and her source code]
Dynamic Vision and Learning Group [webpage and their source code]
Longyin Wen [webpage and his source code]
UCF [webpage]
Some source code on the webpages above is not listed in the Open Source section below, such as:
"segTrack"
"Exploiting Hierarchical Dense Structures on Hypergraphs for Multi-Object Tracking"
"Learning an image-based motion context for multiple people tracking"
Open Source
Batch
headTracking: Shun Zhang, Jinjun Wang, Zelun Wang, Yihong Gong, Yuehu Liu: "Multi-Target Tracking by Learning Local-to-Global Trajectory Models" in PR 2015 [paper] [code] seems like a repo.
IOU : E. Bochinski, V. Eiselein, T. Sikora. "High-Speed Tracking-by-Detection Without Using Image Information" [paper] [code] In International Workshop on Traffic and Street Surveillance for Safety and Security at IEEE AVSS 2017, 2017.
NMGC-MOT : Andrii Maksai, Xinchao Wang, François Fleuret, and Pascal Fua "Non-Markovian Globally Consistent Multi-Object Tracking" [paper] [code] In ICCV 2017
D2T : Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman, "Detect to Track and Track to Detect" [paper] [code] In ICCV 2017
H2T : Longyin Wen, Wenbo Li, Junjie Yan, Zhen Lei, Dong Yi, Stan Z. Li. "Multiple Target Tracking Based on Undirected Hierarchical Relation Hypergraph," [paper] [code] IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2014.
LDCT : F. Solera, S. Calderara, R. Cucchiara "Learning to Divide and Conquer for Online Multi-Target Tracking" [paper] [code page 1] [code page 2] In Proceedings of International Conference on Computer Vision (ICCV), Santiago, Chile, Dec 12-18, 2015
CEM : Anton Milan, Stefan Roth, Konrad Schindler "Continuous Energy Minimization for Multi-Target Tracking" [paper] [code] In TPAMI 2014
OPCNF : Chari, Visesh and Lacoste-Julien, Simon and Laptev, Ivan and Sivic, Josef "On Pairwise Costs for Network Flow Multi-Object Tracking" [paper] [code] In CVPR 2015
KSP : J. Berclaz, F. Fleuret, E. Türetken and P. Fua "Multiple Object Tracking using K-Shortest Paths Optimization" [paper] [code] IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.
GMCP : Amir Roshan Zamir, Afshin Dehghan, and Mubarak Shah "GMCP-Tracker: Global Multi-object Tracking Using Generalized Minimum Clique Graphs" [paper] [code] European Conference on Computer Vision (ECCV), 2012.
Online
MOTDT : Long Chen, Haizhou Ai "Real-time Multiple People Tracking with Deeply Learned Candidate Selection and Person Re-identification" In ICME 2018 [code] [paper]
TMPORT : E. Ristani and C. Tomasi. Tracking Multiple People Online and in Real Time. in ACCV 2014 [paper] [code]
MOT-RNN : Anton Milan, Seyed Hamid Rezatofighi, Anthony Dick, Konrad Schindler, Ian Reid "Online Multi-target Tracking using Recurrent Neural Networks"[paper] [code] In AAAI 2017.
DeepSort : Wojke, Nicolai and Bewley, Alex and Paulus, Dietrich "Simple Online and Realtime Tracking with a Deep Association Metric" [paper] [code] In ICIP 2017
Sort : Bewley, Alex and Ge, Zongyuan and Ott, Lionel and Ramos, Fabio and Upcroft, Ben "Simple Online and Realtime Tracking"[paper] [code] In ICIP 2016.
MDP : Yu Xiang, Alexandre Alahi, and Silvio Savarese "Learning to Track: Online Multi-Object Tracking by Decision Making" [paper] [code] In International Conference on Computer Vision (ICCV), 2015
CMOT : S. H. Bae and K. Yoon. "Robust online multi-object tracking based on tracklet confidence and online discriminative appearance learning" [paper] [code] In CVPR 2014
RCMSS : Mohamed A. Naiel, M. Omair Ahmad, M.N.S. Swamy, Jongwoo Lim, and Ming-Hsuan Yang "Online Multi-Object Tracking Via Robust Collaborative Model and Sample Selection" [paper] [code] Computer Vision and Image Understanding 2016
MHT-DAM : Chanho Kim, Fuxin Li, Arridhana Ciptadi, James M. Rehg "Multiple Hypothesis Tracking Revisited"[paper] [code] In ICCV 2015
OMPTTH : Jianming Zhang, Liliana Lo Presti and Stan Sclaroff, "Online Multi-Person Tracking by Tracker Hierarchy," [paper] [code] Proc. Int. Conf. on Advanced Video and Signal Based Surveillance (AVSS), 2012.
SMOT : C. Dicle, O. Camps, M. Sznaier. "The Way They Move: Tracking Targets with Similar Appearance" [paper] [code] In ICCV, 2013.
Private Detection
POI : F. Yu, W. Li, Q. Li, Y. Liu, X. Shi, J. Yan. "POI: Multiple Object Tracking with High Performance Detection and Appearance Feature" [paper] [detection] In BMTT, SenseTime Group Limited, 2016
CVPR2017
Eldar Insafutdinov, Mykhaylo Andriluka, Leonid Pishchulin, Siyu Tang, Evgeny Levinkov, Bjoern Andres, Bernt Schiele "ArtTrack: Articulated Multi-Person Tracking in the Wild" [paper]
Manmohan Chandraker, Paul Vernaza, Wongun Choi, Samuel Schulter "Deep Network Flow for Multi-Object Tracking" [paper]
Jeany Son, Mooyeol Baek, Minsu Cho, and Bohyung Han, "Multi-Object Tracking with Quadruplet Convolutional Neural Networks" [paper]
ICCV2017
A. Sadeghian, A. Alahi, S. Savarese "Tracking The Untrackable: Learning To Track Multiple Cues with Long-Term Dependencies" [paper]
Andrii Maksai, Xinchao Wang, François Fleuret, and Pascal Fua "Non-Markovian Globally Consistent Multi-Object Tracking" [paper] [code]
Christoph Feichtenhofer, Axel Pinz, Andrew Zisserman, "Detect to Track and Track to Detect" [paper] [code]
Qi Chu, Wanli Ouyang, Xiaogang Wang, Bin Liu, Nenghai Yu "Online Multi-Object Tracking Using CNN-Based Single Object Tracker With Spatial-Temporal Attention Mechanism" [paper]
CVPR2018
E. Ristani and C. Tomasi "Features for Multi-Target Multi-Camera Tracking and Re-Identification" [paper] [code]
New paper
M. Fabbri, F. Lanzi, S. Calderara, A. Palazzi "Learning to Detect and Track Visible and Occluded Body Joints in a Virtual World" [paper] [code] (code release still awaited)
Cong Ma, Changshui Yang, Fan Yang, Yueqing Zhuang, Ziwei Zhang, Huizhu Jia, Xiaodong Xie "Trajectory Factory: Tracklet Cleaving and Re-connection by Deep Siamese Bi-GRU for Multiple Object Tracking" In ICME 2018 [paper]
Kuan Fang, Yu Xiang, Xiaocheng Li and Silvio Savarese "Recurrent Autoregressive Networks for Online Multi-Object Tracking" In IEEE Winter Conference on Applications of Computer Vision (WACV), 2018. [webpage]
Tharindu Fernando, Simon Denman, Sridha Sridharan, Clinton Fookes "Tracking by Prediction: A Deep Generative Model for Mutli-Person localisation and Tracking" In WACV 2018 [paper]
Multi-person Face Tracking
Shun Zhang, Yihong Gong, Jia-Bin Huang, Jongwoo Lim, Jinjun Wang, Narendra Ahuja and Ming-Hsuan Yang "Tracking Persons-of-Interest via Adaptive Discriminative Features" In ECCV 2016 [paper] [code]
Chung-Ching Lin, Ying Hung "A Prior-Less Method for Multi-Face Tracking in Unconstrained Videos" In CVPR 2018 [paper]
Multi-person Pose Tracking
Yuliang Xiu, Jiefeng Li, Haoyu Wang, Yinghong Fang, Cewu Lu "Pose Flow: Efficient Online Pose Tracking" [paper] The idea is interesting, but the actual source code has not been released.
Bin Xiao, Haiping Wu, and Yichen Wei "Simple Baselines for Human Pose Estimation and Tracking" [paper][code]
7.2. Correlation Filter
7.3. End-to-end representation learning for correlation filter based tracking
It is a tracking method based on deep learning. The authors designed a network that contains a correlation filter layer and worked out how to backpropagate through it.
- I have tried this method, but it does not work well and has some failure cases in testing, as follows:

- Abstract: We present a framework that allows the explicit incorporation of global context within CF trackers. We reformulate the original optimization problem and provide a closed form solution for single and multidimensional features in the primal and dual domain.
- video, paper, matlab code, python code
- Advantages:
  - It is an end-to-end tracking method that can be trained directly.
  - It can run in real time.
- Disadvantages:
  - It drifts when the object is occluded.
  - It estimates scale incorrectly when the object grows or shrinks.
- My opinion:
  - Tracking should combine the features of the object itself with context features.
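To make the "closed form solution" in the abstract above more concrete, here is a minimal sketch of a single-channel ridge-regression correlation filter solved in the Fourier domain with NumPy. It is not the code of any of the trackers listed here. The optional `context` term, which penalizes responses on surrounding patches, is an assumption modeled on the context idea, and all function names and default parameters are hypothetical.

```python
import numpy as np


def gaussian_label(shape, sigma=2.0):
    """Desired response: a Gaussian whose peak is moved to index (0, 0)."""
    h, w = shape
    yy, xx = np.meshgrid(np.arange(h) - h // 2,
                         np.arange(w) - w // 2, indexing="ij")
    g = np.exp(-(yy ** 2 + xx ** 2) / (2.0 * sigma ** 2))
    return np.fft.ifftshift(g)


def train_filter(patch, label, context=(), lam1=1e-2, lam2=0.5):
    """Closed-form filter in the Fourier domain (single feature channel).

    w_hat = conj(x_hat) * y_hat / (|x_hat|^2 + lam1 + lam2 * sum |c_hat|^2)
    With no context patches this is a plain ridge-regression filter.
    """
    x_hat = np.fft.fft2(patch)
    y_hat = np.fft.fft2(label)
    denom = np.conj(x_hat) * x_hat + lam1
    for c in context:  # assumed context penalty, one term per patch
        c_hat = np.fft.fft2(c)
        denom = denom + lam2 * np.conj(c_hat) * c_hat
    return np.conj(x_hat) * y_hat / denom


def respond(w_hat, patch):
    """Response map of the filter on a new patch; its peak gives the shift."""
    return np.real(np.fft.ifft2(w_hat * np.fft.fft2(patch)))
```

In use, the filter is trained on the target patch, then applied to the patch cropped at the same position in the next frame; the arg-max of the response map is read as the (cyclic) translation of the target. Because the filter has a fixed size, a sketch like this has no notion of scale change, which is consistent with the scale problem noted above.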
7.4. Attentional Correlation Filter Network for Adaptive Visual Tracking
7.5. Context-Aware Correlation Filter Tracking
- Bas
8. Action Recognition
9. Reconstruction
10. Detection
11. Segmentation
12. Action Recognition
13. Point Cloud Representation
14. Summary
- Mask R-CNN is amazing, but it is not fast enough for real-time detection.
- Many computer vision tasks remain to be done, and only a few are finished. Object recognition is the simplest task; it has been handled extremely well, and its recognition rate exceeds that of human beings. But the majority of tasks still need to be done, such as action recognition, action prediction, 3D object recognition, 3D object representation, 3D action recognition, and the representation of speech, smell, touch, and vision. Machine vision is the core task for robot intelligence, so don't worry about having nothing to do in this field.
15. Reference
http://www.themtank.org/a-year-in-computer-vision