OCR Survey

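OCR currently has three main tasks: text recognition, text detection, and end-to-end (End2End) recognition. The figure below shows how work is distributed across the three tasks.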

[Figure: distribution of work across the three OCR tasks]

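In recent years, OCR solutions have centered mainly on deep learning, as the change in the number of papers in the figure below shows.

[Figure: number of OCR papers over time]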

Text Detection

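Papers are sorted by release date.
IC denotes ICDAR (International Conference on Document Analysis and Recognition); IC13 and IC15 are its 2013 and 2015 benchmarks.
Scores are F1-scores on the text localization task.

(L) marks a score taken from the official leaderboard, to distinguish it from the score reported in the paper.

*CODE means the authors provide source code; CODE(M) means trained models are provided.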
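To make the detection F1-score concrete, here is a minimal sketch of how such a score can be computed. It assumes axis-aligned boxes given as (x1, y1, x2, y2) and greedy one-to-one matching at IoU ≥ 0.5; the function names are hypothetical. The official ICDAR evaluation scripts use their own matching rules and polygon geometry, so this will not reproduce the numbers in the table.

```python
from typing import List, Tuple

# Assumed box format: axis-aligned (x1, y1, x2, y2)
Box = Tuple[float, float, float, float]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def detection_f1(preds: List[Box], gts: List[Box], thresh: float = 0.5) -> Tuple[float, float, float]:
    """Greedy one-to-one matching in the given prediction order; returns (precision, recall, f1)."""
    matched_gt = set()
    tp = 0
    for p in preds:
        # Find the best still-unmatched ground-truth box for this prediction
        best_j, best_iou = -1, 0.0
        for j, g in enumerate(gts):
            if j in matched_gt:
                continue
            v = iou(p, g)
            if v > best_iou:
                best_j, best_iou = j, v
        if best_j >= 0 and best_iou >= thresh:
            matched_gt.add(best_j)
            tp += 1
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
    return precision, recall, f1
```

For example, with two ground-truth boxes, one prediction that overlaps the first with IoU 0.81 and one spurious prediction, precision, recall, and F1 all come out to 0.5. F1 is used because it penalizes both missed text (low recall) and false alarms (low precision).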


Conf.	Date	Title	IC13	IC15	Resources
'14-ECCV	14/10/07	[Robust Scene Text Detection with Convolution Neural Network Induced MSER Trees](http://www.whuang.org/papers/whuang2014_eccv.pdf)
'15-CVPR	15/06/01	Symmetry-based text line detection in natural scenes	0.8043		`PRJ` `CODE`
'16-TIP	15/10/12	Text-Attentional Convolutional Neural Networks for Scene Text Detection	0.8165
'15-ICCV	15/12/13	Text Flow: A Unified Text Detection System in Natural Scene Images	0.8025
'16-arXiv	16/03/31	Accurate Text Localization in Natural Image with Cascaded Convolutional Text Network	0.86
'16-CVPR	16/04/14	Multi-Oriented Text Detection with Fully Convolutional Networks	0.83	0.54	`*TORCH(M)`
'16-CVPR	16/04/22	Synthetic Data for Text Localisation in Natural Images	0.847 (L)0.8359		`CODE` `DB`
'16-arXiv	16/06/29	Scene Text Detection via Holistic, Multi-Channel Prediction	0.8433	0.6477
'16-ECCV	16/09/12	Detecting Text in Natural Image with Connectionist Text Proposal Network	0.8215	0.6085	`*CAFFE(M)` `CAFFE` `TF(M)` `TF` `DEMO` `BLOG(CH)`
'17-AAAI	16/11/21	TextBoxes: A fast text detector with a single deep neural network	0.85 (L)0.8767		`*CAFFE(M)` `TF` `BLOG(KR)`
'18-TMM	17/03/03	Arbitrary-Oriented Scene Text Detection via Rotation Proposals	0.9125	0.8020	`*CAFFE`
'17-CVPR	17/03/04	Deep Matching Prior Network: Toward Tighter Multi-oriented Text Detection		0.7064
'17-CVPR	17/03/19	Detecting Oriented Text in Natural Images by Linking Segments	0.853	0.75 (L)0.7636	`*TF(M)` `TF(M)` `SLIDE` `VIDEO`
'17-arXiv	17/03/24	Deep Direct Regression for Multi-Oriented Scene Text Detection	0.86	0.81
'17-arXiv	17/04/03	Cascaded Segmentation-Detection Networks for Word-Level Text Spotting	0.86	0.71
'17-CVPR	17/04/11	EAST: An Efficient and Accurate Scene Text Detector		0.8072 (L)0.8038	`TF(M)` `TF` `PYTORCH(M)` `PYTORCH` `DEMO` `KERAS(M)` `VIDEO`
'17-ICIP	17/05/15	WordFence: Text Detection in Natural Images with Border Awareness	0.86
'17-arXiv	17/06/30	R2CNN: Rotational Region CNN for Orientation Robust Scene Text Detection	0.8773	0.8254	`TF(M)` `CAFFE(M)`
'17-CVPR	17/07/21	Multi-scale FCN with Cascaded Instance Aware Segmentation for Arbitrary Oriented Word Spotting In The Wild	0.85	0.63
'17-arXiv	17/08/17	Deep Scene Text Detection with Connected Component Proposals	0.919
'17-ICCV	17/08/22	WordSup: Exploiting Word Annotations for Character based Text Detection	0.9064	0.7816
'17-ICCV	17/09/01	Single Shot Text Detector with Regional Attention	0.8704	0.7691	`*CAFFE(M)` `PYTORCH` `VIDEO`
'17-arXiv	17/09/11	Fused Text Segmentation Networks for Multi-oriented Scene Text Detection		0.8414
'17-ICCV	17/10/13	WeText: Scene Text Detection under Weak Supervision	0.869 (L)0.8313
'17-ICCV	17/10/22	Self-organized Text Detection with Minimal Post-processing via Border Learning	0.84		`*KERAS(M)`
'17-ICDAR	17/11/11	Deep Residual Text Detection Network for Scene Text	0.9117 (L)0.8925
'18-AAAI	17/11/12	Feature Enhancement Network: A Refined Scene Text Detector	0.9161
'17-arXiv	17/11/30	ArbiText: Arbitrary-Oriented Text Detection in Unconstrained Scene		0.759
'18-AAAI	18/01/04	PixelLink: Detecting Scene Text via Instance Segmentation	0.881	0.8519	`*TF(M)` `TF`
'18-CVPR	18/01/05	FOTS: Fast Oriented Text Spotting with a Unified Network	0.925	0.8984	`PYTORCH` `PYTORCH` `VIDEO`
'18-TIP	18/01/09	TextBoxes++: A Single-Shot Oriented Scene Text Detector	0.88	0.829 (L)0.8475	`*CAFFE(M)`
'18-CVPR	18/03/09	An end-to-end TextSpotter with Explicit Alignment and Attention	0.9	0.87	`*CAFFE(M)`
'18-CVPR	18/03/14	Rotation-Sensitive Regression for Oriented Scene Text Detection	0.89	0.838	`*CAFFE(M)`
'18-arXiv	18/04/08	Detecting Multi-Oriented Text with Corner-based Region Proposals	0.876	0.845	`*CAFFE(M)`
'18-arXiv	18/04/24	An Anchor-Free Region Proposal Network for Faster R-CNN based Text Detection Approaches	0.92	0.86
'18-IJCAI	18/05/03	IncepText: A New Inception-Text Module with Deformable PSROI Pooling for Multi-Oriented Scene Text Detection		0.9047
'18-arXiv	18/06/07	Shape Robust Text Detection with Progressive Scale Expansion Network		0.8721	`PRJ`
'18-ECCV	18/07/04	TextSnake: A Flexible Representation for Detecting Text of Arbitrary Shapes		0.826	`PYTORCH`
'18-ECCV	18/07/06	Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes	0.917	0.86
'18-ECCV	18/07/10	Accurate Scene Text Detection through Border Semantics Awareness and Bootstrapping	0.892
'19-AAAI	18/11/21	Scene Text Detection with Supervised Pyramid Context Network	0.921	0.872
'19-TIP	18/12/04	TextField: Learning A Deep Direction Field for Irregular Scene Text Detection		0.824	`*CAFFE(M)`
'19-CVPR	19/03/21	Towards Robust Curve Text Detection with Conditional Spatial Expansion
'19-CVPR	19/03/28	Shape Robust Text Detection with Progressive Scale Expansion Network		0.857	`TF(M)`
'19-CVPR	19/04/03	Character Region Awareness for Text Detection	0.952	0.869	`*PYTORCH(M)` `VIDEO` `PYTORCH` `KERAS` `BLOG_CH` `BLOG_KR` `BLOG_KR` `BLOG_KR`
'19-CVPR	19/04/13	Look More Than Once: An Accurate Detector for Text of Arbitrary Shapes		0.877
'19-CVPR	19/06/16	Learning Shape-Aware Embedding for Scene Text Detection		0.877
'19-CVPR	19/06/16	Arbitrary Shape Scene Text Detection with Adaptive Text Region Representation	0.917	0.876

Text Recognition

Scores are word accuracies on the recognition task.
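Word accuracy is the fraction of cropped word images whose predicted string exactly matches the label. The sketch below assumes a common normalization (case-insensitive, letters and digits only); benchmarks and papers differ on this, and the helper names are hypothetical.

```python
import re
from typing import List


def normalize(word: str) -> str:
    # Assumed convention: lowercase, keep only ASCII letters and digits
    return re.sub(r"[^0-9a-z]", "", word.lower())


def word_accuracy(preds: List[str], gts: List[str]) -> float:
    """Fraction of samples whose normalized prediction equals the normalized label."""
    if len(preds) != len(gts):
        raise ValueError("preds and gts must have the same length")
    if not gts:
        return 0.0
    correct = sum(normalize(p) == normalize(g) for p, g in zip(preds, gts))
    return correct / len(gts)
```

With predictions `["Hello", "w0rld", "OCR!"]` against labels `["hello", "world", "ocr"]`, two of three words match after normalization, giving an accuracy of 2/3. A single wrong character makes the whole word count as wrong, which is why this metric is stricter than character-level accuracy.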


Conf.	Date	Title	SVT	IIIT5k	IC03	IC13	Resources
'15-ICLR	14/12/18	Deep structured output learning for unconstrained text recognition	0.717		0.896	0.818	`TF` `SLIDE` `VIDEO`
'16-IJCV	15/05/07	Reading text in the wild with convolutional neural networks	0.807		0.933	0.908	`KERAS`
'16-AAAI	15/06/14	Reading Scene Text in Deep Convolutional Sequences
'17-TPAMI	15/07/21	An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition	0.808	0.782	0.894	0.867	`TORCH(M)` `TF` `TF` `TF` `TF` `PYTORCH` `PYTORCH(M)` `BLOG(KR)`
'16-CVPR	16/03/09	Recursive Recurrent Nets with Attention Modeling for OCR in the Wild	0.807	0.784	0.887	0.9
'16-CVPR	16/03/12	Robust scene text recognition with automatic rectification	0.819	0.819	0.901	0.886	`PYTORCH` `PYTORCH`
'16-CVPR	16/06/27	CNN-N-Gram for Handwriting Word Recognition	0.8362				`VIDEO`
'16-BMVC	16/09/19	STAR-Net: A SpaTial Attention Residue Network for Scene Text Recognition	0.836	0.833	0.899	0.891
'17-arXiv	17/07/27	STN-OCR: A single Neural Network for Text Detection and Text Recognition	0.798	0.86		0.903	`*MXNET(M)` `PRJ` `BLOG`
'17-IJCAI	17/08/19	Learning to Read Irregular Text with Attention Mechanisms
'17-arXiv	17/09/06	Scene Text Recognition with Sliding Convolutional Character Models	0.765	0.816	0.845	0.852
'17-ICCV	17/09/07	Focusing Attention: Towards Accurate Text Recognition in Natural Images	0.859	0.874	0.942	0.933
'18-CVPR	17/11/12	AON: Towards Arbitrarily-Oriented Text Recognition	0.828	0.87	0.915		`TF`
'17-NIPS	17/12/04	Gated Recurrent Convolution Neural Network for OCR	0.815	0.808	0.978		`*TORCH(M)`
'18-AAAI	18/01/04	Char-Net: A Character-Aware Neural Network for Distorted Scene Text Recognition	0.844	0.836	0.915	0.908
'18-AAAI	18/01/04	SqueezedText: A Real-time Scene Text Recognition by Binary Convolutional Encoder-decoder Network		0.87	0.931	0.929
'18-CVPR	18/05/09	Edit Probability for Scene Text Recognition	0.875	0.883	0.946	0.944
'18-TPAMI	18/06/25	ASTER: An Attentional Scene Text Recognizer with Flexible Rectification	0.936	0.934	0.945	0.918	`*TF(M)` `PYTORCH`
'18-ECCV	18/09/08	Synthetically Supervised Feature Learning for Scene Text Recognition	0.871	0.894	0.947	0.94
'19-AAAI	18/09/18	Scene Text Recognition from Two-Dimensional Perspective	0.821	0.92		0.914
'19-CVPR	18/12/14	ESIR: End-to-end Scene Text Recognition via Iterative Image Rectification	0.902	0.933		0.913	`PRJ`
'19-PR	19/01/10	MORAN: A Multi-Object Rectified Attention Network for Scene Text Recognition	0.883	0.912	0.950	0.924	`*PYTORCH(M)`
'19-ICCV	19/04/03	What is wrong with scene text recognition model comparisons? dataset and model analysis	0.875		0.949	0.936	`*PYTORCH(M)` `BLOG_KR`
'19-CVPR	19/04/18	Aggregation Cross-Entropy for Sequence Recognition	0.826	0.823	0.921	0.897	`*PYTORCH`
'19-CVPR	19/06/16	Sequence-to-Sequence Domain Adaptation Network for Robust Text Image Recognition	0.845	0.838	0.921	0.918

End-to-End Text Recognition

Scores are F1-scores on the Generic task.
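End-to-end scoring combines the two previous tasks: a prediction only counts as a hit if its box overlaps an unmatched ground-truth box and its transcription is also correct. The sketch below assumes axis-aligned boxes, IoU ≥ 0.5, case-insensitive string comparison, and first-match greedy matching; it does not model the lexicon settings of the official ICDAR protocol, and all names are hypothetical.

```python
from typing import List, Tuple

# Assumed record format: ((x1, y1, x2, y2), transcription)
Box = Tuple[float, float, float, float]
Item = Tuple[Box, str]


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def end_to_end_f1(preds: List[Item], gts: List[Item], thresh: float = 0.5) -> float:
    """F1 where a true positive needs both correct localization and correct text."""
    used = set()
    tp = 0
    for box, text in preds:
        for j, (g_box, g_text) in enumerate(gts):
            if j in used:
                continue
            # Both conditions must hold; a well-placed box with a misread word is a miss
            if iou(box, g_box) >= thresh and text.lower() == g_text.lower():
                used.add(j)
                tp += 1
                break
    precision = tp / len(preds) if preds else 0.0
    recall = tp / len(gts) if gts else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall > 0 else 0.0
```

If both words in an image are localized perfectly but one is read as "EX1T" instead of "EXIT", only one prediction counts and the F1 drops to 0.5. This coupling is why end-to-end scores in the table are noticeably lower than the detection scores for the same methods.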


Conf.	Date	Title	IC03	IC13	IC15	Resources
'12-ICPR	12/11/11	End-to-end text recognition with convolutional neural networks	0.67			`*CODE`
'14-ECCV	14/09/06	Deep Features for Text Spotting	0.75			`PRJ` `MATLAB`
'15-IJCV	15/05/07	Reading Text in the Wild with Convolutional Neural Networks	0.70	0.77		`KERAS`
'15-TPAMI	15/10/30	Real-time Lexicon-free Scene Text Localization and Recognition		0.542	0.156
'16-arXiv	16/04/10	TextProposals: a Text-specific Selective Search Algorithm for Word Spotting in the Wild		0.6843	0.4718 (L)0.533	`*CAFFE(M)`
'17-AAAI	16/11/21	TextBoxes: A fast text detector with a single deep neural network		0.84		`TF` `*CAFFE(M)` `BLOG_KR`
'17-ICCV	17/07/13	Towards End-to-end Text Spotting with Convolution Recurrent Neural Network		0.8459		`VIDEO`
'17-ICCV	17/10/22	Deep TextSpotter: An End-to-End Trainable Scene Text Localization and Recognition Framework		0.77	0.47	`VIDEO` `*CAFFE(M)`
'18-CVPR	18/01/05	FOTS: Fast Oriented Text Spotting with a Unified Network		0.8477	0.6533	`VIDEO` `TF(M)`
'18-TIP	18/01/09	TextBoxes++: A Single-Shot Oriented Scene Text Detector		0.8465	0.519	`*CAFFE(M)`
'18-CVPR	18/03/09	An end-to-end TextSpotter with Explicit Alignment and Attention		0.86	0.63	`*CAFFE(M)`
'18-TPAMI	18/06/25	ASTER: An Attentional Scene Text Recognizer with Flexible Rectification			0.64	`*TF(M)`
'18-ECCV	18/07/06	Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes		0.865	0.624

Other Conferences and Papers

Conf.	Date	Title	Description	Resources
'14-NIPS	14/06/09	Synthetic Data and Artificial Neural Networks for Natural Scene Text Recognition	Dataset	`PRJ`
'17-ECCV	17/02/13	End-to-End Interpretation of the French Street Name Signs Dataset	Dataset (FSNS)	`*TF(M)`
'17-arXiv	17/04/11	Attention-based Extraction of Structured Information from Street View Imagery	FSNS	`*TF(M)` `TF` `TF` `LUA` `BLOG_KR`
'17-CVPR	17/07/21	Unambiguous Text Localization and Retrieval for Cluttered Scenes	Text Retrieval
'17-AAAI	17/10/22	Detection and Recognition of Text Embedded in Online Images via Neural Context Models	Dataset	`PRJ`
'18-CVPR	17/11/17	Separating Style and Content for Generalized Style Transfer	Font Style
'17-arXiv	17/12/06	Detecting Curve Text in the Wild: New Dataset and New Solution	Dataset (CTW 1500)	`PRJ`
'18-AAAI	17/12/14	SEE: Towards Semi-Supervised End-to-End Scene Text Recognition	FSNS	`PRJ` `*CHAINER(M)`
'17-CVPR	18/06/07	Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Networks	Document Layout	`PRJ`
'18-CVPR	18/06/19	DocUNet: Document Image Unwarping via A Stacked U-Net	Document Dewarping	`PRJ`
'18-CVPR	18/06/19	Document Enhancement using Visibility Detection	Document Enhancement	`PRJ`
'18-IJCAI	18/06/22	Multi-Task Handwritten Document Layout Analysis	Document Layout
'18-ECCV	18/07/09	Verisimilar Image Synthesis for Accurate Detection and Recognition of Texts in Scenes	Dataset	`PRJ`
'19-AAAI	18/12/03	EnsNet: Ensconce Text in the Wild	Text Removal	`DB`
'19-CVPR	18/12/14	Spatial Fusion GAN for Image Synthesis	Dataset	`DB`
'19-AAAI	19/01/27	Hierarchical Encoder with Auxiliary Supervision for Table-to-text Generation: Learning Better Representation for Tables	TableToText
'19-AAAI	19/01/27	A Radical-aware Attention-based Model for Chinese Text Classification	Chinese Character Classification
'19-CVPR	19/02/25	Handwriting Recognition in Low-resource Scripts using Adversarial Learning	Handwriting Recognition	`TF`
'19-CVPR	19/03/27	Tightness-aware Evaluation Protocol for Scene Text Detection	Evaluation	`CODE`
'19-CVPR	19/06/16	DynTypo: Example-based Dynamic Text Effects Transfer	Text Effects	`PRJ` `VIDEO`
'19-CVPR	19/06/16	Typography with Decor: Intelligent Text Style Transfer	Text Effects	`*PYTORCH(M)`
'19-CVPR	19/06/16	An Alternative Deep Feature Approach to Line Level Keyword Spotting	Keyword Spotting

Papers and Code

Overview

By Year

2019-present
2015-2018
2011-2014
before-2010

By Task

overview
text-detection
text-recognition
text-segmentation
end-to-end-ocr
video-ocr
document-image-unwarping

By Conference and Journal

CVPR: IEEE Conference on Computer Vision and Pattern Recognition
NIPS: Neural Information Processing Systems
ECCV: European Conference on Computer Vision
ICCV: International Conference on Computer Vision
ICLR: International Conference on Learning Representations
AAAI: Association for the Advancement of Artificial Intelligence
IJCAI: International Joint Conference on Artificial Intelligence
BMVC: British Machine Vision Conference
ICPR: International Conference on Pattern Recognition
ICDAR: International Conference on Document Analysis and Recognition
TPAMI: IEEE Transactions on Pattern Analysis and Machine Intelligence
IJCV: International Journal of Computer Vision
TIP: IEEE Transactions on Image Processing
TMM: IEEE Transactions on Multimedia

Datasets

ICDAR benchmark datasets
Natural scene datasets
Synthetic datasets
Irregular text datasets
Word/character datasets
Video datasets
Other

Dataset Comparison

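Dataset (year)	Images (train/test)	Text instances (train/test)	Text orientation	Language	Characteristics	Detection/Recognition
End2End	====	====	====	====	====	====
ICDAR03 (2003)	509 (258/251)	2276 (1110/1156)	Horizontal	En	-	✓/✓
ICDAR13 Scene Text (2013)	462 (229/233)	- (848/1095)	Horizontal	En	Natural scenes	✓/✓
ICDAR15 Incidental Text (2015)	1500 (1000/500)	- (-/-)	Multi-oriented	En	Blurry images, captured incidentally rather than deliberately	✓/✓
ICDAR17 / RCTW (2017)	12263 (8034/4229)	- (-/-)	Multi-oriented	Cn	Mobile-phone camera photos and phone screenshots	✓/✓
CoCo-Text v2.0 (2019)	63686 (-/-)	239506 (-/-)	Multi-oriented	En	Online dataset with detailed annotations	✓/✓
Total-Text (2017)	1555 (1255/300)	11459 (-/-)	Multi-oriented, curved	En, Cn	Irregular text annotated with polygons	✓/✓
SVT (2010)	350 (100/250)	904 (257/647)	Horizontal	En	Google Street View	✓/✓
KAIST (2010)	3000 (-/-)	5000 (-/-)	Horizontal	En, Ko	Finely categorized images	✓/✓
NEOCR (2011)	659 (-/-)	5238 (-/-)	Multi-oriented	8 langs	Natural scenes	✓/✓
CTW (2017)	32K (25K/6K)	1M (812K/205K)	Multi-oriented	Cn	Chinese street views, high-resolution images, detailed annotations	✓/✓
CASIA-10K (2018)	10K (7K/3K)	- (-/-)	Multi-oriented	Cn	Scene text detection	✓/✓
SDTL (2016)	User-defined (-/-)	90k (-/-)	Horizontal	-	Text synthetically rendered into natural scene images; source code provided	✓/✓
Detection only	====	====	====	====	====	====
MSRA-TD500 (2012)	500 (300/200)	1719 (1068/651)	Multi-oriented	En, Cn	Natural scenes	✓/-
ICDAR17 / RRC-MLT (2017)	18000 (9000/9000)	- (-/-)	Multi-oriented	9 langs	Natural scenes	✓/-
SCUT-CTW1500 (2017)	1500 (1000/500)	- (-/-)	Multi-oriented, curved	En, Cn	Detection of text in arbitrary shapes	✓/-
Recognition only	====	====	====	====	====	====
Char74k (2009)	74107 (-/-)	74107 (-/-)	Horizontal	En, Kannada	Single characters only	-/✓
IIIT 5K-Word (2012)	5000 (-/-)	5000 (2000/3000)	Horizontal	-	Interference around character edges	-/✓
SVHN (2010)	99290 (73258/26032)	- (-/-)	Horizontal	-	Digits from street-view images only	-/✓
SWD (2014)	9M (-/-)	90k (-/-)	Horizontal	-	Entirely synthetic text	-/✓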
