Sohu-2018-4th-place-solution
2018 Sohu Content Recognition Algorithm Competition: Solution (4th place)
2018 Sohu Content Recognition Algorithm Competition
Overview
Preprocessing
- HTML filter
- Segmentation
- Extra-features
- Data Augmentation
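A minimal sketch of the HTML filter and extra-features steps, assuming each article arrives as an HTML string. It uses only Python's standard `html.parser`. The names `TextExtractor` and `filter_html`, and the two extra features (image count and plain-text length), are hypothetical illustrations rather than the original solution's code. Word segmentation of the cleaned Chinese text would follow as a separate step.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect visible text and count <img> tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []
        self.img_count = 0
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "img":
            self.img_count += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a script/style element.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def filter_html(html):
    """Return (plain_text, extra_features) for one article."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    # Assumed extra features: number of images and length of the plain text.
    features = {"n_images": parser.img_count, "n_chars": len(text)}
    return text, features
```

Self-closing tags such as `<img src="x.png"/>` are also counted, because `HTMLParser` routes them through `handle_starttag`.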
Task 1: Label Classification
EDA
- Word_tfidf
- Char_tfidf
- Word2vec
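Word-level and char-level tf-idf differ only in how the text is tokenized. Below is a stdlib sketch that assumes word tokens come from already-segmented, space-separated text and char tokens are single non-space characters. It uses raw-count tf, the smoothed idf ln((1 + N) / (1 + df)) + 1, and L2 normalization, which match scikit-learn's `TfidfVectorizer` defaults. The solution does not say which settings it actually used, and the helper names are hypothetical.

```python
import math
from collections import Counter


def word_tokens(segmented_text):
    """Hypothetical helper: word tokens from space-segmented text."""
    return segmented_text.split()


def char_tokens(text):
    """Hypothetical helper: every non-whitespace character (suits Chinese)."""
    return [ch for ch in text if not ch.isspace()]


def tfidf(docs):
    """docs: list of token lists -> list of {token: weight} dicts.

    tf is the raw count, idf = ln((1 + N) / (1 + df)) + 1, and each
    document vector is L2-normalized.
    """
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # document frequency: count each token once per doc
    idf = {t: math.log((1 + n) / (1 + c)) + 1 for t, c in df.items()}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        vec = {t: cnt * idf[t] for t, cnt in tf.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        vectors.append({t: v / norm for t, v in vec.items()})
    return vectors
```

Word_tfidf would then be `tfidf([word_tokens(s) for s in segmented_docs])`, and Char_tfidf would be `tfidf([char_tokens(s) for s in raw_docs])`.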
Models
- NBSVM
- LGBM
- TextCNN
- RCNN
- Bi-LSTM
- Bi-GRU
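NBSVM scales each feature by a Naive Bayes log-count ratio r = log((p / ||p||_1) / (q / ||q||_1)), where p and q are the add-alpha smoothed feature counts in the positive and negative classes. It then fits a linear classifier on the scaled features. The sketch below is binary and stdlib-only. It swaps the SVM for logistic regression trained by full-batch gradient descent, which is a common variant. The epochs, learning rate, and function names are illustrative assumptions. For the multi-class label task, one such model per class (one-vs-rest) would be needed.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def nb_log_count_ratio(X, y, alpha=1.0):
    """r_j = log((p_j / |p|_1) / (q_j / |q|_1)) with add-alpha smoothing.

    X: list of binary feature rows, y: list of 0/1 labels.
    """
    dim = len(X[0])
    p = [alpha] * dim  # smoothed feature counts in positive documents
    q = [alpha] * dim  # smoothed feature counts in negative documents
    for row, label in zip(X, y):
        counts = p if label == 1 else q
        for j, v in enumerate(row):
            counts[j] += v
    p_sum, q_sum = sum(p), sum(q)
    return [math.log((p[j] / p_sum) / (q[j] / q_sum)) for j in range(dim)]


def train_logreg(X, y, epochs=200, lr=0.1):
    """Logistic regression fitted by full-batch gradient descent."""
    dim, n = len(X[0]), len(X)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        gw, gb = [0.0] * dim, 0.0
        for row, label in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, row)) + b) - label
            for j in range(dim):
                gw[j] += err * row[j]
            gb += err
        w = [wj - lr * g / n for wj, g in zip(w, gw)]
        b -= lr * gb / n
    return w, b


def nbsvm_fit(X, y, alpha=1.0):
    """Scale features by the NB ratio, then fit the linear model."""
    r = nb_log_count_ratio(X, y, alpha)
    scaled = [[xj * rj for xj, rj in zip(row, r)] for row in X]
    w, b = train_logreg(scaled, y)
    return r, w, b


def nbsvm_predict_proba(model, row):
    """P(label = 1) for one binary feature row."""
    r, w, b = model
    return sigmoid(sum(wj * xj * rj for wj, xj, rj in zip(w, row, r)) + b)
```

The NB scaling amplifies features that are strongly tied to one class before the linear model sees them, which is why this baseline holds up well on sparse tf-idf or binary bag-of-words inputs.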
Ensemble
- Word2vec dimensions
- Embedding layer
- 01-2 0-1 classification
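The ensemble items point to diversity from different word2vec dimensions, embedding layers, and label groupings, but the combination method is not stated. One possible reading of "01-2 0-1 classification" is a two-stage decomposition: a binary model separates labels {0, 1} from 2, and a second model separates 0 from 1. Their outputs then multiply into a three-class distribution. That reading, the weighted averaging, and the function names below are all assumptions made for illustration.

```python
def hierarchical_probs(p_2, p_1_given_01):
    """Turn two binary outputs into a 3-class distribution [P(0), P(1), P(2)].

    p_2:          P(label = 2) from an assumed "{0,1} vs 2" model
    p_1_given_01: P(label = 1 | label in {0,1}) from an assumed "0 vs 1" model
    """
    p_01 = 1.0 - p_2
    return [p_01 * (1.0 - p_1_given_01), p_01 * p_1_given_01, p_2]


def weighted_average(distributions, weights):
    """Blend per-model class distributions with non-negative weights (hypothetical)."""
    total = sum(weights)
    n_classes = len(distributions[0])
    return [
        sum(w * d[k] for d, w in zip(distributions, weights)) / total
        for k in range(n_classes)
    ]


# Example: blend a direct 3-class model with the two-stage decomposition.
direct = [0.5, 0.3, 0.2]
two_stage = hierarchical_probs(p_2=0.2, p_1_given_01=0.25)
blended = weighted_average([direct, two_stage], weights=[1.0, 1.0])
```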
Task 2: Text Extraction
- Keywords
- Extract text
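Task 2 is described only by its two steps: keyword selection and text extraction. A minimal sketch follows. It assumes keywords are ranked by tf-idf within an article against a corpus, and that the extracted text is the set of sentences containing a top keyword. The ranking criterion, the sentence splitter, and the helper names `top_keywords` and `extract_sentences` are hypothetical.

```python
import math
import re
from collections import Counter


def top_keywords(doc_tokens, corpus, k=3):
    """Rank a document's tokens by tf-idf against a corpus of token lists."""
    n = len(corpus)
    df = Counter()
    for tokens in corpus:
        df.update(set(tokens))
    tf = Counter(doc_tokens)
    scores = {t: c * (math.log((1 + n) / (1 + df[t])) + 1) for t, c in tf.items()}
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [t for t, _ in ranked[:k]]


def extract_sentences(text, keywords):
    """Keep sentences (split on Chinese/Western end punctuation) containing a keyword."""
    sentences = [s.strip() for s in re.split(r"[。！？!?.]", text) if s.strip()]
    return [s for s in sentences if any(kw in s for kw in keywords)]
```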
Task 3: Image Classification
- Text Recognition
- Text Classification
- Area Filtering (CTPN)
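CTPN (Connectionist Text Proposal Network) detects text lines and returns bounding boxes. The detector itself is not reproduced here. The sketch assumes boxes in (x1, y1, x2, y2) pixel format. It drops boxes below a hypothetical minimum-area fraction and then computes how much of the image the remaining text covers, a quantity that could feed the image classifier. The threshold, the ratio feature, and the function names are illustrative assumptions.

```python
def box_area(box):
    """Area of an axis-aligned (x1, y1, x2, y2) box in pixels."""
    x1, y1, x2, y2 = box
    return max(0, x2 - x1) * max(0, y2 - y1)


def filter_boxes(boxes, image_w, image_h, min_frac=0.01):
    """Drop text boxes smaller than min_frac of the image area (assumed threshold)."""
    image_area = image_w * image_h
    return [b for b in boxes if box_area(b) >= min_frac * image_area]


def text_area_ratio(boxes, image_w, image_h):
    """Share of the image covered by boxes; overlaps count twice, so cap at 1."""
    covered = sum(box_area(b) for b in boxes)
    return min(1.0, covered / (image_w * image_h))
```

Summing box areas is a simplification. An exact coverage figure would need the union of overlapping boxes, but text-line detections usually overlap little.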
See more details on my blog: https://sanshibayuan.github.io/