xgbspark-text-classification
XGBoost on Spark for Chinese Text Classification
Features
- Data Source: Hive
- Word Segmentation: Ansj
- Feature Engineering: NGram + TF-IDF or Pre-Trained Word2Vec
- Classification Algorithm: XGBoost
- Model Training: Spark Pipeline
- Model Selection and Tuning: Cross Validation + Grid Search
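
As a rough illustration of the NGram + TF-IDF feature step listed above, here is a minimal, language-neutral sketch in plain Python. It is not the project's Spark code. The helper names `ngrams` and `tfidf` and the sample documents are hypothetical, and the smoothed IDF `log((N + 1) / (df + 1))` follows the formula documented for Spark MLlib's `IDF`; that the project uses exactly this variant is an assumption.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the contiguous n-grams of a token list, each joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs, n=2):
    """Compute TF-IDF weights over n-gram terms for a list of token lists.

    TF is the raw count of a term in a document. IDF is the smoothed
    log((N + 1) / (df + 1)), assumed here to match Spark MLlib's IDF.
    """
    grams = [ngrams(doc, n) for doc in docs]
    num_docs = len(grams)

    # Document frequency: in how many documents each n-gram appears.
    doc_freq = Counter()
    for g in grams:
        doc_freq.update(set(g))

    idf = {t: math.log((num_docs + 1) / (df + 1)) for t, df in doc_freq.items()}
    return [{t: count * idf[t] for t, count in Counter(g).items()} for g in grams]


# Hypothetical, already-segmented documents; in the real pipeline a word
# segmenter such as Ansj would produce these token lists.
docs = [
    ["我", "喜欢", "机器", "学习"],
    ["我", "喜欢", "深度", "学习"],
    ["天气", "很", "好"],
]
weights = tfidf(docs, n=2)
```

Bigrams shared by several documents (here `我 喜欢`) receive a lower weight than bigrams unique to one document, which is what makes the resulting vectors useful as classifier input.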
Environments
- Spark 2.1.1
- Hive 1.2.1
- XGBoost4J-Spark 0.7
- Ansj 5.1.2