xgbspark-text-classification

XGBoost on Spark for Chinese Text Classification

Features

  • Data Source: Hive
  • Word Segmentation: Ansj
  • Feature Engineering: NGram + TF-IDF or Pre-Trained Word2Vec
  • Classification Algorithm: XGBoost
  • Model Training: Spark Pipeline
  • Model Selection and Tuning: Cross Validation + Grid Search
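
The NGram + TF-IDF feature step can be illustrated without Spark. The following is a minimal, dependency-free Python sketch, not this project's code. It makes three assumptions. First, text has already been segmented into tokens (in this project, by Ansj). Second, n-grams are joined with a space, as Spark ML's `NGram` transformer does. Third, term frequency is the raw count and IDF uses Spark ML's smoothed formula log((m + 1) / (df + 1)), where m is the number of documents. The function names `ngrams` and `tf_idf` and the sample corpus are hypothetical.

```python
import math
from collections import Counter


def ngrams(tokens, n=2):
    """Join each window of n consecutive tokens with a space (like Spark ML's NGram).

    Returns an empty list when there are fewer than n tokens.
    """
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Compute a TF-IDF weight for every term of every document.

    TF is the raw term count within a document; IDF uses the smoothed
    formula log((m + 1) / (df + 1)), where m is the number of documents
    and df is the number of documents containing the term.
    """
    m = len(docs)
    tfs = [Counter(doc) for doc in docs]
    # Counter keys are unique per document, so each term adds 1 per document.
    df = Counter(term for tf in tfs for term in tf)
    idf = {term: math.log((m + 1) / (d + 1)) for term, d in df.items()}
    return [{term: count * idf[term] for term, count in tf.items()} for tf in tfs]


# Hypothetical, already-segmented documents (e.g. output of a segmenter such as Ansj).
corpus = [["机器", "学习", "很", "有趣"], ["深度", "学习", "很", "难"]]
features = [ngrams(doc, 2) for doc in corpus]
weights = tf_idf(features)
# The bigram "学习 很" occurs in every document, so its weight is log(3 / 3) = 0.0.
```

A term that appears in every document gets zero weight, which is why TF-IDF down-weights common n-grams before the vectors reach a classifier such as XGBoost. In a Spark pipeline, the same arithmetic is split across separate transformer and estimator stages rather than written by hand.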

Environments