xgboost

How can Random Forest outperform XGBoost?

Status: Open • rohan472000 opened this issue 5 months ago • 7 comments

Hi all,

I am making predictions on the same dataset with both XGBoost and Random Forest, and the Random Forest model consistently achieves better R² and correlation scores than XGBoost, even though I tune the hyperparameters of both models extensively. These are the grids I use for tuning:

        # XGBoost parameter grid
        param_grid_xgb = {
            'n_estimators': [100, 200, 300, 350, 400, 500],
            'max_depth': [13, 15, 17, 27, 30, 35],
            'learning_rate': [0.01, 0.1, 0.2, 0.3, 0.001, 0.5],
        }
        
        # Random Forest parameter grid
        param_grid_rf = {
            'n_estimators': [100, 200, 300, 350, 400, 500],
            'max_depth': [13, 15, 17, 27, 30, 35],
            'min_samples_split': [12, 15, 10, 22, 25, 35],
            'min_samples_leaf': [11, 12, 14, 25, 30, 35],
            'max_features': ['sqrt', 'log2'],
        }
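For scale, it may help to count how many candidates each grid defines, since an exhaustive grid search tries every combination. The sketch below copies both grids verbatim and uses only the standard library; the 5-fold cross-validation figure is an assumption, because the post does not say how many folds were used.

```python
from math import prod

# The two grids from the post, copied verbatim.
param_grid_xgb = {
    'n_estimators': [100, 200, 300, 350, 400, 500],
    'max_depth': [13, 15, 17, 27, 30, 35],
    'learning_rate': [0.01, 0.1, 0.2, 0.3, 0.001, 0.5],
}
param_grid_rf = {
    'n_estimators': [100, 200, 300, 350, 400, 500],
    'max_depth': [13, 15, 17, 27, 30, 35],
    'min_samples_split': [12, 15, 10, 22, 25, 35],
    'min_samples_leaf': [11, 12, 14, 25, 30, 35],
    'max_features': ['sqrt', 'log2'],
}

def n_combinations(grid):
    """Number of candidates an exhaustive grid search would evaluate."""
    return prod(len(values) for values in grid.values())

ASSUMED_CV_FOLDS = 5  # assumption: the number of folds is not stated in the post

for name, grid in [("XGBoost", param_grid_xgb), ("Random Forest", param_grid_rf)]:
    n = n_combinations(grid)
    print(f"{name}: {n} combinations, {n * ASSUMED_CV_FOLDS} fits")
```

By this count the XGBoost grid has 6 × 6 × 6 = 216 candidates and the Random Forest grid has 6 × 6 × 6 × 6 × 2 = 2592, so the Random Forest search covers 12 times as many configurations.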

Across all the hyperparameter combinations I have tried, Random Forest consistently outperforms XGBoost on both R² and correlation.

To improve performance further, I tried a stacking regressor that combines the two models (Random Forest and XGBoost). Surprisingly, the ensemble scores lower than Random Forest alone, although higher than XGBoost.

My questions are:

  • Why might Random Forest be performing better than XGBoost in this case? Could it be due to the nature of my dataset, the model architecture, or the hyperparameters I am using?
  • Are there other XGBoost parameters I should tune to make it competitive with Random Forest?
  • How can I investigate the performance difference further, to make sure I am leveraging the strengths of each model?
  • Why is the stacking regressor not scoring higher? In theory it should combine the strengths of both models and outperform each of them, but that is not happening. Are there common reasons or mistakes that lead to this outcome?
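One way to approach the last two questions is to compare the two models' out-of-fold predictions sample by sample. The helper below is a hypothetical sketch using NumPy. It reports each model's R², the correlation between the two models' residuals, and how often model A beats model B. A residual correlation close to 1 suggests the models make largely the same mistakes, which would limit what a stacker can gain.

```python
import numpy as np

def compare_oof_predictions(y, pred_a, pred_b):
    """Compare two models' out-of-fold predictions on the same samples (hypothetical helper)."""
    y, pred_a, pred_b = (np.asarray(v, dtype=float) for v in (y, pred_a, pred_b))
    res_a, res_b = y - pred_a, y - pred_b
    ss_tot = np.sum((y - y.mean()) ** 2)
    return {
        "r2_a": float(1.0 - np.sum(res_a ** 2) / ss_tot),
        "r2_b": float(1.0 - np.sum(res_b ** 2) / ss_tot),
        # Near 1.0: both models err in the same direction on the same samples.
        "residual_corr": float(np.corrcoef(res_a, res_b)[0, 1]),
        # Fraction of samples where model A is closer to the target than model B.
        "a_wins": float(np.mean(np.abs(res_a) < np.abs(res_b))),
    }

# Toy usage: model B makes the same errors as model A, only twice as large.
y = [1.0, 2.0, 3.0, 4.0, 5.0]
noise = np.array([0.1, -0.1, 0.1, -0.1, 0.1])
stats = compare_oof_predictions(y, np.array(y) + noise, np.array(y) + 2 * noise)
print(stats)
```

With real models, `pred_a` and `pred_b` would typically come from `sklearn.model_selection.cross_val_predict(model, X, y, cv=splitter)`, with the same `KFold` splitter passed for both models so every sample is predicted by models that never saw it during training.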

Any insights or suggestions would be greatly appreciated. Thank you in advance for your help!

rohan472000, Sep 19 '24 06:09