matbench

new_benchmark

Open Alice0416 opened this issue 8 months ago • 0 comments

Description

This pull request adds a random forest algorithm that uses features from the Sine Coulomb Matrix and MagPie featurization algorithms. The two featurizers are:

  • Sine Coulomb Matrix: Creates structural features based on Coulombic interactions under periodic boundary conditions (suitable for crystalline materials with known structures).

  • MagPie Features: Weighted elemental features derived from elemental data such as electronegativity, melting point, and electron affinity.
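To illustrate what a "weighted elemental feature" means, here is a minimal sketch that computes two composition-weighted statistics of one elemental property. It is not the MagPie implementation used in this submission: the helper names `weighted_mean` and `weighted_avg_dev` are hypothetical, and only two tabulated Pauling electronegativities are included for the example.

```python
# Pauling electronegativities for two elements (standard tabulated values).
ELECTRONEGATIVITY = {"Na": 0.93, "Cl": 3.16}


def weighted_mean(composition, table):
    """Composition-weighted mean of an elemental property.

    composition maps element symbol -> amount; table maps symbol -> value.
    (Hypothetical helper, for illustration only.)
    """
    total = sum(composition.values())
    return sum(amount / total * table[el] for el, amount in composition.items())


def weighted_avg_dev(composition, table):
    """Composition-weighted mean absolute deviation from the weighted mean."""
    mean = weighted_mean(composition, table)
    total = sum(composition.values())
    return sum(
        amount / total * abs(table[el] - mean)
        for el, amount in composition.items()
    )


nacl = {"Na": 1, "Cl": 1}
mean_en = weighted_mean(nacl, ELECTRONEGATIVITY)
dev_en = weighted_avg_dev(nacl, ELECTRONEGATIVITY)
print(mean_en, dev_en)
```

The same pattern would be repeated over many elemental properties (melting point, electron affinity, and so on) to build a fixed-length feature vector per composition.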

Both featurizers were run within the Automatminer v1.0.3.20191111 framework for convenience; no auto-featurization or AutoML processes were applied.

Data Processing

  • Data Cleaning: Features with NaN values in more than 1% of samples were dropped. Remaining missing values were imputed using the mean of the training data.
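The cleaning rule can be sketched in plain Python as follows. This is an illustration, not the Automatminer code path: the function name `clean_features` is hypothetical, and it assumes the 1% NaN fraction is measured on the training rows and that imputation uses per-feature training means.

```python
import math


def clean_features(train_rows, test_rows, max_nan_fraction=0.01):
    """Drop features with too many NaNs, then mean-impute the rest.

    Assumptions (not confirmed by the submission): the NaN fraction is
    computed on the training rows, and each feature is imputed with its
    own training-set mean.
    """
    keep, means = [], []
    for j in range(len(train_rows[0])):
        column = [row[j] for row in train_rows]
        present = [v for v in column if not math.isnan(v)]
        n_missing = len(column) - len(present)
        if not present or n_missing / len(column) > max_nan_fraction:
            continue  # feature dropped
        keep.append(j)
        means.append(sum(present) / len(present))

    def impute(rows):
        return [
            [means[k] if math.isnan(row[j]) else row[j] for k, j in enumerate(keep)]
            for row in rows
        ]

    return impute(train_rows), impute(test_rows), keep


nan = float("nan")
# 200 training rows: feature 1 has 0.5% NaN (kept), feature 2 has 5% NaN (dropped).
train = [[float(i), nan if i == 0 else 2.0, nan if i < 10 else 1.0] for i in range(200)]
test = [[5.0, nan, 1.0]]
train_clean, test_clean, kept = clean_features(train, test)
print(kept, test_clean)
```

Computing the means on training rows only keeps information from the test fold out of the imputation step.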

  • Featurization:

  1. For problems with structure: Both Sine Coulomb Matrix and MagPie features were applied.

  2. For problems without structure: Only MagPie features were applied.
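The featurization rule above amounts to a simple branch on whether the task provides crystal structures. A minimal sketch, using a hypothetical `select_featurizers` helper that returns labels only (not featurizer objects):

```python
def select_featurizers(has_structure):
    """Return the featurizer labels applied to a task (hypothetical helper)."""
    featurizers = ["MagPie"]  # always applied
    if has_structure:
        # Structural features are only available when structures are given.
        featurizers.insert(0, "Sine Coulomb Matrix")
    return featurizers


print(select_featurizers(True))
print(select_featurizers(False))
```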

Model Details

  • Random Forest: Utilizes 500 estimators.

  • Hyperparameter Tuning: None performed. A large, constant number of trees was used for each fold's model, and the entire training+validation set served as training data for the random forest.
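A minimal sketch of this per-fold procedure, assuming scikit-learn and a regression target. The data here are synthetic and the 5-fold split and seeds are placeholders rather than the benchmark's own folds; only the 500-tree setting and the absence of tuning come from the description above. A classification task would use `RandomForestClassifier` in the same way.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold

# Synthetic stand-in for a featurized dataset (assumption, illustration only).
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4))
y = 2.0 * X[:, 0] + rng.normal(scale=0.1, size=60)

fold_predictions = {}
splitter = KFold(n_splits=5, shuffle=True, random_state=0)
for fold, (train_val_idx, test_idx) in enumerate(splitter.split(X)):
    # No tuning: a fixed 500-tree forest is fit on the whole train+validation set.
    model = RandomForestRegressor(n_estimators=500, random_state=0)
    model.fit(X[train_val_idx], y[train_val_idx])
    fold_predictions[fold] = model.predict(X[test_idx])

print({fold: preds.shape for fold, preds in fold_predictions.items()})
```

Because no hyperparameters are searched, there is no need to hold out a separate validation split within each fold.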

Additional Information

Raw Data and Example Notebook: Available on the matbench repository.

Included files

-- benchmarks
---- matbench_v0.1_RFSCM/Magpie
------ results.json.gz             # required filename
------ my_python_file.py            # required filename
------ info.json                   # required filename
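Before submitting, the layout above can be checked with a short script. This is a sketch using only the standard library; `missing_required_files` is a hypothetical helper, and the three filenames are taken directly from the listing above.

```python
import tempfile
from pathlib import Path

# Filenames marked "required filename" in the listing above.
REQUIRED_FILES = ("results.json.gz", "my_python_file.py", "info.json")


def missing_required_files(submission_dir):
    """Return the required filenames not present in submission_dir."""
    directory = Path(submission_dir)
    return [name for name in REQUIRED_FILES if not (directory / name).is_file()]


# Usage example on a temporary directory that contains only info.json.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "info.json").write_text("{}")
    missing = missing_required_files(tmp)

print(missing)
```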

Alice0416 · Jun 20 '24 21:06