matbench
new_benchmark
Description
This pull request adds a random forest model that uses features from the Sine Coulomb Matrix and Magpie featurizers. Key details of the approach:
- Sine Coulomb Matrix: Structural features based on Coulomb interactions under periodic boundary conditions (suitable for crystalline materials with known structures).
- Magpie features: Composition-weighted elemental features derived from elemental data such as electronegativity, melting point, and electron affinity.
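To illustrate what the Sine Coulomb Matrix computes, here is a minimal numpy sketch. Diagonal entries are 0.5·Z_i^2.4, and off-diagonal entries are Z_i·Z_j divided by a periodic "distance" built from sin² of the fractional separation, which makes the matrix invariant to lattice translations. The features in this PR were produced by the featurizer that Automatminer wraps (presumably matminer's `SineCoulombMatrix`), not by this code. The function names are hypothetical, and the sorted, zero-padded eigenvalue vector is an assumed flattening scheme used to give structures with different site counts a common feature length.

```python
import numpy as np


def sine_coulomb_matrix(lattice, frac_coords, atomic_numbers):
    """Sine Coulomb matrix of a periodic structure (illustrative sketch).

    lattice: (3, 3) array whose rows are the lattice vectors.
    frac_coords: (n, 3) fractional coordinates of the n sites.
    atomic_numbers: (n,) nuclear charges Z of the sites.
    """
    lattice = np.asarray(lattice, dtype=float)
    frac = np.asarray(frac_coords, dtype=float)
    z = np.asarray(atomic_numbers, dtype=float)
    n = len(z)
    m = np.zeros((n, n))
    for i in range(n):
        # Diagonal: self-interaction term, as in the ordinary Coulomb matrix
        m[i, i] = 0.5 * z[i] ** 2.4
        for j in range(i + 1, n):
            # sin^2 of the fractional separation is periodic with period 1,
            # so shifting a site by a lattice vector leaves it unchanged
            s = np.sin(np.pi * (frac[i] - frac[j])) ** 2
            # Map back to Cartesian space (rows of `lattice` are vectors)
            trig_dist = np.linalg.norm(s @ lattice)
            m[i, j] = m[j, i] = z[i] * z[j] / trig_dist
    return m


def sine_eigenvalue_features(matrix, max_sites):
    """Assumed flattening: eigenvalues sorted in descending order,
    zero-padded to a fixed length of max_sites."""
    eigs = np.sort(np.linalg.eigvalsh(matrix))[::-1]
    out = np.zeros(max_sites)
    out[: len(eigs)] = eigs
    return out
```

For a CsCl-like cell (`4.0 * np.eye(3)`, sites at fractional `[0, 0, 0]` and `[0.5, 0.5, 0.5]`, Z = 55 and 17), the off-diagonal entry is 55·17 / (4·√3), and moving the second site by any whole lattice vector yields the same matrix.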
Both featurizers were run within the Automatminer v1.0.3.20191111 framework for convenience; no automatic featurization or AutoML was applied.
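The Magpie features are composition-weighted statistics of elemental properties. The sketch below shows the idea for a single property, weighting each element by its atomic fraction. The two-element property table is illustrative only (Pauling electronegativity and melting point in K for Na and Cl). The real Magpie preset covers many more properties, and its exact statistics may differ from this hypothetical `weighted_stats` helper; in particular, the tie-breaking used for `mode` is an assumption.

```python
import numpy as np

# Illustrative elemental data only; the real Magpie table has many more
# elements and properties.
ELEMENT_DATA = {
    "Na": {"electronegativity": 0.93, "melting_point": 370.87},
    "Cl": {"electronegativity": 3.16, "melting_point": 171.6},
}


def weighted_stats(composition, prop, table=ELEMENT_DATA):
    """Composition-weighted statistics of one elemental property.

    composition: mapping element -> amount, e.g. {"Na": 1, "Cl": 1}.
    """
    elements = list(composition)
    amounts = np.array([composition[e] for e in elements], dtype=float)
    w = amounts / amounts.sum()  # atomic fractions act as weights
    x = np.array([table[e][prop] for e in elements], dtype=float)
    mean = float(np.dot(w, x))
    return {
        "minimum": float(x.min()),
        "maximum": float(x.max()),
        "range": float(x.max() - x.min()),
        "mean": mean,
        # Fraction-weighted mean absolute deviation from the weighted mean
        "avg_dev": float(np.dot(w, np.abs(x - mean))),
        # Value of the most abundant element (first one wins on ties)
        "mode": float(x[np.argmax(w)]),
    }
```

For example, `weighted_stats({"Na": 1, "Cl": 1}, "electronegativity")` gives a mean of 2.045 and a range of 2.23.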
Data Processing
- Data cleaning: Features with more than 1% NaN values were dropped. Remaining missing values were imputed with the mean of the training data.
- Featurization:
  - For structure problems: Both Sine Coulomb Matrix and Magpie features were applied.
  - For composition-only problems (no structure available): Only Magpie features were applied.
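The cleaning step can be sketched with pandas as two functions: one learns which columns survive the 1% NaN threshold and the training-set means, and the other applies both to any split. This sketch assumes the NaN fractions are computed on the training data of each fold, matching how the means are described. `fit_cleaner` and `apply_cleaner` are hypothetical names; Automatminer's own cleaning component may implement this differently.

```python
import numpy as np
import pandas as pd


def fit_cleaner(train_df, max_nan_frac=0.01):
    """Learn the kept columns and the training means used for imputation."""
    nan_frac = train_df.isna().mean()  # fraction of NaN values per column
    keep = nan_frac[nan_frac <= max_nan_frac].index  # drop columns > 1% NaN
    means = train_df[keep].mean()  # column means, NaNs skipped
    return keep, means


def apply_cleaner(df, keep, means):
    """Drop the rejected columns, then fill remaining NaNs with training means."""
    return df[keep].fillna(means)
```

Because the means come only from the training data, the same `keep` and `means` can be applied unchanged to the held-out test split, which avoids leaking test statistics into the model.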
Model Details
- Random forest: 500 estimators (trees).
- Hyperparameter tuning: None performed. A large, fixed number of trees was used for every fold's model, and the entire training+validation set of each fold was used to train the random forest.
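A per-fold training loop without any hyperparameter search might look like the following scikit-learn sketch, written for a regression task. Each fold fits one fixed-size forest on all of its training data and scores it on the held-out part. The plain `KFold` split, seed, and mean-absolute-error scoring are assumptions for illustration; a real submission would use the folds provided by matbench, and classification tasks would use `RandomForestClassifier` instead.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold


def cross_validate_rf(X, y, n_splits=5, n_estimators=500, seed=0):
    """Fit one fixed-size random forest per fold; no hyperparameter tuning."""
    kf = KFold(n_splits=n_splits, shuffle=True, random_state=seed)
    maes = []
    for train_idx, test_idx in kf.split(X):
        # The whole training (train+validation) portion of the fold is used;
        # nothing is held back for model selection.
        model = RandomForestRegressor(n_estimators=n_estimators, random_state=seed)
        model.fit(X[train_idx], y[train_idx])
        pred = model.predict(X[test_idx])
        maes.append(float(np.mean(np.abs(pred - y[test_idx]))))
    return maes
```

Skipping the search keeps the baseline simple. Random forests are usually insensitive to the number of trees once it is large, so a single large constant such as 500 is a reasonable default.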
Additional Information
Raw Data and Example Notebook: Available on the matbench repository.
Included files
-- benchmarks
---- matbench_v0.1_RFSCM/Magpie
------ results.json.gz # required filename
------ my_python_file.py # required filename
------ info.json # required filename