CASMcode
How to avoid unknown configurations when fitting
I did the enumeration with pymatgen. When I imported my DFT data, CASM created some additional configurations, but these have no DFT data (energies, converged structures). These unknown configurations are still in my CASM project. My training file contains only structures for which I have DFT data, i.e., the calculated ones. When I run `casm-learn -s fit.json --checkhull > fit_log.txt`, the output also lists those configurations among the predicted ground states, with "unknown" in the DFT columns:
-- Check: individual 0 --
Index: Selected                         #Selected  CV           RMS          wRMS         Estimator  FeatureSelection  Note
0:     1111111111111111111111011111111  30         0.026055716  0.010945006  0.013644882  Lasso      SelectFromModel
Writing: /data/javeedd/BCC/CE/cluster_expansions/clex.formation_energy/calctype.default/ref.default/bset.default/eci.__tmp/eci.json

DFT ground states:
name                  comp(a)  configname            dft_hull_dist  formation_energy  clex_hull_dist  clex(formation_energy)  clex_dft_hull_dist
SCEL1_1_1_1_0_0_0/0   0.000    SCEL1_1_1_1_0_0_0/0   0.0            0.000000          0.000000        0.000942                0.0
SCEL12_6_2_1_1_5_2/2  0.250    SCEL12_6_2_1_1_5_2/2  0.0            -0.153580         0.000000        -0.146531               0.0
SCEL12_6_1_2_0_5_4/1  0.500    SCEL12_6_1_2_0_5_4/1  0.0            -0.225089         0.028147        -0.209743               0.0
SCEL8_2_2_2_0_0_0/2   0.625    SCEL8_2_2_2_0_0_0/2   0.0            -0.190727         0.031752        -0.160895               0.0
SCEL1_1_1_1_0_0_0/1   1.000    SCEL1_1_1_1_0_0_0/1   0.0            0.000000          0.000000        0.002189                0.0

Predicted ground states:
name                  comp(a)   configname            dft_hull_dist  formation_energy  clex_hull_dist  clex(formation_energy)  clex_dft_hull_dist
SCEL1_1_1_1_0_0_0/0   0.000000  SCEL1_1_1_1_0_0_0/0   0.00000000     0.00000000        0.0             0.000942                0.000000
SCEL12_6_2_1_1_5_2/2  0.250000  SCEL12_6_2_1_1_5_2/2  0.00000000     -0.15358013       0.0             -0.146531               0.000000
SCEL6_3_1_2_0_2_2/0   0.333333  SCEL6_3_1_2_0_2_2/0   unknown        unknown           0.0             -0.195637               -0.028036
SCEL2_2_1_1_0_1_1/0   0.500000  SCEL2_2_1_1_0_1_1/0   unknown        unknown           0.0             -0.237890               -0.028147
SCEL12_6_2_1_1_4_0/5  0.583333  SCEL12_6_2_1_1_4_0/5  0.00320163     -0.19897974       0.0             -0.208706               -0.031529
SCEL4_2_2_1_1_0_0/0   0.750000  SCEL4_2_2_1_1_0_0/0   unknown        unknown           0.0             -0.144467               -0.037934
SCEL1_1_1_1_0_0_0/1   1.000000  SCEL1_1_1_1_0_0_0/1   0.00000000     0.00000000        0.0             0.002189                0.000000
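To list which predicted ground states have no DFT data, a small script can pull them out of the `--checkhull` output. This is a minimal sketch, assuming the "Predicted ground states" table has been copied into a string with one row per line in the eight-column layout printed above; `unknown_ground_states` is a hypothetical helper, not part of casm-learn.

```python
from __future__ import annotations

# Predicted-ground-state table copied from the --checkhull output in this thread.
LOG = """\
name comp(a) configname dft_hull_dist formation_energy clex_hull_dist clex(formation_energy) clex_dft_hull_dist
SCEL1_1_1_1_0_0_0/0 0.000000 SCEL1_1_1_1_0_0_0/0 0.00000000 0.00000000 0.0 0.000942 0.000000
SCEL12_6_2_1_1_5_2/2 0.250000 SCEL12_6_2_1_1_5_2/2 0.00000000 -0.15358013 0.0 -0.146531 0.000000
SCEL6_3_1_2_0_2_2/0 0.333333 SCEL6_3_1_2_0_2_2/0 unknown unknown 0.0 -0.195637 -0.028036
SCEL2_2_1_1_0_1_1/0 0.500000 SCEL2_2_1_1_0_1_1/0 unknown unknown 0.0 -0.237890 -0.028147
SCEL12_6_2_1_1_4_0/5 0.583333 SCEL12_6_2_1_1_4_0/5 0.00320163 -0.19897974 0.0 -0.208706 -0.031529
SCEL4_2_2_1_1_0_0/0 0.750000 SCEL4_2_2_1_1_0_0/0 unknown unknown 0.0 -0.144467 -0.037934
SCEL1_1_1_1_0_0_0/1 1.000000 SCEL1_1_1_1_0_0_0/1 0.00000000 0.00000000 0.0 0.002189 0.000000
"""


def unknown_ground_states(table: str) -> list[str]:
    """Return config names whose DFT formation_energy column reads 'unknown'."""
    missing = []
    for line in table.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        configname, formation_energy = fields[2], fields[4]
        # The header row also has 8 fields, but its column 4 is the
        # literal 'formation_energy', so it is never counted here.
        if formation_energy == "unknown":
            missing.append(configname)
    return missing


if __name__ == "__main__":
    for name in unknown_ground_states(LOG):
        print(name)
```

On the table above this picks out the three configurations at comp(a) 0.333333, 0.5 and 0.75, which are the ones to inspect (or exclude) before refitting.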
Are they duplicate structures (non-primitive) or are they unique? If they are unique, then you probably want to keep them there.
I don't use `casm-learn`, but one thing you could try is to create a new selection file without them using the `--subset` option of `casm select`. Then read this selection file back in with `casm select -c selection.json --set selected`, and the configurations should hopefully no longer show up (you can check with `casm query`).
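If you would rather build the reduced selection by hand, the sketch below writes a file that keeps every configuration except the ones without DFT data, similar to what a `--subset` selection would contain. The file layout is an assumption: a `# configname selected` header followed by one `configname 1` row per kept configuration, modelled on the text that `casm query` prints. The command in this thread uses a `.json` name, so compare against your own CASM output and adapt the writer if your version expects JSON. `selection_lines`, `write_selection` and the file name are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def selection_lines(all_configs: list[str], exclude: set[str]) -> list[str]:
    """Rows of a selection that keeps every configuration not in `exclude`.

    Assumed format (check against your CASM version): a '# configname selected'
    header, then 'configname 1' for each kept configuration.
    """
    rows = ["# configname selected"]
    rows += [f"{name} 1" for name in all_configs if name not in exclude]
    return rows


def write_selection(path: str, all_configs: list[str], exclude: set[str]) -> None:
    """Write the selection rows to `path` (a hypothetical file name)."""
    Path(path).write_text("\n".join(selection_lines(all_configs, exclude)) + "\n")


if __name__ == "__main__":
    # Config names from the predicted-ground-state table in this thread.
    configs = [
        "SCEL1_1_1_1_0_0_0/0", "SCEL12_6_2_1_1_5_2/2", "SCEL6_3_1_2_0_2_2/0",
        "SCEL2_2_1_1_0_1_1/0", "SCEL12_6_2_1_1_4_0/5", "SCEL4_2_2_1_1_0_0/0",
        "SCEL1_1_1_1_0_0_0/1",
    ]
    no_dft = {"SCEL6_3_1_2_0_2_2/0", "SCEL2_2_1_1_0_1_1/0", "SCEL4_2_2_1_1_0_0/0"}
    write_selection("selection.txt", configs, no_dft)
```

In practice `configs` would come from your full `casm query` listing rather than from the hull table.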
How should I keep them? I don't have DFT data for them; I think CASM created them while importing my calculated configurations.
I think this is the same issue as the one discussed here:
https://github.com/prisms-center/CASMcode/issues/293#issue-1766399531
Try importing with the setting `{"mapping": {"primitive_only": true}}`.
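The suggested setting can be kept in a small settings file so it is used on every import. This is a minimal sketch: the JSON content is the setting quoted above, while the file name `import_settings.json` and the helper `settings_text` are hypothetical. How the file is passed to `casm import` depends on your CASM version, so check `casm import --help`.

```python
import json

# The mapping setting suggested in this thread (Python True serializes to JSON true).
IMPORT_SETTINGS = {"mapping": {"primitive_only": True}}


def settings_text(indent=None):
    """Serialize the import settings to a JSON string."""
    return json.dumps(IMPORT_SETTINGS, indent=indent)


if __name__ == "__main__":
    # Hypothetical file name; hand it to `casm import` as your version documents.
    with open("import_settings.json", "w") as f:
        f.write(settings_text(indent=2) + "\n")
```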
You can check whether they are primitive by querying `is_primitive`.
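Once you have a query result that includes `is_primitive` (for example from something like `casm query -k is_primitive -o query.txt`; verify the flags with `casm query --help`), a short script can list the non-primitive configurations. This sketch assumes whitespace-separated text output whose first non-blank line is a `#`-prefixed header naming the columns, and accepts either `0`/`1` or `false`/`true`, since the exact value formatting is an assumption. `non_primitive` and `check_primitive.py` are hypothetical names.

```python
from __future__ import annotations

import sys


def non_primitive(query_text: str) -> list[str]:
    """Return names of configurations whose is_primitive value is false.

    Assumes a header such as '# configname selected is_primitive'
    followed by one whitespace-separated row per configuration.
    """
    lines = [ln for ln in query_text.splitlines() if ln.strip()]
    header = lines[0].lstrip("#").split()
    name_col = header.index("configname")
    prim_col = header.index("is_primitive")
    flagged = []
    for ln in lines[1:]:
        fields = ln.split()
        # Treat 0 or false (any case) as "not primitive".
        if fields[prim_col].lower() in ("0", "false"):
            flagged.append(fields[name_col])
    return flagged


if __name__ == "__main__":
    # Usage: python check_primitive.py query.txt
    with open(sys.argv[1]) as f:
        for name in non_primitive(f.read()):
            print(name)
```

Any names it prints are non-primitive duplicates of smaller configurations, which is the case the `primitive_only` mapping setting is meant to avoid.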