How to avoid the unknown configurations while fitting.

Open darjaved opened this issue 1 year ago • 4 comments

I enumerated configurations with Pymatgen. When I imported my DFT data, CASM created some additional configurations that have no DFT data (no energies or converged structures). These unknown configurations are still in my CASM project. My train file contains only the structures for which I have DFT data, i.e. the calculated ones. But when I run "casm-learn -s fit.json --checkhull > fit_log.txt", the output also lists those extra configurations, with their DFT values shown as "unknown".

-- Check: individual 0 --
Index:  Selected                         #Selected  CV           RMS          wRMS         Estimator  FeatureSelection  Note
0:      1111111111111111111111011111111  30         0.026055716  0.010945006  0.013644882  Lasso      SelectFromModel

Writing: /data/javeedd/BCC/CE/cluster_expansions/clex.formation_energy/calctype.default/ref.default/bset.default/eci.__tmp/eci.json

DFT ground states:
name                  comp(a)  configname            dft_hull_dist  formation_energy  clex_hull_dist  clex(formation_energy)  clex_dft_hull_dist
SCEL1_1_1_1_0_0_0/0   0.000    SCEL1_1_1_1_0_0_0/0   0.0            0.000000          0.000000        0.000942                0.0
SCEL12_6_2_1_1_5_2/2  0.250    SCEL12_6_2_1_1_5_2/2  0.0            -0.153580         0.000000        -0.146531               0.0
SCEL12_6_1_2_0_5_4/1  0.500    SCEL12_6_1_2_0_5_4/1  0.0            -0.225089         0.028147        -0.209743               0.0
SCEL8_2_2_2_0_0_0/2   0.625    SCEL8_2_2_2_0_0_0/2   0.0            -0.190727         0.031752        -0.160895               0.0
SCEL1_1_1_1_0_0_0/1   1.000    SCEL1_1_1_1_0_0_0/1   0.0            0.000000          0.000000        0.002189                0.0

Predicted ground states:
name                  comp(a)   configname            dft_hull_dist  formation_energy  clex_hull_dist  clex(formation_energy)  clex_dft_hull_dist
SCEL1_1_1_1_0_0_0/0   0.000000  SCEL1_1_1_1_0_0_0/0   0.00000000     0.00000000        0.0             0.000942                0.000000
SCEL12_6_2_1_1_5_2/2  0.250000  SCEL12_6_2_1_1_5_2/2  0.00000000     -0.15358013       0.0             -0.146531               0.000000
SCEL6_3_1_2_0_2_2/0   0.333333  SCEL6_3_1_2_0_2_2/0   unknown        unknown           0.0             -0.195637               -0.028036
SCEL2_2_1_1_0_1_1/0   0.500000  SCEL2_2_1_1_0_1_1/0   unknown        unknown           0.0             -0.237890               -0.028147
SCEL12_6_2_1_1_4_0/5  0.583333  SCEL12_6_2_1_1_4_0/5  0.00320163     -0.19897974       0.0             -0.208706               -0.031529
SCEL4_2_2_1_1_0_0/0   0.750000  SCEL4_2_2_1_1_0_0/0   unknown        unknown           0.0             -0.144467               -0.037934
SCEL1_1_1_1_0_0_0/1   1.000000  SCEL1_1_1_1_0_0_0/1   0.00000000     0.00000000        0.0             0.002189                0.000000
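To see at a glance which predicted ground states lack DFT data, the rows of the "Predicted ground states" output can be scanned for "unknown" entries. The following is only a minimal sketch: it assumes the eight-column layout shown in the log (name, comp(a), configname, dft_hull_dist, formation_energy, clex_hull_dist, clex(formation_energy), clex_dft_hull_dist), and `configs_without_dft` is a hypothetical helper, not part of CASM.

```python
# Rows copied from the "Predicted ground states" output in this post.
# Assumed column order: name, comp(a), configname, dft_hull_dist,
# formation_energy, clex_hull_dist, clex(formation_energy), clex_dft_hull_dist
PREDICTED = """\
SCEL1_1_1_1_0_0_0/0 0.000000 SCEL1_1_1_1_0_0_0/0 0.00000000 0.00000000 0.0 0.000942 0.000000
SCEL12_6_2_1_1_5_2/2 0.250000 SCEL12_6_2_1_1_5_2/2 0.00000000 -0.15358013 0.0 -0.146531 0.000000
SCEL6_3_1_2_0_2_2/0 0.333333 SCEL6_3_1_2_0_2_2/0 unknown unknown 0.0 -0.195637 -0.028036
SCEL2_2_1_1_0_1_1/0 0.500000 SCEL2_2_1_1_0_1_1/0 unknown unknown 0.0 -0.237890 -0.028147
SCEL12_6_2_1_1_4_0/5 0.583333 SCEL12_6_2_1_1_4_0/5 0.00320163 -0.19897974 0.0 -0.208706 -0.031529
SCEL4_2_2_1_1_0_0/0 0.750000 SCEL4_2_2_1_1_0_0/0 unknown unknown 0.0 -0.144467 -0.037934
SCEL1_1_1_1_0_0_0/1 1.000000 SCEL1_1_1_1_0_0_0/1 0.00000000 0.00000000 0.0 0.002189 0.000000
"""


def configs_without_dft(table_text):
    """Return names of rows whose formation_energy column reads 'unknown'."""
    missing = []
    for line in table_text.splitlines():
        fields = line.split()
        if len(fields) != 8:
            continue  # skip blank or malformed lines
        name, formation_energy = fields[0], fields[4]
        if formation_energy == "unknown":
            missing.append(name)
    return missing


print(configs_without_dft(PREDICTED))
```

For this log, the three configurations at comp(a) = 0.333333, 0.5 and 0.75 are the ones reported without DFT data.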

darjaved, Feb 26 '24 05:02

Are they duplicate structures (non-primitive) or are they unique? If they are unique, then you probably want to keep them there.

I don't use casm-learn, but one thing you could try is to create a new selection file without them using the --subset option of casm select. Then read that selection file back in with casm select -c selection.json --set selected, and the configurations should hopefully no longer show up (you can check with casm query).
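The idea of dropping uncalculated configurations from a selection can be sketched in Python. This is an illustration only: it assumes a whitespace-delimited listing with a `#`-prefixed header that has `selected` and `is_calculated` columns, which may not match the files your CASM version writes. The `deselect_uncalculated` helper and the flag values in `EXAMPLE` are hypothetical.

```python
def deselect_uncalculated(listing):
    """Set selected=0 on every row whose is_calculated flag is not 1."""
    lines = listing.strip().splitlines()
    # Header is assumed to look like: "# configname selected is_calculated"
    header = lines[0].lstrip("#").split()
    sel_i = header.index("selected")
    calc_i = header.index("is_calculated")

    out = [lines[0]]  # keep the header line unchanged
    for line in lines[1:]:
        fields = line.split()
        if fields[calc_i] != "1":
            fields[sel_i] = "0"  # no DFT data: drop from the selection
        out.append("  ".join(fields))
    return "\n".join(out)


# Illustrative listing; the 0/1 flags are made up for this example.
EXAMPLE = """\
# configname selected is_calculated
SCEL1_1_1_1_0_0_0/0 1 1
SCEL2_2_1_1_0_1_1/0 1 0
SCEL12_6_2_1_1_5_2/2 1 1
"""

print(deselect_uncalculated(EXAMPLE))
```

Whether casm-learn then ignores the deselected configurations in the hull check still needs to be verified against your own project.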

xivh, Feb 28 '24 22:02

How should I keep them there? I don't have DFT data for them. I think they were created during the CASM import of my calculated configurations.

darjaved, Feb 29 '24 06:02

I think this is the same issue as the one discussed here:

https://github.com/prisms-center/CASMcode/issues/293#issue-1766399531

darjaved, Mar 01 '24 05:03

Try importing with the setting {"mapping": {"primitive_only": true}}.

You can check if they are primitive by querying is_primitive.
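As a small sketch, the settings fragment above can be generated from Python so that it is guaranteed to be valid JSON. The file name `import_settings.json` and the `write_settings` helper are assumptions made for illustration; how the file is passed to the import command depends on your CASM version and is not shown here.

```python
import json

# Settings fragment quoted in the comment above.
settings = {"mapping": {"primitive_only": True}}


def write_settings(path, data):
    """Write the settings dictionary to `path` as indented JSON."""
    with open(path, "w") as fh:
        json.dump(data, fh, indent=2)
        fh.write("\n")


# Serialize once to confirm the structure; Python's True becomes JSON true.
text = json.dumps(settings)
print(text)

# Example call (hypothetical file name):
# write_settings("import_settings.json", settings)
```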

xivh, Mar 01 '24 22:03