Dilute concentrations need large supercells. However, random enumeration seems to have to start from the smallest supercell, and it counts duplicates toward n_config.

Could you give a quick command to enumerate starting from a large supercell and skip the configurations that have already been enumerated? Otherwise I have to set an abnormally large n_config, which slows down the enumeration drastically.

Thanks,

Jun 29 '22 17:06 sjtuzhanglei

A couple of ideas here:

1. Use --min and --max to directly specify the minimum and maximum supercell volumes to enumerate in.

   This overrides the default "min" and "max" parameters of the "supercells" option for the enumeration methods.

2. Use the "supercells" / "scelnames" / "supercell_selection" options.

   See casm enum --desc ScelEnum for all the parameters of the "supercells" option. The "supercells" option also lets you specify the supercell directly through the transformation matrix from the primitive cell lattice. Alternatively, you can use the "scelnames" or "supercell_selection" options to directly specify which already enumerated supercells you want to enumerate configurations in.

3. Cluster perturbation enumeration with ConfigEnumAllOccupations may be useful.

   If you use JSON input to ConfigEnumAllOccupations you can include "cluster_specs" (see casm enum --desc ConfigEnumAllOccupations), with format described here, specifying which clusters to enumerate occupations on. Typically this is done for all clusters of increasing number and site-to-site distance up to some cutoffs, but you can also specify particular clusters directly with "orbit_specs". This lets you enumerate symmetrically unique perturbations in a supercell with the default configuration as the background (as specified with the "supercells", "scelnames", or "supercell_selection" options), or within user-specified background configurations (as specified with the "confignames", "config_selection", or "config_list" options).
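For the first idea, the supercell volume range can also be written into the JSON input instead of passing --min and --max. The Python sketch below only assembles a command string; the "supercells" block with "min" and "max" and the -m, -i, and --filter flags are the ones that appear elsewhere in this thread, while the helper name build_enum_command is hypothetical and not part of CASM.

```python
import json
import shlex


def build_enum_command(method, settings, filter_expr=None):
    """Return a `casm enum` command line as one shell-safe string.

    Hypothetical helper (not part of CASM): it only formats the flags
    that appear in this thread (-m, -i, --filter).
    """
    args = ["casm", "enum", "-m", method, "-i", json.dumps(settings)]
    if filter_expr is not None:
        args += ["--filter", filter_expr]
    return " ".join(shlex.quote(a) for a in args)


# Restrict enumeration to supercells of volume 10 through 16.
settings = {
    "n_config": 900000,
    "supercells": {"min": 10, "max": 16},
}

cmd = build_enum_command(
    "ConfigEnumRandomOccupations",
    settings,
    filter_expr="eq(comp_n(Y),mult(2,comp_n(Va)))",
)
print(cmd)
```

Building the JSON with json.dumps and quoting with shlex.quote avoids hand-escaping the nested quotes in the -i argument.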

Jun 29 '22 19:06 bpuchala

I tried:

casm enum -m ConfigEnumRandomOccupations -i '{"n_config":900000}' --filter 'eq(comp_n(Y),mult(2,comp_n(Va)))' --min 10 --max 16

but almost all configurations are excluded by the filter. Why is that? Is there any way to enumerate the wanted configurations more efficiently in large supercells?

Enumeration seems quite efficient for small supercells. Also, how large a supercell (and how many configurations) is generally sufficient for ECI training?

Write supercell database... DONE
Write configuration database... DONE

-- Begin: ConfigEnumRandomOccupations --

Input from JSON (--input or --setings): { "n_config" : 900000 }

Input from casm enum options: { "filter" : "eq(comp_n(Y),mult(2,comp_n(Va)))", "input" : "{"n_config":900000}", "max" : 16, "method" : "ConfigEnumRandomOccupations", "min" : 10 }

Combined Input: { "filter" : "eq(comp_n(Y),mult(2,comp_n(Va)))", "n_config" : 900000, "supercells" : { "max" : 16, "min" : 10 } }

-- Checking input --
primitive_only: true
filter: true
filter expression: eq(comp_n(Y),mult(2,comp_n(Va)))
verbosity: 10
dry_run: false
output_configurations: false

# of initial enumeration states: 228

-- Begin: ConfigEnumRandomOccupations enumeration --

# configurations in this project: 92495

Begin enumeration

Enumerate configurations for: SCEL10_1_1_10_0_0_0 900000 configurations (2 new, 899998 excluded by filter).

Enumerate configurations for: SCEL10_1_10_1_1_0_0 900000 configurations (1 new, 899999 excluded by filter).

Enumerate configurations for: SCEL10_1_10_1_2_0_0 900000 configurations (6 new, 899994 excluded by filter).

Enumerate configurations for: SCEL10_1_10_1_3_0_0 900000 configurations (3 new, 899997 excluded by filter).

Enumerate configurations for: SCEL10_1_10_1_6_0_0 900000 configurations (0 new, 900000 excluded by filter).

Enumerate configurations for: SCEL10_1_2_5_1_0_0 900000 configurations (3 new, 899997 excluded by filter).

Enumerate configurations for: SCEL10_10_1_1_0_1_9 900000 configurations (2 new, 899998 excluded by filter).

Enumerate configurations for: SCEL10_10_1_1_0_8_1 900000 configurations (3 new, 899997 excluded by filter).

Enumerate configurations for: SCEL10_10_1_1_0_7_1 900000 configurations (5 new, 899995 excluded by filter).

Enumerate configurations for: SCEL10_10_1_1_0_1_6 900000 configurations (3 new, 899997 excluded by filter).

Enumerate configurations for: SCEL10_10_1_1_0_9_5 900000 configurations (2 new, 899998 excluded by filter). ......

Jun 30 '22 02:06 sjtuzhanglei

The filter is applied after generating a configuration, so in a large supercell the likelihood of having a composition matching the filter is decreased.
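A rough way to see the size of this effect is sketched below in Python. It assumes, purely for illustration, a binary site (host or dilute species) with each site occupied independently and uniformly at random; how ConfigEnumRandomOccupations actually draws occupations is not stated in this thread, so the numbers show the trend only.

```python
from math import comb


def match_probability(n_sites, n_dilute):
    """Probability that a uniformly random binary occupation of n_sites
    has exactly n_dilute sites holding the dilute species.

    Illustrative model only: two species per site, independent sites.
    """
    return comb(n_sites, n_dilute) / 2 ** n_sites


# The chance of hitting one exact dilute composition shrinks quickly
# as the number of sites in the supercell grows.
for n_sites in (4, 8, 16, 32):
    p = match_probability(n_sites, 1)
    print(f"{n_sites:3d} sites: P(exactly 1 dilute site) = {p:.3e}")
```

Under this model nearly every random draw in a large supercell lands near half filling, so a filter that demands a dilute composition rejects almost everything.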

CASM doesn't have a fixed-composition enumeration method, but it does allow storing configurations encountered during Monte Carlo. So perhaps a useful approach would be to "fit" a cluster expansion with just a constant term, run canonical Monte Carlo at various compositions, and use the "enumeration" Monte Carlo option to store the configurations encountered. The defaults assume that a user wants to store configurations that break the cluster-expansion-predicted convex hull, but you could change the "metric" to just use "formation_energy" or some other quantity, such as composition, that would be the same for all configurations. You can also change the sample frequency to control how much the sampled configurations differ from each other.
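With only a constant term every configuration has the same energy, so every canonical swap would be accepted and the run amounts to random shuffling at fixed composition. The Python sketch below illustrates that idea outside CASM; it is not CASM's Monte Carlo, and the function name, the "host" species label, and the sample_period parameter are hypothetical.

```python
import random
from collections import Counter


def sample_fixed_composition(occupation, n_samples, sample_period, seed=0):
    """Collect occupations produced by random site swaps.

    Swapping two sites never changes the species counts, so every
    sample has the same composition as the starting occupation.
    """
    rng = random.Random(seed)
    occ = list(occupation)
    samples = []
    step = 0
    while len(samples) < n_samples:
        i = rng.randrange(len(occ))
        j = rng.randrange(len(occ))
        occ[i], occ[j] = occ[j], occ[i]  # always accepted: constant energy
        step += 1
        if step % sample_period == 0:
            samples.append(tuple(occ))
    return samples


# 16 sites with two Y per vacancy, matching the ratio in the filter above.
start = ["Va"] * 1 + ["Y"] * 2 + ["host"] * 13
samples = sample_fixed_composition(start, n_samples=5, sample_period=50)
for occ in samples:
    print(Counter(occ))
```

A larger sample_period plays the role of a lower sample frequency: more swaps happen between stored configurations, so consecutive samples differ more.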

Jun 30 '22 16:06 bpuchala

BTW, what does "specifying which clusters to enumerate occupations on" mean in the following:

If you use JSON input to ConfigEnumAllOccupations you can include "cluster_specs" (see casm enum --desc ConfigEnumAllOccupations), with format described here, specifying which clusters to enumerate occupations on.

Jul 10 '22 20:07 sjtuzhanglei

The "cluster_specs" option allows enumerating configurations that are perturbations of a "background" configuration. The perturbed configurations have 1, 2, 3, etc. sites different from the background configuration. All such perturbations can be generated by finding symmetrically unique clusters of sites, taking into account the background configuration's occupation. The "cluster_specs" option specifies the range of such clusters.
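The idea of perturbing a background configuration on small groups of sites can be sketched in a few lines of Python. This is a simplified illustration, not CASM's algorithm: it treats every combination of sites as a cluster, ignores cutoff distances, and does not remove symmetrically equivalent perturbations, which CASM does. All names are hypothetical.

```python
from itertools import combinations, product


def enumerate_perturbations(background, allowed, max_sites):
    """List occupations that differ from `background` on 1..max_sites sites.

    No symmetry reduction and no distance cutoff: every site subset
    is used as a "cluster".
    """
    results = []
    for n in range(1, max_sites + 1):
        for sites in combinations(range(len(background)), n):
            # For each chosen site, only species that differ from the
            # background count as a perturbation.
            alternatives = [
                [s for s in allowed if s != background[i]] for i in sites
            ]
            for choice in product(*alternatives):
                occ = list(background)
                for i, species in zip(sites, choice):
                    occ[i] = species
                results.append(tuple(occ))
    return results


background = ("A", "A", "A", "A")
perturbed = enumerate_perturbations(background, ("A", "B"), max_sites=2)
print(len(perturbed))
```

Because the number of perturbed sites is capped, the perturbations stay dilute no matter how large the background supercell is.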

Jul 22 '22 16:07 bpuchala


How to efficiently enumerate configs with dilute concentration?
