Train the model for customised train-test split

Open kdmsit opened this issue 3 years ago • 4 comments

I have around 40K crystal structures from the Materials Project database in .cif format. I want to train the MEGNet model from scratch using my own train-test split (e.g. train 20%, test 80%) for the formation energy and band gap properties. Could you please help me with how to do that?
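A custom split like the one described above can be expressed as two lists of indices. The following is a minimal sketch using only the Python standard library. The helper name `split_indices`, the seed, and the assumption that the structures are numbered 0 to N-1 are illustrative and not part of MEGNet.

```python
import random


def split_indices(n_samples, train_fraction=0.2, seed=0):
    """Shuffle the indices 0..n_samples-1 and split them into train and test lists."""
    indices = list(range(n_samples))
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    rng.shuffle(indices)
    n_train = int(round(train_fraction * n_samples))
    return indices[:n_train], indices[n_train:]


# Hypothetical usage: 40,000 structures, 20% for training, 80% for testing.
idx_train, idx_test = split_indices(40000, train_fraction=0.2, seed=42)
```

Fixing the seed keeps the same structures in each subset across runs, which makes results from different training runs comparable.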

kdmsit avatar Aug 28 '21 08:08 kdmsit

@kdmsit can you be more specific?

Please see the example notebooks for how to use the models. Also, the MEGNet model predicts intensive properties, so for extensive properties you will need to convert them to per-atom quantities.
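As a rough illustration of the per-atom conversion mentioned above, the sketch below divides an extensive total by the number of atoms in each structure. The function names and example numbers are hypothetical. With pymatgen, the atom count would typically come from `len(structure)`.

```python
def to_per_atom(total_values, n_atoms):
    """Convert extensive totals (e.g. total energy in eV) to per-atom values."""
    if len(total_values) != len(n_atoms):
        raise ValueError("total_values and n_atoms must have the same length")
    return [value / n for value, n in zip(total_values, n_atoms)]


def to_total(per_atom_values, n_atoms):
    """Convert per-atom predictions back to extensive totals."""
    if len(per_atom_values) != len(n_atoms):
        raise ValueError("per_atom_values and n_atoms must have the same length")
    return [value * n for value, n in zip(per_atom_values, n_atoms)]


# Hypothetical example: two structures with 2 and 5 atoms.
per_atom = to_per_atom([-10.0, -30.0], [2, 5])
```

The model would then be trained on the per-atom targets, and predictions multiplied by the atom count if a total is needed.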

chc273 avatar Aug 30 '21 18:08 chc273

I am using the following code snippet for it:

```python
import os

import numpy as np
from megnet.data.crystal import CrystalGraph
from megnet.models import MEGNetModel
from pymatgen.core.structure import Structure

# data_path, id_prop_data, index, idx_train and idx_test are defined elsewhere.
nfeat_bond = 100
epoch = 1000
r_cutoff = 5
gaussian_centers = np.linspace(0, r_cutoff + 1, nfeat_bond)
gaussian_width = 0.5
graph_converter = CrystalGraph(cutoff=r_cutoff)
model = MEGNetModel(graph_converter=graph_converter, centers=gaussian_centers, width=gaussian_width)

# Convert each training structure to a graph; collect the ones that fail.
graphs_valid = []
targets_valid = []
structures_invalid = []
for i in idx_train:
    crystal = Structure.from_file(os.path.join(data_path, str(i) + '.cif'))
    p = float(id_prop_data[i][index])
    try:
        graph = graph_converter.convert(crystal)
        graphs_valid.append(graph)
        targets_valid.append(p)
    except Exception:
        structures_invalid.append(crystal)
print("Train Data Load Done......")

print("Training the model......")
model.train_from_graphs(graphs_valid, targets_valid, epochs=epoch)

# Evaluate on the test split; structures that cannot be processed are skipped.
abs_errors = []
for i in idx_test:
    try:
        new_structure = Structure.from_file(os.path.join(data_path, str(i) + '.cif'))
        pred_target = model.predict_structure(new_structure)
        true_target = float(id_prop_data[i][index])
        ae = abs(float(pred_target[0]) - true_target)
        abs_errors.append(ae)
    except Exception:
        continue
print("Test MAE:", np.mean(abs_errors))
```

But I am not able to achieve good results. Could you please help me understand whether I am doing the training the correct way or not?

kdmsit avatar Aug 31 '21 05:08 kdmsit

@kdmsit I don't see an issue in the code. In general, you need to check whether the target properties are intensive and whether they can be predicted from the structure. Please provide more details if you still cannot find a solution.
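One way to judge whether results are "good" is to compare the model's test error against a trivial baseline that always predicts the training-set mean. The sketch below is one possible sanity check, not part of MEGNet; the function names and numbers are hypothetical.

```python
from statistics import mean


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def baseline_mae(train_targets, test_targets):
    """MAE obtained by always predicting the mean of the training targets."""
    baseline = mean(train_targets)
    return mean_absolute_error(test_targets, [baseline] * len(test_targets))


# Hypothetical numbers: the trained model should beat this value clearly.
reference = baseline_mae([1.0, 3.0], [2.0, 4.0])
```

If the trained model's MAE is close to the baseline, the targets or the data pipeline are worth re-checking before tuning the model itself.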

chc273 avatar Sep 08 '21 18:09 chc273

If it is only MP structures, formation energy and band gap, those should be fairly easy to train. See this notebook for an example: https://github.com/materialsvirtuallab/megnet/blob/master/notebooks/crystal_example.ipynb

chc273 avatar Sep 08 '21 18:09 chc273