megnet

Problem with loading molecule data from ASE

Open · dimka11 opened this issue 2 years ago · 4 comments

I don't understand how to load data from the ASE format. I followed this tutorial https://github.com/materialsvirtuallab/megnet/blob/master/notebooks/molecule_example.ipynb and tried converting the data to XYZ files, but although pybel can load these files, they can't be loaded into the model.

dimka11, Mar 31 '22 17:03

I'm not sure I understand your question. Are you saying the example does not work even after you converted the ASE Atoms to an XYZ file?

chc273, Mar 31 '22 21:03

@chc273 Thanks for the response! Here is an example of my XYZ file:

34
Properties=species:S:1:pos:R:3 pbc="F F F"
C       23.94271088      -4.14493513      -2.98162127
C       24.55592728      -0.82619798       1.23874521
O       20.93027115       2.65132999       1.20267034
C       16.11702538       1.21504414       1.46484005
O       15.08468533      -3.13689113       1.72822750
N       12.34882450       4.55354691       1.44151032
C        7.51371670       2.86691523       1.71749294
N        5.92233944      -1.59980488       2.00862408
N        1.48521304      -2.39037442       2.22327352
C       -1.57001507       1.23494565       2.15761590
C       -6.86996460       0.83962160       2.38604617
C       -8.86610794       0.44161573      -2.65364766
C      -14.11276245       0.06187227      -2.19375157
C      -16.35991859      -4.27609301      -1.76654506
C      -21.28580284      -4.39460611      -1.34774482
C      -23.98753166      -0.24840684      -1.35021245
C      -21.73451805       4.07528114      -1.77712965
C      -16.83377075       4.22927999      -2.19576573
S        2.25029945       5.97498083       1.76499522
H       24.44957352      -7.96948814      -1.98474431
H       20.41406441      -3.66614795      -4.67646313
H       26.86673546      -3.41824102      -5.65698814
H       24.43783188      -2.88718438       4.61809397
H       28.05641365       1.06847334       0.92216319
H       12.93233967       8.17394257       1.23317087
H       -7.73730135      -2.48141146       4.22825480
H       -8.44302177       3.79710603       4.37515783
H       -8.03818798       3.85060787      -4.53595924
H       -7.34933376      -2.67482758      -4.47931862
H      -14.32896519      -7.55946207      -1.75550330
H      -23.16374397      -7.72327423      -1.00702596
H      -27.87986374      -0.56414610      -1.00707245
H      -23.81028938       7.30708075      -1.78150427
H      -14.93687820       7.61984539      -2.54143047

After loading it with pybel, the molecule looks wrong compared with the molecules from molecules.json: instead of the structure, it shows only C .. O . C..

(Image: pybel does not show the molecular structure.)

And when I start training the MEGNet model, I get this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
/usr/local/lib/python3.7/dist-packages/megnet/models/base.py in get_all_graphs_targets(self, structures, targets, scrub_failed_structures)
    293             try:
--> 294                 graph = self.graph_converter.convert(s)
    295                 graphs_valid.append(graph)

8 frames
ValueError: max() arg is an empty sequence

During handling of the above exception, another exception occurred:

RuntimeError                              Traceback (most recent call last)
<decorator-gen-53> in time(self, line, cell, local_ns)

<timed eval> in <module>()

/usr/local/lib/python3.7/dist-packages/megnet/models/base.py in get_all_graphs_targets(self, structures, targets, scrub_failed_structures)
    299                     warn(f"structure with index {i} failed the graph computations", UserWarning)
    300                     continue
--> 301                 raise RuntimeError(str(e))
    302         return graphs_valid, targets_valid
    303 
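The traceback shows the control flow inside megnet's `get_all_graphs_targets`: each structure goes through `self.graph_converter.convert(s)`, and if that raises, the error is either re-raised as a `RuntimeError` or, when `scrub_failed_structures` is true, the structure is skipped with a warning. The stdlib sketch below mirrors only that loop. The converter `toy_convert` is hypothetical (it takes a list of `(i, j, length)` bonds instead of a molecule) and simply calls `max()` on the bond lengths, to illustrate how a molecule with no bonds can surface as `max() arg is an empty sequence` wrapped in a `RuntimeError`; the real converter's internals are not shown in the traceback.

```python
import warnings


def toy_convert(bonds):
    """Hypothetical converter: `bonds` is a list of (i, j, length) tuples.

    Calls max() on the bond lengths, so an empty bond list raises ValueError
    (the exact message wording varies across Python versions).
    """
    longest = max(length for _, _, length in bonds)
    return {"n_bonds": len(bonds), "longest_bond": longest}


def get_all_graphs_targets(structures, targets, scrub_failed_structures=False):
    """Sketch of the try/except loop visible in the traceback."""
    graphs_valid, targets_valid = [], []
    for i, (s, t) in enumerate(zip(structures, targets)):
        try:
            graph = toy_convert(s)
            graphs_valid.append(graph)
            targets_valid.append(t)
        except Exception as e:
            if scrub_failed_structures:
                # Skip the failing structure and keep going.
                warnings.warn(f"structure with index {i} failed the graph computations", UserWarning)
                continue
            # Raised inside the except block, hence "During handling of the
            # above exception, another exception occurred" in the traceback.
            raise RuntimeError(str(e))
    return graphs_valid, targets_valid


if __name__ == "__main__":
    molecules = [[(0, 1, 1.09)], []]  # the second "molecule" has no bonds
    energies = [-1.0, -2.0]
    graphs, kept = get_all_graphs_targets(molecules, energies, scrub_failed_structures=True)
    print(len(graphs), kept)  # 1 [-1.0]
```

Skipping failed structures only removes them from the training set; it does not give them bonds, so if every molecule fails this way, every molecule is dropped.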

Colab notebook: https://colab.research.google.com/drive/16MXFzX8dtmt4LHzEAOV2ctAohVfeBcP2?usp=sharing

And a few XYZ examples: https://github.com/dimka11/mol_data

I'm taking part in a competition where the task is to predict the energy of molecules.

I would be grateful for any information.

dimka11, Apr 02 '22 07:04

I see where the problem is. In the molecule you showed, there are no chemical bonds according to pybel's definition. (The error message should have been clearer.)
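XYZ files carry no bond information, so pybel (Open Babel) has to perceive bonds from interatomic distances. Below is a rough stdlib sketch of that idea under stated assumptions: two atoms count as bonded when their distance is at most the sum of their covalent radii plus a 0.45 Å tolerance (approximately Open Babel's distance rule, not its exact implementation), the radii table is an assumed approximation, and `parse_xyz` / `perceive_bonds` are hypothetical helpers. On five atoms copied from the posted file it finds no bonds: the closest pair is a C/H pair about 3.94 apart, whereas a real C-H bond is about 1.1 Å, which suggests the coordinates are not in Ångström or have been scaled.

```python
import math
from itertools import combinations

# Approximate single-bond covalent radii in Angstrom (assumed values, not Open Babel's table).
COVALENT_RADII = {"H": 0.31, "C": 0.76, "N": 0.71, "O": 0.66, "F": 0.57, "S": 1.05}
TOLERANCE = 0.45  # slack added to the radii sum, roughly as Open Babel does


def parse_xyz(text):
    """Parse one XYZ frame into (symbols, positions); line 2 is a free-form comment."""
    lines = text.strip().splitlines()
    n_atoms = int(lines[0])
    symbols, positions = [], []
    for line in lines[2:2 + n_atoms]:
        sym, x, y, z = line.split()[:4]
        symbols.append(sym)
        positions.append((float(x), float(y), float(z)))
    return symbols, positions


def perceive_bonds(symbols, positions):
    """Index pairs whose distance is within the summed covalent radii plus TOLERANCE."""
    bonds = []
    for i, j in combinations(range(len(symbols)), 2):
        limit = COVALENT_RADII[symbols[i]] + COVALENT_RADII[symbols[j]] + TOLERANCE
        if math.dist(positions[i], positions[j]) <= limit:
            bonds.append((i, j))
    return bonds


# The first two C atoms and three H atoms, copied from the posted file.
SAMPLE = """5
subset of the posted molecule
C       23.94271088      -4.14493513      -2.98162127
C       24.55592728      -0.82619798       1.23874521
H       24.44957352      -7.96948814      -1.98474431
H       20.41406441      -3.66614795      -4.67646313
H       26.86673546      -3.41824102      -5.65698814
"""

if __name__ == "__main__":
    symbols, positions = parse_xyz(SAMPLE)
    print(perceive_bonds(symbols, positions))  # []
    closest = min(math.dist(p, q) for p, q in combinations(positions, 2))
    print(round(closest, 2))  # 3.94
```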

In any case, MolecularGraph is not well supported and is limited to QM9 molecules, which contain only the elements "H", "C", "N", "O", and "F".
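The element restriction can be checked before any graphs are built. A minimal sketch, assuming the five-element set quoted above and using hypothetical helper names: the element symbols of the posted molecule include sulfur (`S`), which falls outside that set.

```python
QM9_ELEMENTS = frozenset({"H", "C", "N", "O", "F"})


def unsupported_elements(symbols):
    """Sorted element symbols that fall outside the QM9 element set."""
    return sorted(set(symbols) - QM9_ELEMENTS)


def is_qm9_like(symbols):
    """True if every atom is H, C, N, O or F."""
    return not unsupported_elements(symbols)


if __name__ == "__main__":
    # Element symbols present in the posted molecule.
    posted = ["C", "O", "N", "S", "H"]
    print(unsupported_elements(posted))  # ['S']
    print(is_qm9_like(["C", "H", "O"]))  # True
```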

Please consider an alternative approach such as this one instead: https://github.com/materialsvirtuallab/megnet/blob/master/notebooks/qm9_simple_model.ipynb
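Instead of relying on chemical bond perception, a distance-based graph connects every pair of atoms within a fixed cutoff radius and uses the distance as the edge feature; megnet's `CrystalGraph` builds its edges from neighbours within a cutoff in this spirit. Whether the linked notebook uses exactly this setup is an assumption here, and `cutoff_graph` below is a hypothetical stdlib helper, not megnet's implementation. It illustrates two points: element types play no role, and the cutoff has to match the coordinate scale. On five atoms from the posted file, a cutoff of 3.0 yields no edges, while 5.0 yields six directed edges.

```python
import math
from itertools import permutations


def cutoff_graph(positions, cutoff):
    """Directed edges (i, j, distance) for every atom pair within `cutoff`.

    Element types are ignored, so S or any other element is treated the same
    way; each pair appears once in each direction.
    """
    edges = []
    for i, j in permutations(range(len(positions)), 2):
        d = math.dist(positions[i], positions[j])
        if d <= cutoff:
            edges.append((i, j, d))
    return edges


if __name__ == "__main__":
    # Five atoms copied from the posted XYZ file (C, C, H, H, H).
    positions = [
        (23.94271088, -4.14493513, -2.98162127),
        (24.55592728, -0.82619798, 1.23874521),
        (24.44957352, -7.96948814, -1.98474431),
        (20.41406441, -3.66614795, -4.67646313),
        (26.86673546, -3.41824102, -5.65698814),
    ]
    print(len(cutoff_graph(positions, 3.0)))  # 0
    print(len(cutoff_graph(positions, 5.0)))  # 6
```

If the data start as ASE `Atoms`, pymatgen's `AseAtomsAdaptor` (in `pymatgen.io.ase`) provides `get_structure` and `get_molecule` for converting them to pymatgen objects; which object type a given megnet graph converter accepts is worth checking against the installed version.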

chc273, Apr 03 '22 05:04

@chc273 Thank you, the model works now. Does CrystalGraph support only pymatgen structures and not Open Babel molecules? Where can I find more information about tuning hyperparameters? I trained the model on 130k molecules for 300 epochs, and the training alone took 6.5 hours on a P100. Is that reasonable? Should I continue training for more epochs to improve accuracy, or should I do something else?
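The figures in the question translate into per-epoch cost with simple arithmetic: 6.5 h is 23,400 s, so 300 epochs take about 78 s each, or roughly 1,700 molecules per second if all 130k molecules are used for training (an assumption; part may be held out for validation). Whether more epochs will help is usually judged from the validation loss rather than the epoch count. The helpers below are hypothetical; `should_keep_training` is a generic patience-based early-stopping rule, the same idea as Keras's `EarlyStopping` callback, and not a megnet-specific API.

```python
def training_throughput(n_samples, epochs, hours):
    """Return (seconds per epoch, samples processed per second)."""
    seconds = hours * 3600
    return seconds / epochs, n_samples * epochs / seconds


def should_keep_training(val_losses, patience=20):
    """Generic early-stopping rule: keep going only while the best validation
    loss was reached within the last `patience` epochs."""
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    epochs_since_best = len(val_losses) - 1 - best_epoch
    return epochs_since_best < patience


if __name__ == "__main__":
    sec_per_epoch, rate = training_throughput(130_000, 300, 6.5)
    print(sec_per_epoch, round(rate))  # 78.0 1667
    print(should_keep_training([1.0, 0.5, 0.4], patience=2))  # True
    print(should_keep_training([1.0, 0.5, 0.6, 0.7, 0.8], patience=3))  # False
```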

dimka11, Apr 04 '22 09:04