complex compositions take very long to featurize

Open Pepe-Marquez opened this issue 2 years ago • 4 comments

I would like to run modnet on a dataset containing compositions with very complex stoichiometries. One example is C100H3815Br21I279N2185Pb100. Featurizing such compositions takes a very long time.
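To give a sense of scale, here is a minimal standard-library sketch that counts the atoms per formula unit and checks whether the integer coefficients share a common divisor. The helpers `parse_formula` and `summarize` are hypothetical names written for this illustration; they are not part of modnet or pymatgen, and the parser only handles flat formulas without parentheses.

```python
import re
from functools import reduce
from math import gcd


def parse_formula(formula):
    """Parse a flat formula such as 'CH3NH3PbI3' into {element: count}.

    Hypothetical helper: no support for parentheses or fractional amounts.
    """
    counts = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        # A missing number means a coefficient of 1; repeated elements are summed.
        counts[element] = counts.get(element, 0) + (int(amount) if amount else 1)
    return counts


def summarize(formula):
    """Return the atom count per formula unit and the gcd of the coefficients."""
    counts = parse_formula(formula)
    divisor = reduce(gcd, counts.values())
    total = sum(counts.values())
    return {"atoms": total, "gcd": divisor, "reduced_atoms": total // divisor}


for formula in ["CH3NH3PbI3", "C100H3815Br21I279N2185Pb100"]:
    print(formula, summarize(formula))
```

For the example above, the coefficients have a greatest common divisor of 1, so the formula cannot be reduced to smaller integers: it contains 6500 atoms per formula unit, compared with 12 for CH3NH3PbI3.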

The following example code reproduces the problem:

import pandas as pd
from modnet.models import MODNetModel
from modnet.preprocessing import MODData
from pymatgen.core import Composition

data = {'composition': ['Cu2ZnSnSe4', 'Cu2ZnSnS4', 'CsPbI3', 'CH3NH3PbI3', 'C100H3815Br21I279N2185Pb100' ],
        'target': [1.0, 1.5, 1.78, 1.6, 1.63]}
df_simple = pd.DataFrame(data)
df_simple["composition"] = df_simple["composition"].map(Composition)

data = MODData(
    materials=df_simple["composition"], # you can provide composition objects to MODData
    targets=df_simple["target"], # you can provide target values to MODData
    target_names=["target"],
)

data.featurize()

Am I doing something wrong here? Is there a workaround to get these complex compositions through the featurizer more quickly?

Thanks!

Pepe-Marquez · Jan 10 '23 12:01