datamol
Add dm.read_records in io module
This would simplify things when querying a REST API whose JSON response contains a list of molecules. Quick REST API examples:
- Molport
- Chemspace
- Mcule
- CDD Vault
- Dotmatics
- PubChem
- Etc...
Similarly, add dm.read_yaml for convenience.
@zhu0619 can you log this in a separate issue, with context explaining the use cases of read_yaml?
Having to maintain those API layers could be quite time-consuming as they tend to change over time.
Also, from experience working with some of them, it can be tricky to get a unified datamol API given how much the returned data differs between the providers above (a unified API would be nice, but it is not necessarily the important point here).
Sorry, maybe the examples were overwhelming here. In any case, I think I can make a PR myself for this.
The idea is simply to replicate this case:
>>> data = [{'col_1': 3, 'col_2': 'a'},
... {'col_1': 2, 'col_2': 'b'},
... {'col_1': 1, 'col_2': 'c'},
... {'col_1': 0, 'col_2': 'd'}]
>>> pd.DataFrame.from_records(data)
   col_1 col_2
0      3     a
1      2     b
2      1     c
3      0     d
This case matches what most REST APIs return (a list of dicts), so the only remaining work would be to add the logic for a smiles_column param, as in the other io methods.
There is definitely a way to have a simple v1 while making sure to specify its limitations imho.
In any case, this is neither urgent nor a priority, and it is something I can add myself when I need it for a production case.
Ok and sorry I think I misunderstood here xD
I create DataFrames from lists of dicts all the time, and you can simply do pd.DataFrame(list_of_dict). That being said, I am not sure what logic you want to add to datamol for this. Usually a simple workflow is:
df = pd.DataFrame(list_of_dict)
df["mol"] = df[smiles_column].apply(dm.to_mol)
But feel free to post more examples or open a PR if you have something else in mind.
That's exactly what I want, as a one-liner @hadim :smile: , which is why I say it's not a priority, just convenient and another case datamol can handle!
df = dm.from_records(list_of_dict, smiles_column="smiles")
no more or less complex than dm.read_csv already implemented here https://github.com/datamol-org/datamol/blob/main/datamol/io.py#L27
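For illustration, here is a minimal sketch of what such a helper could look like. It is not part of datamol: the name from_records, the mol_column default, and the to_mol hook are all assumptions made for this example, loosely mirroring the smiles_column parameter mentioned above. It only wraps the two-line workflow already shown in this thread.

```python
from typing import Any, Callable, Dict, List, Optional

import pandas as pd


def from_records(
    records: List[Dict[str, Any]],
    smiles_column: Optional[str] = None,
    mol_column: str = "mol",
    to_mol: Optional[Callable[[str], Any]] = None,
) -> pd.DataFrame:
    """Hypothetical helper: build a DataFrame from a list of dicts and,
    if smiles_column is given, add a column of parsed molecules."""
    df = pd.DataFrame.from_records(records)

    if smiles_column is not None:
        if to_mol is None:
            # Imported lazily so the sketch only needs datamol
            # when no custom converter is passed in.
            import datamol as dm

            to_mol = dm.to_mol
        # Same logic as the two-line workflow quoted earlier in the thread.
        df[mol_column] = df[smiles_column].apply(to_mol)

    return df
```

With this in place, the call would read df = from_records(list_of_dict, smiles_column="smiles"). The sketch does not handle empty responses or a missing SMILES key, which a real v1 would need to document as limitations.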
Note that in many cases you might want to use pd.json_normalize instead.
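As a rough illustration of why: API responses often nest fields, and pd.json_normalize flattens them into dotted column names. The payload below is made up for this example and does not reflect the schema of any of the providers listed above.

```python
import pandas as pd

# Hypothetical nested payload, as a REST API might return it.
records = [
    {"id": "A1", "structure": {"smiles": "CCO"}, "price": {"amount": 12.5, "currency": "USD"}},
    {"id": "B2", "structure": {"smiles": "c1ccccc1"}, "price": {"amount": 8.0, "currency": "EUR"}},
]

# Nested dicts become flat columns such as "structure.smiles" and "price.amount".
df = pd.json_normalize(records)
```

From there, the molecule column can be added as before, for example with df["mol"] = df["structure.smiles"].apply(dm.to_mol).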
Closing here. It's not clear to me whether we need this in datamol.
Please re-open if needed.