openff-evaluator
Cannot roundtrip to/from pandas
I'm using an old version of Evaluator (0.3.5), but looking at the code I don't think it's changed in the relevant parts.
I have converted my dataset to a dataframe for filtering, but I can't convert it back, because the ExactAmounts are read back as floats. This happens because the column mixes Nones and integers, which pandas stores as float64: each None becomes NaN, NaN is a float, so the integers are upcast to floats as well. A column of all Nones does not have this problem, because pandas keeps it as an object column and does not convert None to NaN. The relevant code is here:
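A minimal illustration of the dtype behaviour described above, using plain pandas Series rather than Evaluator's actual dataframe (the values are made up):

```python
import pandas as pd

# Made-up exact-amount values: some components have one, others don't.
mixed = pd.Series([None, 2, None, 5])
all_none = pd.Series([None, None])

print(mixed.dtype)     # float64: None became NaN, so 2 became 2.0
print(mixed.tolist())  # [nan, 2.0, nan, 5.0]
print(all_none.dtype)  # object: the Nones are kept as-is

# pandas' nullable integer dtype avoids the upcast, if it is requested explicitly.
nullable = pd.Series([None, 2, None, 5], dtype="Int64")
print(nullable.dtype)  # Int64
```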
https://github.com/openforcefield/openff-evaluator/blob/9f6e8348dbbdf5da4e4331cf368278a06226d3e6/openff/evaluator/datasets/datasets.py#L603-L604
IMO the fix should go in from_pandas, because then it would also work for dataframes read from generic CSV files.
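A rough sketch of the kind of normalisation from_pandas could apply to each exact-amount cell before building substances. The helper name `to_optional_int` and the column name "Exact Amount" are hypothetical, not Evaluator's actual names:

```python
from typing import Optional

import pandas as pd


def to_optional_int(value) -> Optional[int]:
    """Hypothetical helper: undo pandas' float upcasting of an integer cell.

    None / NaN / pd.NA -> None; a whole-number float such as 2.0 -> 2.
    A non-integer such as 2.5 is rejected rather than silently truncated.
    """
    if pd.isna(value):
        return None
    as_float = float(value)
    if not as_float.is_integer():
        raise ValueError(f"expected an integer amount, got {value!r}")
    return int(as_float)


# Usage on a made-up column that pandas has already upcast to float64.
df = pd.DataFrame({"Exact Amount": [None, 2, None, 5]})
amounts = [to_optional_int(v) for v in df["Exact Amount"]]
print(amounts)  # [None, 2, None, 5]
```

Because pd.read_csv also fills empty cells with NaN, doing this inside from_pandas would cover CSV input as well as dataframes produced by to_pandas.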
I also noticed this line:
https://github.com/openforcefield/openff-evaluator/blob/9f6e8348dbbdf5da4e4331cf368278a06226d3e6/openff/evaluator/datasets/datasets.py#L577
This doesn't seem to have caused problems yet, but I would generally recommend changing it to itertuples. iterrows does not preserve column dtypes: it converts each row into a single Series with one common dtype (https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.iterrows.html), which is another easy way to turn an integer into a float without realising. However, itertuples may be awkward to work with, as there are spaces in the column headings.
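A small sketch contrasting the two on a made-up frame (the column names are illustrative, not Evaluator's exact headings). It shows the iterrows upcast, how itertuples renames headings that are not valid Python identifiers, and one possible workaround that pairs values with the original headings:

```python
import pandas as pd

# Made-up frame: an integer column next to a float column,
# with headings that contain spaces.
df = pd.DataFrame({"Exact Amount": [2, 5], "Temperature (K)": [298.15, 310.0]})

# iterrows: each row becomes one Series, so it gets one common dtype.
_, first_row = next(df.iterrows())
print(first_row.dtype)            # float64
print(first_row["Exact Amount"])  # 2.0 (the integer has been upcast)

# itertuples keeps per-column values, but headings that are not valid
# Python identifiers are renamed to positional fields.
first_tuple = next(df.itertuples())
print(first_tuple._fields)        # ('Index', '_1', '_2')

# Workaround: plain tuples (name=None) zipped with the original headings.
for row in df.itertuples(index=False, name=None):
    record = dict(zip(df.columns, row))
    print(record["Exact Amount"])  # stays an integer, not 2.0 / 5.0
```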