openff-evaluator
Add `tidy` keyword to to_pandas?
I was surprised that `.to_pandas` converts to a wide format in which each property type gets its own value and uncertainty columns with an imposed unit. I would have found a tidier (long) format more intuitive, with one row per measurement and the property type stored in a column.
Instead of:
Index(['Id', 'Temperature (K)', 'Pressure (kPa)', 'Phase', 'N Components',
'Component 1', 'Role 1', 'Mole Fraction 1', 'Exact Amount 1',
'Component 2', 'Role 2', 'Mole Fraction 2', 'Exact Amount 2',
'SolvationFreeEnergy Value (kJ / mol)',
'SolvationFreeEnergy Uncertainty (kJ / mol)', 'Source'],
dtype='object')
You could have:
Index(['Id', 'Temperature (K)', 'Pressure (kPa)', 'Phase', 'N Components',
'Component 1', 'Role 1', 'Mole Fraction 1', 'Exact Amount 1',
'Component 2', 'Role 2', 'Mole Fraction 2', 'Exact Amount 2',
'Property type', 'Value', 'Value unit', 'Uncertainty', 'Uncertainty unit', 'Source'],
dtype='object')
This would be more memory-efficient (edit: for mixed datasets), since each row would no longer carry NaNs in the columns of every property type it does not belong to. It would also make filtering by property type easier. When working directly with the dataframe, it would be much easier to see how many of each property type you have and to group by it.
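
To make the proposal concrete, here is a rough sketch of the reshaping done with plain pandas on the existing wide output. It is not part of openff-evaluator: `to_tidy` is a hypothetical helper, the `Density` column names and all data values are made up for illustration, and it assumes property columns follow the `<Type> Value (<unit>)` / `<Type> Uncertainty (<unit>)` naming shown above.

```python
import re

import pandas as pd

# Assumed naming convention, taken from the column names in this issue:
# "<PropertyType> Value (<unit>)" and "<PropertyType> Uncertainty (<unit>)".
PROPERTY_COLUMN = re.compile(
    r"^(?P<ptype>\w+) (?P<kind>Value|Uncertainty) \((?P<unit>.+)\)$"
)


def to_tidy(wide: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: reshape the wide frame into one row per measurement."""
    matches = {col: PROPERTY_COLUMN.match(col) for col in wide.columns}
    property_columns = {col: m for col, m in matches.items() if m is not None}
    shared_columns = [col for col in wide.columns if col not in property_columns]

    # Preserve the order in which property types first appear.
    property_types = list(dict.fromkeys(m["ptype"] for m in property_columns.values()))

    frames = []
    for ptype in property_types:
        frame = wide[shared_columns].copy()
        frame["Property type"] = ptype
        for kind in ("Value", "Uncertainty"):
            column = next(
                (
                    col
                    for col, m in property_columns.items()
                    if m["ptype"] == ptype and m["kind"] == kind
                ),
                None,
            )
            if column is None:
                # This property type has no column of this kind.
                frame[kind] = float("nan")
                frame[f"{kind} unit"] = None
            else:
                frame[kind] = wide[column]
                frame[f"{kind} unit"] = property_columns[column]["unit"]
        # Drop rows that do not hold this property type.
        frames.append(frame[frame["Value"].notna()])

    return pd.concat(frames, ignore_index=True)


# Illustrative mixed data set; the Density columns and all numbers are invented.
wide = pd.DataFrame(
    {
        "Id": ["a", "b", "c"],
        "Temperature (K)": [298.15, 298.15, 298.15],
        "Density Value (g / ml)": [0.99, None, 1.02],
        "Density Uncertainty (g / ml)": [0.01, None, 0.02],
        "SolvationFreeEnergy Value (kJ / mol)": [None, -12.3, None],
        "SolvationFreeEnergy Uncertainty (kJ / mol)": [None, 0.4, None],
        "Source": ["x", "y", "z"],
    }
)

tidy = to_tidy(wide)
counts = tidy.groupby("Property type").size()
```

With the tidy frame, counting or filtering by property type becomes a single `groupby` or boolean mask on the `Property type` column, rather than a check for non-NaN entries in each property-specific column.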