gmso
Move pandas dataframe handling to external convert_dataframe module
This PR improves how a topology is converted to a pandas dataframe. The conversion currently lives as a method on the topology; this PR moves it to a separate convert_dataframe.py module. Several formats are available, each giving a convenient default view of a topology:
- `publication`: gives all the parameter values you would want in a table for publication. Duplicates are also removed, so each parameter is listed only once.
- `default`: a set of default values that are useful to have.
- `remove_duplicates`: returns a smaller dataframe with duplicate rows removed.
- `specific_columns`: lets the user specify which columns they want in the dataframe.
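To make the difference between the formats concrete, here is a minimal sketch written against plain pandas rather than the new module. The function name `format_dataframe`, the column names, and the choice of default columns are all assumptions for illustration; the actual signatures in convert_dataframe.py may differ.

```python
import pandas as pd

# Hypothetical per-atom table; the column names and values are illustrative only.
atoms = pd.DataFrame(
    {
        "atom_type": ["opls_135", "opls_140", "opls_140", "opls_140"],
        "charge": [-0.18, 0.06, 0.06, 0.06],
        "epsilon": [0.276, 0.126, 0.126, 0.126],
        "sigma": [0.35, 0.25, 0.25, 0.25],
    }
)

# Assumed default column set; the real default may be different.
DEFAULT_COLUMNS = ["atom_type", "charge"]


def format_dataframe(df, style="default", columns=None):
    """Return a view of ``df`` in one of the four styles (hypothetical helper)."""
    if style == "default":
        return df[DEFAULT_COLUMNS].copy()
    if style == "remove_duplicates":
        # Assumption: the default columns with repeated rows dropped.
        return df[DEFAULT_COLUMNS].drop_duplicates().reset_index(drop=True)
    if style == "publication":
        # Assumption: every parameter column, each unique row listed once.
        return df.drop_duplicates().reset_index(drop=True)
    if style == "specific_columns":
        if not columns:
            raise ValueError("specific_columns requires a non-empty 'columns' list")
        return df[list(columns)].copy()
    raise ValueError(f"unknown style: {style!r}")


publication = format_dataframe(atoms, style="publication")
sigma_only = format_dataframe(atoms, style="specific_columns", columns=["sigma"])
```

In this sketch `publication` collapses the three identical `opls_140` rows into one, while `specific_columns` keeps every row but only the requested column.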
A new function also generates dataframes that cover the parameters for a set of topologies.
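One way to picture the multi-topology output is a concatenation of the per-topology tables followed by de-duplication on the parameter columns. The sketch below uses only pandas; `combine_parameter_tables`, the `topology` column, and the sample data are hypothetical and are not taken from the PR.

```python
import pandas as pd


def combine_parameter_tables(tables, parameter_columns):
    """Stack per-topology tables and keep each unique parameter row once.

    ``tables`` maps a topology name to its dataframe (hypothetical layout).
    The ``topology`` column records the first topology a row was seen in.
    """
    labeled = [df.assign(topology=name) for name, df in tables.items()]
    combined = pd.concat(labeled, ignore_index=True)
    unique = combined.drop_duplicates(subset=parameter_columns, keep="first")
    return unique.reset_index(drop=True)


# Illustrative data standing in for two converted topologies.
ethane = pd.DataFrame({"atom_type": ["opls_135", "opls_140"], "sigma": [0.35, 0.25]})
propane = pd.DataFrame(
    {"atom_type": ["opls_135", "opls_136", "opls_140"], "sigma": [0.35, 0.35, 0.25]}
)

combined = combine_parameter_tables(
    {"ethane": ethane, "propane": propane},
    parameter_columns=["atom_type", "sigma"],
)
```

Here the atom types shared by both molecules appear once, attributed to the first topology that contained them, and `opls_136` is attributed to propane.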
Finally, there will be a function that prints the dataframes alongside the RDKit mols, which are labeled to match the dataframes.
TODO Checklist:
- [ ] Error checking on arguments
- [x] Replace topology.py dataframe methods/tests
- [ ] Doc strings
- [x] Handle units
- [x] Handle parameters that return lists
- [x] Handle parameters that return dictionaries
- [x] Return unique elements if style is publication
- [x] Handle parameter "all" better
- [x] Function to concatenate multiple topologies into one output
- [x] Remove replicate rows flag -> similar to the publication style, but without the atom_indices section added
- [ ] Function to create a topology with all data from dataframe matching rdkit mol image
- [x] Could just also make this a format unique_types
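The checklist items on units, list-valued parameters, and dictionary-valued parameters all come down to turning nested values into flat, scalar columns. The following is a rough pure-Python sketch of one possible approach, not the PR's implementation: `flatten_parameters` is a hypothetical helper, and a `(value, unit)` tuple is used here as a stand-in for a unit-carrying quantity.

```python
def flatten_parameters(params):
    """Flatten one parameter mapping into scalar, column-ready entries.

    Assumed conventions (illustrative only):
    - a (value, unit_string) tuple becomes a "name (unit)" column,
    - a list becomes name_0, name_1, ... columns,
    - a dict is flattened recursively into "name.key" columns.
    """
    row = {}
    for name, value in params.items():
        if isinstance(value, dict):
            for sub_name, sub_value in flatten_parameters(value).items():
                row[f"{name}.{sub_name}"] = sub_value
        elif isinstance(value, tuple) and len(value) == 2 and isinstance(value[1], str):
            magnitude, unit = value
            row[f"{name} ({unit})"] = magnitude
        elif isinstance(value, list):
            for index, item in enumerate(value):
                row[f"{name}_{index}"] = item
        else:
            row[name] = value
    return row


# Illustrative input mixing scalars, a unit-carrying value, a list, and a dict.
example = {
    "name": "opls_140",
    "sigma": (0.25, "nm"),
    "k": [1.0, 2.0],
    "expression": {"form": "LJ", "epsilon": (0.126, "kJ/mol")},
}
flat_row = flatten_parameters(example)
```

Each flattened row is a plain dictionary of scalars, so a list of such rows can be passed directly to `pandas.DataFrame` to build the table.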