
Move pandas dataframe handling to external convert_dataframe module

Open CalCraven opened this issue 10 months ago • 0 comments

This PR improves how a topology is converted to a dataframe. The conversion currently lives as a method on the topology; it is now being moved to a convert_dataframe.py module. Several formats are available, each giving a convenient default view of a topology:

  • publication: all the parameter values you would want in a table for publication. Duplicates are removed so each parameter is listed only once.
  • default: a set of default values that are nice to have.
  • remove_duplicates: a smaller dataframe with duplicate rows removed.
  • specific_columns: lets the user specify which columns they want in the dataframe.
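To make the four formats concrete, here is a minimal pandas sketch on a toy table. It is not the convert_dataframe.py implementation: `format_table`, the column names, and the numeric values are all hypothetical. It also assumes that publication keeps a list of atom indices per unique parameter row while remove_duplicates drops the indices entirely.

```python
import pandas as pd

# Toy stand-in for per-atom data from a topology (illustrative values only).
atoms = pd.DataFrame(
    {
        "atom_index": [0, 1, 2, 3],
        "atom_type": ["opls_135", "opls_140", "opls_140", "opls_140"],
        "charge": [-0.18, 0.06, 0.06, 0.06],
        "epsilon": [0.276, 0.126, 0.126, 0.126],
    }
)


def format_table(df, fmt="default", columns=None):
    """Hypothetical helper that mimics the four formats on a plain DataFrame."""
    param_cols = [c for c in df.columns if c != "atom_index"]
    if fmt == "default":
        return df.copy()
    if fmt == "specific_columns":
        if not columns:
            raise ValueError("specific_columns needs a list of column names")
        return df[list(columns)].copy()
    if fmt == "remove_duplicates":
        # Drop the per-atom index so identical parameter rows collapse.
        return df[param_cols].drop_duplicates().reset_index(drop=True)
    if fmt == "publication":
        # One row per unique parameter set, plus the atoms that share it.
        grouped = df.groupby(param_cols, sort=False)["atom_index"].agg(list)
        return grouped.reset_index().rename(columns={"atom_index": "atom_indices"})
    raise ValueError(f"unknown format: {fmt!r}")


print(format_table(atoms, "publication"))
```

With this toy input, both publication and remove_duplicates reduce the four atoms to two rows, one per atom type.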

There is also an added function that allows you to generate dataframes that cover the parameters for a set of topologies.
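One way such a multi-topology table could be assembled is sketched below. The function name `combine_parameter_tables`, the `unique_on` argument, and the toy tables are assumptions for illustration and are not taken from this PR.

```python
import pandas as pd


def combine_parameter_tables(tables, unique_on=None):
    """Hypothetical helper: stack per-topology tables into one DataFrame.

    tables: dict mapping a topology label to its parameter DataFrame.
    unique_on: optional list of columns; rows repeating those values are dropped.
    """
    combined = (
        pd.concat(tables, names=["topology", "row"])  # dict keys become an index level
        .reset_index(level="topology")  # turn the label into a regular column
        .reset_index(drop=True)
    )
    if unique_on is not None:
        combined = combined.drop_duplicates(subset=unique_on).reset_index(drop=True)
    return combined


# Illustrative values only.
ethane = pd.DataFrame({"atom_type": ["opls_135", "opls_140"], "epsilon": [0.276, 0.126]})
methane = pd.DataFrame({"atom_type": ["opls_138", "opls_140"], "epsilon": [0.276, 0.126]})

everything = combine_parameter_tables({"ethane": ethane, "methane": methane})
unique_types = combine_parameter_tables(
    {"ethane": ethane, "methane": methane}, unique_on=["atom_type"]
)
print(unique_types)
```

Keeping the topology label as a column records where each parameter first appeared, even after duplicates shared between topologies are dropped.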

Finally, there will be a function that prints the dataframes alongside the rdkit mols, which are labeled to match the dataframes.

TODO Checklist:

  • [ ] Error checking on arguments
  • [x] Replace topology.py dataframe methods/tests
  • [ ] Doc strings
  • [x] Handle units
  • [x] Handle parameters that return lists
  • [x] Handle parameters that return dictionaries
  • [x] Return unique elements if style is publication
  • [x] Handle parameter "all" better
  • [x] Function to concatenate multiple topologies into one output
  • [x] Remove replicate rows flag -> similar to the publication style, but without the atom_indices section added
  • [ ] Function to create a topology with all data from dataframe matching rdkit mol image
    • [x] Could just also make this a format unique_types
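For the checklist items on units and on parameters that return lists or dictionaries, the sketch below shows one possible flattening on a toy table. `flatten_parameters`, the column names, the unit strings, and the values are all hypothetical; units are plain strings here rather than real unit objects.

```python
import pandas as pd

# Toy bond table: one dict-valued column and one list-valued column (illustrative values).
bonds = pd.DataFrame(
    {
        "bond_type": ["CT-CT", "CT-HC"],
        "parameters": [{"k": 1000.0, "r_eq": 0.15}, {"k": 2000.0, "r_eq": 0.11}],
        "member_types": [["opls_135", "opls_135"], ["opls_135", "opls_140"]],
    }
)


def flatten_parameters(df, units=None, list_columns=("member_types",)):
    """Hypothetical helper: expand dict parameters into columns and join list values."""
    units = units or {}
    # Each dictionary key becomes its own column.
    expanded = pd.DataFrame(df["parameters"].tolist(), index=df.index)
    # Put the unit in the column header so cell values stay plain numbers.
    expanded = expanded.rename(
        columns={name: f"{name} ({unit})" for name, unit in units.items()}
    )
    out = df.drop(columns=["parameters"]).join(expanded)
    for col in list_columns:
        # Lists are shown as a single hyphen-joined string.
        out[col] = out[col].apply("-".join)
    return out


flat = flatten_parameters(bonds, units={"k": "kJ/(mol*nm**2)", "r_eq": "nm"})
print(flat)
```

Putting units in the header is only one option; it keeps the numeric columns sortable, at the cost of losing unit-aware arithmetic.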

CalCraven • Apr 01 '24 14:04