Add more Pandas-based Checkpointing and Save/Load Functions
Follow up to https://github.com/hatchet/hatchet/pull/272
This PR adds the following new functions for checkpointing GraphFrames (i.e., saving to/reading from files):
- `to_pickle` and `from_pickle` (Pickle Format)
- `to_csv` and `from_csv`
- `to_excel` and `from_excel`

These functions utilize similar read/write functions from Pandas. In many cases, these Pandas functions require additional dependencies. Those dependencies will not be required in Hatchet. If the dependency for a particular function is not installed, Pandas will raise an `ImportError`.
This PR also adds new `save` and `load` functions to the `GraphFrame` class to simplify checkpointing. Both functions require only one argument: the filename. If the filename has a recognized extension, the corresponding format will be used. Otherwise, the optional `fileformat` parameter can be provided to specify the desired format. If the dependencies needed for that format are not installed, the `ImportError` raised by Pandas will be caught, and all remaining formats will be attempted. If no supported format succeeds, an `IOError` will be raised.
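The resolution and fallback logic described above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: `EXTENSION_TO_FORMAT`, `resolve_format`, and the `writers` argument are hypothetical names, and the sketch assumes that when neither the extension nor `fileformat` identifies a format, every format is simply tried in turn.

```python
import os

# Hypothetical table mapping extensions to format names; the table actually
# used by the PR is not shown in this description.
EXTENSION_TO_FORMAT = {
    ".pkl": "pickle",
    ".pickle": "pickle",
    ".csv": "csv",
    ".xlsx": "excel",
}


def resolve_format(filename, fileformat=None):
    """Return the format implied by the extension, else the explicit one."""
    ext = os.path.splitext(filename)[1].lower()
    if ext in EXTENSION_TO_FORMAT:
        return EXTENSION_TO_FORMAT[ext]
    return fileformat  # may be None if the caller gave no hint


def save(filename, writers, fileformat=None):
    """Try the resolved format first, then all remaining formats.

    `writers` (hypothetical) maps a format name to a callable that takes the
    filename. A writer that raises ImportError, e.g. because an optional
    dependency is missing, is skipped.
    """
    preferred = resolve_format(filename, fileformat)
    order = [f for f in writers if f == preferred]
    order += [f for f in writers if f != preferred]
    for fmt in order:
        try:
            writers[fmt](filename)
            return fmt  # report which format was actually used
        except ImportError:
            continue
    raise IOError("Could not save {}: no supported format succeeded".format(filename))
```

Returning the format that succeeded is a choice made for this sketch only; it makes the fallback visible to the caller.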
All the new functions added in this PR accept keyword arguments (i.e., `**kwargs`). These arguments will be passed to the Pandas function that is eventually invoked to read/write the file. Docstrings will be added that link to the documentation of the associated Pandas functions.
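As a rough illustration of this pass-through pattern, the sketch below forwards `**kwargs` unchanged to an underlying reader or writer. It uses the standard-library `csv` module as a stand-in for Pandas so that it runs without extra dependencies; `write_rows` and `read_rows` are hypothetical names, not functions from this PR.

```python
import csv
import io


def write_rows(rows, buffer, **kwargs):
    """Write rows to buffer, forwarding every keyword argument to csv.writer."""
    writer = csv.writer(buffer, **kwargs)
    writer.writerows(rows)


def read_rows(buffer, **kwargs):
    """Read rows from buffer, forwarding every keyword argument to csv.reader."""
    return [row for row in csv.reader(buffer, **kwargs)]


rows = [["name", "time"], ["main", "1.5"]]
buffer = io.StringIO()

# The wrapper does not interpret `delimiter`; the underlying writer does.
write_rows(rows, buffer, delimiter=";")
buffer.seek(0)
restored = read_rows(buffer, delimiter=";")
```

Because the wrappers never inspect the keyword arguments, any option the underlying function supports is available to the caller without further changes to the wrapper.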
Other file formats (e.g., Parquet and Feather) will be added in future PRs.
Originally from hatchet/hatchet on May 18, 2021
I might wait until https://github.com/hatchet/hatchet/pull/377 is merged before marking this PR ready-for-review. This PR adds some global, configuration-like data that all the save and load functions use to determine the file format from the file extension. If this data were placed in the global configuration system, users would be able to add "rules" telling those functions to save/load files with non-standard extensions using a certain file format.
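To make the idea of user-added "rules" concrete, here is a minimal sketch. The configuration system from the linked PR is not described here, so a plain module-level dictionary stands in for it; `FORMAT_RULES`, `add_format_rule`, and `format_for` are hypothetical names.

```python
import os

# Hypothetical stand-in for the global configuration entry that maps file
# extensions to file formats.
FORMAT_RULES = {".csv": "csv", ".pkl": "pickle", ".xlsx": "excel"}


def add_format_rule(extension, fileformat):
    """Register an extension (possibly non-standard) for a file format."""
    if not extension.startswith("."):
        extension = "." + extension
    FORMAT_RULES[extension.lower()] = fileformat


def format_for(filename):
    """Return the format registered for the file's extension, or None."""
    ext = os.path.splitext(filename)[1].lower()
    return FORMAT_RULES.get(ext)


# A user-supplied rule: treat ".ckpt" files as pickle checkpoints.
add_format_rule("ckpt", "pickle")
```

With such a rule in place, a call like `format_for("run.ckpt")` would resolve to the pickle format even though `.ckpt` is not a standard pickle extension.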
Originally from May 22, 2021:
Implementation and testing are now complete. This PR depends on https://github.com/hatchet/hatchet/pull/272, so it should not be reviewed or merged until that PR is merged. I also want to integrate https://github.com/hatchet/hatchet/pull/377, but I might do that in a separate PR.