Add more Pandas-based Checkpointing and Save/Load Functions

Open ilumsden opened this issue 3 years ago • 2 comments

Follow up to https://github.com/hatchet/hatchet/pull/272

This PR adds the following new functions for checkpointing GraphFrames (i.e., saving to/reading from files):

  1. to_pickle and from_pickle (Pickle Format)
  2. to_csv and from_csv
  3. to_excel and from_excel

These functions utilize similar read/write functions from Pandas. In many cases, these Pandas functions require additional dependencies. Those dependencies will not be required in Hatchet. If the dependency for a particular function is not installed, Pandas will raise an ImportError.

This PR also adds new save and load functions to the GraphFrame class. These functions can be used to simplify the use of checkpointing. Both of these functions only require one argument: the filename. If the filename contains a recognized extension, that format will be used. Otherwise, the optional fileformat parameter can be provided to specify the desired format. If the necessary dependencies are not installed, the ImportError raised by Pandas will be caught. In that case, all remaining formats will be attempted. If no supported format succeeds, an IOError will be raised.

All the new functions added in this PR accept keyword arguments (i.e., **kwargs). These arguments are passed to the Pandas function that is eventually invoked to read/write the file. Documentation (i.e., docstrings) will be added that links to the documentation of the associated Pandas functions.
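As a rough illustration of the keyword-argument forwarding, the sketch below wraps `DataFrame.to_csv` and `pandas.read_csv` in two standalone functions. The wrappers are hypothetical and operate on a plain DataFrame rather than a GraphFrame; only the Pandas calls themselves are real API.

```python
import io

import pandas as pd


def to_csv(dataframe, target, **kwargs):
    """Hypothetical wrapper: forward every keyword argument to DataFrame.to_csv."""
    dataframe.to_csv(target, **kwargs)


def from_csv(source, **kwargs):
    """Hypothetical wrapper: forward every keyword argument to pandas.read_csv."""
    return pd.read_csv(source, **kwargs)


df = pd.DataFrame({"name": ["main", "solve"], "time": [1.5, 0.25]})

# An in-memory buffer keeps the example free of temporary files.
buffer = io.StringIO()
to_csv(df, buffer, sep=";", index=False)

buffer.seek(0)
restored = from_csv(buffer, sep=";")
```

Because the wrappers do not inspect the keyword arguments, any option that Pandas accepts (here `sep` and `index`) works without the wrapper needing to know about it.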

Other file formats (e.g., Parquet and Feather) will be added in future PRs.

ilumsden avatar Feb 09 '22 14:02 ilumsden

Originally from hatchet/hatchet on May 18, 2021

I might wait until https://github.com/hatchet/hatchet/pull/377 is merged before marking this PR ready-for-review. This PR adds some global, configuration-like data that all the save and load functions use to determine the file format from the file extension. If this data were placed in the global configuration system, users would be able to add "rules" telling those functions to save/load files with non-standard extensions using a certain file format.
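To make the "rules" idea concrete, here is a minimal sketch of what such a user-extensible table might look like. The names `add_format_rule` and `format_for`, the default entries, and the `.gf` extension are all hypothetical; neither PR defines them in this thread.

```python
import os

# Hypothetical defaults; the real data lives in the PR, not in this thread.
_format_rules = {".pkl": "pickle", ".csv": "csv", ".xlsx": "excel"}


def add_format_rule(extension, fileformat):
    """Register a user rule mapping a file extension to a file format."""
    if not extension.startswith("."):
        extension = "." + extension
    _format_rules[extension.lower()] = fileformat


def format_for(filename):
    """Return the format registered for the filename's extension, or None."""
    ext = os.path.splitext(filename)[1].lower()
    return _format_rules.get(ext)


# A user tells the save/load functions to treat ".gf" files as pickles.
add_format_rule("gf", "pickle")
```

With a rule like this registered, a call such as `format_for("run.gf")` would resolve to the pickle format even though `.gf` is not a standard extension.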

ilumsden avatar Feb 09 '22 14:02 ilumsden

Originally from May 22, 2021:

Implementation and testing are now complete. This PR depends on https://github.com/hatchet/hatchet/pull/272, so it should not be reviewed or merged until that PR is merged. I also want to integrate https://github.com/hatchet/hatchet/pull/377, but I might do that in a separate PR.

ilumsden avatar Feb 09 '22 14:02 ilumsden