
Save and load pre-trained encoders (transfer learning support)

Open dantreiman opened this issue 3 years ago • 1 comment

  • Refactors LudwigModule into its own file ludwig_module.py (from torch_utils.py)
    • Adds LudwigModuleState dataclass, which is a POD (plain old data) representation of any Ludwig module.
    • Adds get_state() and @classmethod restore_from_state() methods to LudwigModule
  • Adds serialization.py, which reads and writes Ludwig modules to .h5 files.
  • Implements serialization for text encoders
  • Modifies input feature schema to allow encoder name or URL (schema_utils.StringOptionsOrURL)
  • Loads pretrained encoders before preprocessing and re-uses preprocessing data (vocab) from the pretrained encoder
  • [Temporarily] passes all encoder parameters to superclass constructor to save in a dictionary. This code will be unnecessary once https://github.com/ludwig-ai/ludwig/pull/2269 is in.
  • Adds serialization_test.py with transfer learning example in test_transfer_learning
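
The get_state() / restore_from_state() round trip described above can be illustrated with a minimal, framework-free sketch. Everything in it is an assumption made for illustration: ModuleState, Module, Embed, and TextEncoder are hypothetical stand-ins, not the PR's LudwigModuleState or LudwigModule, weights are plain Python lists rather than tensors, and writing the state to an .h5 file is not shown.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ModuleState:
    """Plain-data snapshot of a module (hypothetical stand-in for LudwigModuleState)."""

    type: str                     # class name, used to pick the class on restore
    config: Dict[str, Any]        # constructor keyword arguments
    saved_weights: Dict[str, List[float]] = field(default_factory=dict)
    children: Dict[str, "ModuleState"] = field(default_factory=dict)


class Module:
    """Toy base class for this sketch; not Ludwig's LudwigModule."""

    registry: Dict[str, type] = {}

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Register every subclass by name so a state can be mapped back to a class.
        Module.registry[cls.__name__] = cls

    def __init__(self, **config):
        self.config = dict(config)
        self.weights: Dict[str, List[float]] = {}
        self.children: Dict[str, "Module"] = {}

    def get_state(self) -> ModuleState:
        # Copy everything so the snapshot is independent of the live module.
        return ModuleState(
            type=type(self).__name__,
            config=dict(self.config),
            saved_weights={k: list(v) for k, v in self.weights.items()},
            children={k: c.get_state() for k, c in self.children.items()},
        )

    @classmethod
    def restore_from_state(cls, state: ModuleState) -> "Module":
        # Rebuild the module from its config, then overwrite weights and children.
        module = Module.registry[state.type](**state.config)
        module.weights = {k: list(v) for k, v in state.saved_weights.items()}
        module.children = {
            k: Module.restore_from_state(c) for k, c in state.children.items()
        }
        return module


class Embed(Module):
    def __init__(self, vocab_size: int, dim: int):
        super().__init__(vocab_size=vocab_size, dim=dim)
        self.weights["table"] = [0.0] * (vocab_size * dim)


class TextEncoder(Module):
    def __init__(self, vocab_size: int, dim: int):
        super().__init__(vocab_size=vocab_size, dim=dim)
        self.children["embed"] = Embed(vocab_size, dim)


encoder = TextEncoder(vocab_size=4, dim=2)
encoder.children["embed"].weights["table"][0] = 1.5  # pretend this value was trained
state = encoder.get_state()
restored = Module.restore_from_state(state)
```

Storing the constructor arguments alongside the weights is what lets the restore step rebuild the module without knowing its class in advance, which is presumably why the PR temporarily passes all encoder parameters to the superclass constructor.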

dantreiman, Jul 14 '22 20:07

Unit Test Results

6 files, 6 suites, 2h 41m 6s :stopwatch:
3 397 tests: 2 505 :heavy_check_mark:, 78 :zzz:, 814 :x:
10 191 runs: 7 492 :heavy_check_mark:, 257 :zzz:, 2 442 :x:

For more details on these failures, see this check.

Results for commit 723989ae.

:recycle: This comment has been updated with latest results.

github-actions[bot], Jul 14 '22 21:07