ludwig Save and load pre-trained encoders (transfer learning support)

Open dantreiman opened this issue 3 years ago • 1 comment

Refactors LudwigModule into its own file ludwig_module.py (from torch_utils.py)
- Adds LudwigModuleState dataclass, which is a POD (plain old data) representation of any Ludwig module.
- Adds get_state() and @classmethod restore_from_state() methods to LudwigModule
Adds serialization.py, which reads and writes Ludwig modules to .h5 files.
Implements serialization for text encoders
Modifies input feature schema to allow encoder name or URL (schema_utils.StringOptionsOrURL)
Load pretrained encoders before preprocessing, re-use preprocessing data (vocab) from pretrained
[Temporarily] passes all encoder parameters to superclass constructor to save in a dictionary. This code will be unnecessary once https://github.com/ludwig-ai/ludwig/pull/2269 is in.
Adds serialization_test.py with transfer learning example in test_transfer_learning
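
For readers unfamiliar with the get_state() / restore_from_state() pattern, here is a minimal, stdlib-only sketch of the idea. It is not the code in this PR: `ModuleState`, `ToyEncoder`, and every field name below are hypothetical stand-ins, and plain Python lists take the place of tensors and .h5 storage.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class ModuleState:
    """Plain-data snapshot of a module (hypothetical fields, not LudwigModuleState)."""
    type_name: str
    config: Dict[str, Any] = field(default_factory=dict)
    weights: Dict[str, List[float]] = field(default_factory=dict)


class ToyEncoder:
    """Stand-in for an encoder; not a real Ludwig class."""

    def __init__(self, vocab_size: int, embedding_size: int):
        # Keep constructor parameters so the module can be rebuilt later.
        self.config = {"vocab_size": vocab_size, "embedding_size": embedding_size}
        self.weights = {"embeddings": [0.0] * (vocab_size * embedding_size)}

    def get_state(self) -> ModuleState:
        # Copy config and weights so the snapshot is independent of the module.
        return ModuleState(
            type_name=type(self).__name__,
            config=dict(self.config),
            weights={name: list(values) for name, values in self.weights.items()},
        )

    @classmethod
    def restore_from_state(cls, state: ModuleState) -> "ToyEncoder":
        if state.type_name != cls.__name__:
            raise ValueError(f"State is for {state.type_name}, not {cls.__name__}")
        # Rebuild from the saved constructor parameters, then overwrite weights.
        module = cls(**state.config)
        module.weights = {name: list(values) for name, values in state.weights.items()}
        return module


# Round trip: snapshot a "pretrained" encoder and rebuild an equivalent one.
pretrained = ToyEncoder(vocab_size=3, embedding_size=2)
pretrained.weights["embeddings"][0] = 0.5
restored = ToyEncoder.restore_from_state(pretrained.get_state())
```

Keeping the state object as plain data is what makes it straightforward to hand to a separate reader/writer, such as the .h5 serialization described above.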

Jul 14 '22 20:07 dantreiman

6 files, 6 suites, 2h 41m 6s :stopwatch:
3 397 tests: 2 505 :heavy_check_mark:, 78 :zzz:, 814 :x:
10 191 runs: 7 492 :heavy_check_mark:, 257 :zzz:, 2 442 :x:

For more details on these failures, see this check.

Results for commit 723989ae.

:recycle: This comment has been updated with latest results.

Jul 14 '22 21:07 github-actions[bot]