Comprehensive configs: Explicitly list and save all parameter values for input and output features in configs.

Open justinxzhao opened this issue 2 years ago • 2 comments

This helps ensure that loading old models is robust to changes in default parameter values.
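
To illustrate why saving every value matters, here is a minimal sketch, not Ludwig's actual schema code. `HYPOTHETICAL_DEFAULTS`, `merge_defaults`, and `render_full_feature` are names made up for this example; the default values are copied from the number features in the Titanic example in this PR.

```python
import copy

# Hypothetical default table (an assumption for illustration, not Ludwig's real schema).
HYPOTHETICAL_DEFAULTS = {
    "number": {
        "encoder": {"type": "passthrough"},
        "preprocessing": {
            "missing_value_strategy": "fill_with_const",
            "fill_value": 0.0,
            "computed_fill_value": 0.0,
            "normalization": None,
        },
    },
}


def merge_defaults(user, defaults):
    """Return a copy of `defaults` in which every value from `user` wins, recursively."""
    merged = copy.deepcopy(defaults)
    for key, value in user.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def render_full_feature(feature):
    """Fill in every parameter the user did not set, based on the feature type."""
    return merge_defaults(feature, HYPOTHETICAL_DEFAULTS[feature["type"]])


# A minimal user config, as in the "Before" style.
age = {
    "name": "Age",
    "type": "number",
    "preprocessing": {"missing_value_strategy": "fill_with_mean"},
}

# The comprehensive config that would be saved with the model.
saved = render_full_feature(age)

# Simulate a later release that changes a default value.
HYPOTHETICAL_DEFAULTS["number"]["preprocessing"]["normalization"] = "zscore"

# Reloading the comprehensive config is unaffected by the new default,
# while re-rendering the minimal config silently picks it up.
reloaded = render_full_feature(saved)
drifted = render_full_feature(age)
```

In this sketch `reloaded` equals `saved`, while `drifted` picks up the new `normalization` default. That drift is what an explicitly saved config is meant to prevent.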

Titanic example.

Before:

    'input_features': [   {   'column': 'Pclass',
                              'encoder': {'type': 'dense'},
                              'name': 'Pclass',
                              'proc_column': 'Pclass_mZFLky',
                              'tied': None,
                              'type': 'category'},
                          {   'column': 'Sex',
                              'encoder': {'type': 'dense'},
                              'name': 'Sex',
                              'proc_column': 'Sex_mZFLky',
                              'tied': None,
                              'type': 'category'},
                          {   'column': 'Age',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Age',
                              'preprocessing': {   'missing_value_strategy': 'fill_with_mean'},
                              'proc_column': 'Age_DF6VxJ',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'SibSp',
                              'encoder': {'type': 'passthrough'},
                              'name': 'SibSp',
                              'proc_column': 'SibSp_mZFLky',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Parch',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Parch',
                              'proc_column': 'Parch_mZFLky',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Fare',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Fare',
                              'preprocessing': {   'missing_value_strategy': 'fill_with_mean'},
                              'proc_column': 'Fare_DF6VxJ',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Embarked',
                              'encoder': {'type': 'dense'},
                              'name': 'Embarked',
                              'proc_column': 'Embarked_mZFLky',
                              'tied': None,
                              'type': 'category'}],

After:

    'input_features': [   {   'column': 'Pclass',
                              'encoder': {   'dropout': 0.0,
                                             'embedding_initializer': None,
                                             'embedding_size': 50,
                                             'embeddings_on_cpu': False,
                                             'embeddings_trainable': True,
                                             'pretrained_embeddings': None,
                                             'type': 'dense',
                                             'vocab': None},
                              'name': 'Pclass',
                              'preprocessing': {   'computed_fill_value': '<UNK>',
                                                   'fill_value': '<UNK>',
                                                   'lowercase': False,
                                                   'missing_value_strategy': 'fill_with_const',
                                                   'most_common': 10000},
                              'proc_column': 'Pclass_mZFLky',
                              'tied': None,
                              'type': 'category'},
                          {   'column': 'Sex',
                              'encoder': {   'dropout': 0.0,
                                             'embedding_initializer': None,
                                             'embedding_size': 50,
                                             'embeddings_on_cpu': False,
                                             'embeddings_trainable': True,
                                             'pretrained_embeddings': None,
                                             'type': 'dense',
                                             'vocab': None},
                              'name': 'Sex',
                              'preprocessing': {   'computed_fill_value': '<UNK>',
                                                   'fill_value': '<UNK>',
                                                   'lowercase': False,
                                                   'missing_value_strategy': 'fill_with_const',
                                                   'most_common': 10000},
                              'proc_column': 'Sex_mZFLky',
                              'tied': None,
                              'type': 'category'},
                          {   'column': 'Age',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Age',
                              'preprocessing': {   'computed_fill_value': 0.0,
                                                   'fill_value': 0.0,
                                                   'missing_value_strategy': 'fill_with_mean',
                                                   'normalization': None},
                              'proc_column': 'Age_DF6VxJ',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'SibSp',
                              'encoder': {'type': 'passthrough'},
                              'name': 'SibSp',
                              'preprocessing': {   'computed_fill_value': 0.0,
                                                   'fill_value': 0.0,
                                                   'missing_value_strategy': 'fill_with_const',
                                                   'normalization': None},
                              'proc_column': 'SibSp_mZFLky',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Parch',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Parch',
                              'preprocessing': {   'computed_fill_value': 0.0,
                                                   'fill_value': 0.0,
                                                   'missing_value_strategy': 'fill_with_const',
                                                   'normalization': None},
                              'proc_column': 'Parch_mZFLky',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Fare',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Fare',
                              'preprocessing': {   'computed_fill_value': 0.0,
                                                   'fill_value': 0.0,
                                                   'missing_value_strategy': 'fill_with_mean',
                                                   'normalization': None},
                              'proc_column': 'Fare_DF6VxJ',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Embarked',
                              'encoder': {   'dropout': 0.0,
                                             'embedding_initializer': None,
                                             'embedding_size': 50,
                                             'embeddings_on_cpu': False,
                                             'embeddings_trainable': True,
                                             'pretrained_embeddings': None,
                                             'type': 'dense',
                                             'vocab': None},
                              'name': 'Embarked',
                              'preprocessing': {   'computed_fill_value': '<UNK>',
                                                   'fill_value': '<UNK>',
                                                   'lowercase': False,
                                                   'missing_value_strategy': 'fill_with_const',
                                                   'most_common': 10000},
                              'proc_column': 'Embarked_mZFLky',
                              'tied': None,
                              'type': 'category'}],

Before:

    'output_features': [   {   'column': 'Survived',
                               'decoder': {'type': 'regressor'},
                               'dependencies': [],
                               'loss': {   'confidence_penalty': 0,
                                           'positive_class_weight': None,
                                           'robust_lambda': 0,
                                           'type': 'binary_weighted_cross_entropy',
                                           'weight': 1},
                               'name': 'Survived',
                               'preprocessing': {   'missing_value_strategy': 'drop_row'},
                               'proc_column': 'Survived_mZFLky',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'threshold': 0.5,
                               'type': 'binary'}]

After:

    'output_features': [   {   'calibration': False,
                               'column': 'Survived',
                               'decoder': {   'bias_initializer': 'zeros',
                                              'fc_activation': 'relu',
                                              'fc_bias_initializer': 'zeros',
                                              'fc_dropout': 0.0,
                                              'fc_layers': None,
                                              'fc_norm': None,
                                              'fc_norm_params': None,
                                              'fc_output_size': 256,
                                              'fc_use_bias': True,
                                              'fc_weights_initializer': 'xavier_uniform',
                                              'input_size': None,
                                              'num_fc_layers': 0,
                                              'type': 'regressor',
                                              'use_bias': True,
                                              'weights_initializer': 'xavier_uniform'},
                               'dependencies': [],
                               'input_size': None,
                               'loss': {   'confidence_penalty': 0,
                                           'positive_class_weight': None,
                                           'robust_lambda': 0,
                                           'type': 'binary_weighted_cross_entropy',
                                           'weight': 1},
                               'name': 'Survived',
                               'num_classes': None,
                               'preprocessing': {   'missing_value_strategy': 'drop_row'},
                               'proc_column': 'Survived_mZFLky',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'threshold': 0.5,
                               'type': 'binary'}]
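
The keys that the comprehensive config adds can also be listed programmatically. The following is a small sketch using a hypothetical helper, `added_keys`, which is not part of Ludwig; the two dictionaries are the `Age` feature copied from the Before and After listings above.

```python
def added_keys(before, after, prefix=""):
    """List dotted paths that exist in `after` but are missing from `before`."""
    paths = []
    for key, value in after.items():
        path = f"{prefix}{key}"
        if key not in before:
            paths.append(path)
        elif isinstance(value, dict) and isinstance(before[key], dict):
            # Recurse into nested sections such as 'encoder' or 'preprocessing'.
            paths.extend(added_keys(before[key], value, prefix=f"{path}."))
    return sorted(paths)


age_before = {
    "column": "Age",
    "encoder": {"type": "passthrough"},
    "name": "Age",
    "preprocessing": {"missing_value_strategy": "fill_with_mean"},
    "proc_column": "Age_DF6VxJ",
    "tied": None,
    "type": "number",
}

age_after = {
    "column": "Age",
    "encoder": {"type": "passthrough"},
    "name": "Age",
    "preprocessing": {
        "computed_fill_value": 0.0,
        "fill_value": 0.0,
        "missing_value_strategy": "fill_with_mean",
        "normalization": None,
    },
    "proc_column": "Age_DF6VxJ",
    "tied": None,
    "type": "number",
}

new_paths = added_keys(age_before, age_after)
```

For the `Age` feature, `new_paths` contains only the three preprocessing parameters that were previously implicit: `computed_fill_value`, `fill_value`, and `normalization`.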

Closes #2066

justinxzhao • Sep 06 '22 22:09

Unit Test Results

    6 files ±0    6 suites ±0    duration 3h 1m 19s (-10m 31s)
    3 409 tests +3:   3 331 passed +3,   78 skipped ±0,  0 failed ±0
    10 227 runs +9:   9 970 passed +9,  257 skipped ±0,  0 failed ±0

Results for commit ea4d3f73. ± Comparison against base commit 47dca75d.

:recycle: This comment has been updated with latest results.

github-actions[bot] avatar Sep 06 '22 22:09 github-actions[bot]

As part of this change, I've tightened the validation in OneOf: at most one field option may set allow_none=True, and only when the governing OneOf itself has allow_none=True.

    if default is None:
        # If the default is None, then this field allows none.
        allow_none = True

    fields_that_allow_none = [option for option in field_options if option.metadata["marshmallow_field"].allow_none]
    if len(fields_that_allow_none) > 1 and allow_none:
        raise ValueError(
            f"The governing OneOf has allow_none=True, but there are some field options that themselves "
            "allow_none=True, which is ambiguous for JSON validation. To maintain allow_none=True for the overall "
            "field, add allow_none=False to each of the field_options: "
            f"{[get_marshmallow_field_class_name(field) for field in fields_that_allow_none]}, and rely on the "
            "governing OneOf's allow_none=True to set the allow_none policy."
        )

    if fields_that_allow_none and not allow_none:
        raise ValueError(
            "The governing OneOf has allow_none=False, while None is permitted by the following field_options: "
            f"{[get_marshmallow_field_class_name(field) for field in fields_that_allow_none]}. This is contradictory. "
            "Please set allow_none=False for each field option to make this consistent."
        )
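
The two checks above can be tried in isolation with a simplified, self-contained sketch. `FieldOption` and `check_allow_none` are hypothetical stand-ins written for this example; the real code reads `allow_none` from each option's marshmallow field metadata, and the error messages here are shortened.

```python
from dataclasses import dataclass


@dataclass
class FieldOption:
    """Hypothetical stand-in for one option of a OneOf field."""

    name: str
    allow_none: bool = False


def check_allow_none(field_options, allow_none):
    """Mirror the two consistency checks from the snippet above."""
    nullable = [option.name for option in field_options if option.allow_none]
    if len(nullable) > 1 and allow_none:
        # Several options accept None, so it is unclear which one validates it.
        raise ValueError(f"Ambiguous: several options allow None: {nullable}")
    if nullable and not allow_none:
        # The governing field forbids None, but an option would accept it.
        raise ValueError(f"Contradictory: OneOf forbids None, but these options allow it: {nullable}")


def is_consistent(field_options, allow_none):
    """Return True if the combination passes both checks, False otherwise."""
    try:
        check_allow_none(field_options, allow_none)
    except ValueError:
        return False
    return True


strict_options = [FieldOption("StringOptions"), FieldOption("Integer")]
one_nullable = [FieldOption("StringOptions", allow_none=True), FieldOption("Integer")]
two_nullable = [FieldOption("StringOptions", allow_none=True), FieldOption("Integer", allow_none=True)]
```

Under these assumptions, `strict_options` passes for either setting of the governing `allow_none`, `one_nullable` passes only when the governing OneOf allows None, and `two_nullable` is rejected in both cases.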

justinxzhao avatar Sep 19 '22 21:09 justinxzhao