ludwig Comprehensive configs: Explicitly list and save all parameter values for input and output features in configs.

This helps ensure that loading old models is robust to changes in default parameter values.
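To illustrate why explicit values help, here is a minimal sketch that does not use Ludwig's actual schema code. The `fill_defaults` helper, the `CATEGORY_ENCODER_DEFAULTS` table, and the changed `embedding_size` of 256 are all hypothetical. Only the original default values are taken from the Titanic example in this description.

```python
import copy

# Hypothetical default table; the values mirror the category encoder shown in
# the Titanic example, but this is not Ludwig's actual schema code.
CATEGORY_ENCODER_DEFAULTS = {
    "type": "dense",
    "dropout": 0.0,
    "embedding_size": 50,
    "embeddings_trainable": True,
}


def fill_defaults(user_config, defaults):
    """Return a copy of `defaults` overridden by the values in `user_config`."""
    merged = copy.deepcopy(defaults)
    for key, value in user_config.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested sections (e.g. encoder, preprocessing) recursively.
            merged[key] = fill_defaults(value, merged[key])
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# At save time, every parameter is written out explicitly.
saved_encoder = fill_defaults({"type": "dense"}, CATEGORY_ENCODER_DEFAULTS)

# Suppose a later release changes a default (hypothetical new value).
NEWER_DEFAULTS = dict(CATEGORY_ENCODER_DEFAULTS, embedding_size=256)

# Reloading the fully specified config keeps the value the model was trained with.
reloaded_encoder = fill_defaults(saved_encoder, NEWER_DEFAULTS)

# A config that only stored {"type": "dense"} would pick up the new default.
sparse_reload = fill_defaults({"type": "dense"}, NEWER_DEFAULTS)
```

In this sketch `reloaded_encoder` keeps the original embedding size, while `sparse_reload` silently changes it. Saving comprehensive configs is meant to avoid that kind of mismatch.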

Titanic example.

Before:

    'input_features': [   {   'column': 'Pclass',
                              'encoder': {'type': 'dense'},
                              'name': 'Pclass',
                              'proc_column': 'Pclass_mZFLky',
                              'tied': None,
                              'type': 'category'},
                          {   'column': 'Sex',
                              'encoder': {'type': 'dense'},
                              'name': 'Sex',
                              'proc_column': 'Sex_mZFLky',
                              'tied': None,
                              'type': 'category'},
                          {   'column': 'Age',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Age',
                              'preprocessing': {   'missing_value_strategy': 'fill_with_mean'},
                              'proc_column': 'Age_DF6VxJ',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'SibSp',
                              'encoder': {'type': 'passthrough'},
                              'name': 'SibSp',
                              'proc_column': 'SibSp_mZFLky',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Parch',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Parch',
                              'proc_column': 'Parch_mZFLky',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Fare',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Fare',
                              'preprocessing': {   'missing_value_strategy': 'fill_with_mean'},
                              'proc_column': 'Fare_DF6VxJ',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Embarked',
                              'encoder': {'type': 'dense'},
                              'name': 'Embarked',
                              'proc_column': 'Embarked_mZFLky',
                              'tied': None,
                              'type': 'category'}],

After:

    'input_features': [   {   'column': 'Pclass',
                              'encoder': {   'dropout': 0.0,
                                             'embedding_initializer': None,
                                             'embedding_size': 50,
                                             'embeddings_on_cpu': False,
                                             'embeddings_trainable': True,
                                             'pretrained_embeddings': None,
                                             'type': 'dense',
                                             'vocab': None},
                              'name': 'Pclass',
                              'preprocessing': {   'computed_fill_value': '<UNK>',
                                                   'fill_value': '<UNK>',
                                                   'lowercase': False,
                                                   'missing_value_strategy': 'fill_with_const',
                                                   'most_common': 10000},
                              'proc_column': 'Pclass_mZFLky',
                              'tied': None,
                              'type': 'category'},
                          {   'column': 'Sex',
                              'encoder': {   'dropout': 0.0,
                                             'embedding_initializer': None,
                                             'embedding_size': 50,
                                             'embeddings_on_cpu': False,
                                             'embeddings_trainable': True,
                                             'pretrained_embeddings': None,
                                             'type': 'dense',
                                             'vocab': None},
                              'name': 'Sex',
                              'preprocessing': {   'computed_fill_value': '<UNK>',
                                                   'fill_value': '<UNK>',
                                                   'lowercase': False,
                                                   'missing_value_strategy': 'fill_with_const',
                                                   'most_common': 10000},
                              'proc_column': 'Sex_mZFLky',
                              'tied': None,
                              'type': 'category'},
                          {   'column': 'Age',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Age',
                              'preprocessing': {   'computed_fill_value': 0.0,
                                                   'fill_value': 0.0,
                                                   'missing_value_strategy': 'fill_with_mean',
                                                   'normalization': None},
                              'proc_column': 'Age_DF6VxJ',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'SibSp',
                              'encoder': {'type': 'passthrough'},
                              'name': 'SibSp',
                              'preprocessing': {   'computed_fill_value': 0.0,
                                                   'fill_value': 0.0,
                                                   'missing_value_strategy': 'fill_with_const',
                                                   'normalization': None},
                              'proc_column': 'SibSp_mZFLky',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Parch',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Parch',
                              'preprocessing': {   'computed_fill_value': 0.0,
                                                   'fill_value': 0.0,
                                                   'missing_value_strategy': 'fill_with_const',
                                                   'normalization': None},
                              'proc_column': 'Parch_mZFLky',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Fare',
                              'encoder': {'type': 'passthrough'},
                              'name': 'Fare',
                              'preprocessing': {   'computed_fill_value': 0.0,
                                                   'fill_value': 0.0,
                                                   'missing_value_strategy': 'fill_with_mean',
                                                   'normalization': None},
                              'proc_column': 'Fare_DF6VxJ',
                              'tied': None,
                              'type': 'number'},
                          {   'column': 'Embarked',
                              'encoder': {   'dropout': 0.0,
                                             'embedding_initializer': None,
                                             'embedding_size': 50,
                                             'embeddings_on_cpu': False,
                                             'embeddings_trainable': True,
                                             'pretrained_embeddings': None,
                                             'type': 'dense',
                                             'vocab': None},
                              'name': 'Embarked',
                              'preprocessing': {   'computed_fill_value': '<UNK>',
                                                   'fill_value': '<UNK>',
                                                   'lowercase': False,
                                                   'missing_value_strategy': 'fill_with_const',
                                                   'most_common': 10000},
                              'proc_column': 'Embarked_mZFLky',
                              'tied': None,
                              'type': 'category'}],

Before:

    'output_features': [   {   'column': 'Survived',
                               'decoder': {'type': 'regressor'},
                               'dependencies': [],
                               'loss': {   'confidence_penalty': 0,
                                           'positive_class_weight': None,
                                           'robust_lambda': 0,
                                           'type': 'binary_weighted_cross_entropy',
                                           'weight': 1},
                               'name': 'Survived',
                               'preprocessing': {   'missing_value_strategy': 'drop_row'},
                               'proc_column': 'Survived_mZFLky',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'threshold': 0.5,
                               'type': 'binary'}]

After:

    'output_features': [   {   'calibration': False,
                               'column': 'Survived',
                               'decoder': {   'bias_initializer': 'zeros',
                                              'fc_activation': 'relu',
                                              'fc_bias_initializer': 'zeros',
                                              'fc_dropout': 0.0,
                                              'fc_layers': None,
                                              'fc_norm': None,
                                              'fc_norm_params': None,
                                              'fc_output_size': 256,
                                              'fc_use_bias': True,
                                              'fc_weights_initializer': 'xavier_uniform',
                                              'input_size': None,
                                              'num_fc_layers': 0,
                                              'type': 'regressor',
                                              'use_bias': True,
                                              'weights_initializer': 'xavier_uniform'},
                               'dependencies': [],
                               'input_size': None,
                               'loss': {   'confidence_penalty': 0,
                                           'positive_class_weight': None,
                                           'robust_lambda': 0,
                                           'type': 'binary_weighted_cross_entropy',
                                           'weight': 1},
                               'name': 'Survived',
                               'num_classes': None,
                               'preprocessing': {   'missing_value_strategy': 'drop_row'},
                               'proc_column': 'Survived_mZFLky',
                               'reduce_dependencies': 'sum',
                               'reduce_input': 'sum',
                               'threshold': 0.5,
                               'type': 'binary'}]

Closes #2066

Sep 06 '22 22:09 justinxzhao

Unit Test Results

    6 files ±0, 6 suites ±0, 3h 1m 19s :stopwatch: -10m 31s
    3 409 tests +3: 3 331 :heavy_check_mark: +3, 78 :zzz: ±0, 0 :x: ±0
    10 227 runs +9: 9 970 :heavy_check_mark: +9, 257 :zzz: ±0, 0 :x: ±0

Results for commit ea4d3f73. ± Comparison against base commit 47dca75d.

:recycle: This comment has been updated with latest results.

Sep 06 '22 22:09 github-actions[bot]

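As part of this change, I've tightened the validation in OneOf: at most one field option may have allow_none=True, and the field options' allow_none settings must be consistent with the governing OneOf's allow_none.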

    if default is None:
        # If the default is None, then this field allows none.
        allow_none = True

    fields_that_allow_none = [option for option in field_options if option.metadata["marshmallow_field"].allow_none]
    if len(fields_that_allow_none) > 1 and allow_none:
        raise ValueError(
            f"The governing OneOf has allow_none=True, but there are some field options that themselves "
            "allow_none=True, which is ambiguous for JSON validation. To maintain allow_none=True for the overall "
            "field, add allow_none=False to each of the field_options: "
            f"{[get_marshmallow_field_class_name(field) for field in fields_that_allow_none]}, and rely on the "
            "governing OneOf's allow_none=True to set the allow_none policy."
        )

    if fields_that_allow_none and not allow_none:
        raise ValueError(
            "The governing OneOf has allow_none=False, while None is permitted by the following field_options: "
            f"{[get_marshmallow_field_class_name(field) for field in fields_that_allow_none]}. This is contradictory. "
            "Please set allow_none=False for each field option to make this consistent."
        )
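The behavior of the two checks above can be tried out with a standalone sketch. This is not the Ludwig implementation: `FieldOption`, `validate_allow_none`, and the `_MISSING` sentinel are hypothetical stand-ins for the marshmallow-backed field options, and the error messages are shortened.

```python
from dataclasses import dataclass

# Sentinel so that "no default given" can be told apart from default=None.
_MISSING = object()


@dataclass
class FieldOption:
    """Hypothetical stand-in for a field option with an allow_none flag."""
    name: str
    allow_none: bool = False


def validate_allow_none(field_options, allow_none=False, default=_MISSING):
    """Mirror the two consistency checks; return the effective allow_none."""
    if default is None:
        # If the default is None, then the governing field allows None.
        allow_none = True

    offenders = [opt.name for opt in field_options if opt.allow_none]

    if len(offenders) > 1 and allow_none:
        raise ValueError(
            f"Ambiguous: several field options allow None: {offenders}"
        )
    if offenders and not allow_none:
        raise ValueError(
            f"Contradictory: governing allow_none=False, but {offenders} allow None"
        )
    return allow_none


# Consistent: no option allows None, the governing field does via default=None.
effective = validate_allow_none(
    [FieldOption("string"), FieldOption("integer")], default=None
)
```

With these assumptions, passing two options that both set `allow_none=True` alongside a governing `allow_none=True` raises the "ambiguous" error, and passing any option with `allow_none=True` under a governing `allow_none=False` raises the "contradictory" error.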

Sep 19 '22 21:09 justinxzhao
