Encoder refactor
PR to refactor the Encoder/Decoder section of input/output features so that it is nested one level deeper. For example, an input feature that originally looked like this:
```yaml
input_features:
  - name: image_feature_1
    type: image
    preprocessing:
      height: 7.5
      width: 7.5
      num_channels: 4
    encoder: resnet
    num_channels: 4
    dropout: 0.1
    resnet_size: 100
```
should now be defined like this:
```yaml
input_features:
  - name: image_feature_1
    type: image
    preprocessing:
      height: 7.5
      width: 7.5
      num_channels: 4
    encoder:
      type: resnet
      num_channels: 4
      dropout: 0.1
      resnet_size: 100
```
I have added some backwards compatibility logic and a test for it as well. Also, since this is such a large PR, if you are an expert in a specific area of the codebase and don't have time to review the whole thing, feel free to just inject your expertise where it works best for you!
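For illustration, here is a rough sketch of what such an upgrade step could look like. The helper name and the set of feature-level keys are assumptions for the example, not the actual backwards-compatibility code in this PR:

```python
# Hypothetical sketch of the backwards-compatibility upgrade; the real logic in
# this PR may differ in naming and in how encoder-specific keys are detected.

# Keys that belong to the feature itself rather than to its encoder (illustrative subset).
FEATURE_LEVEL_KEYS = {"name", "type", "column", "preprocessing", "encoder", "decoder", "tied"}

def upgrade_feature_to_nested_encoder(feature: dict) -> dict:
    """Convert a flat `encoder: <name>` feature config into the nested form."""
    encoder = feature.get("encoder")
    if not isinstance(encoder, str):
        # Already nested (a dict) or absent: nothing to do.
        return feature

    upgraded = {k: v for k, v in feature.items() if k in FEATURE_LEVEL_KEYS}
    # Everything that is not a feature-level key is treated as an encoder parameter.
    encoder_params = {k: v for k, v in feature.items() if k not in FEATURE_LEVEL_KEYS}
    upgraded["encoder"] = {"type": encoder, **encoder_params}
    return upgraded


legacy = {
    "name": "image_feature_1",
    "type": "image",
    "encoder": "resnet",
    "num_channels": 4,
    "dropout": 0.1,
    "resnet_size": 100,
}
print(upgrade_feature_to_nested_encoder(legacy))
# {'name': 'image_feature_1', 'type': 'image',
#  'encoder': {'type': 'resnet', 'num_channels': 4, 'dropout': 0.1, 'resnet_size': 100}}
```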
Unit Test Results
3 files -3   3 suites -3   55m 26s :stopwatch: -1h 38m 39s
2 838 tests -124   2 683 :heavy_check_mark: -230   56 :zzz: +7   99 :x: +99
8 514 runs -372   8 014 :heavy_check_mark: -689   204 :zzz: +21   296 :x: +296
For more details on these failures, see this check.
Results for commit df823d01. ± Comparison against base commit f145e467.
:recycle: This comment has been updated with latest results.
Are these nesting changes being flowed down to the hyperopt settings when applied at the feature level? E.g., if we were to add a sparse vs. dense encoder choice to hyperopt for category features, would it be like this:
```yaml
input_features:
  - name: genres
    type: set
  - name: content_rating
    type: category
    encoder:
      type: dense
  - name: top_critic
    type: binary
  - name: runtime
    type: number
  - name: review_content
    type: text
    encoder: embed
output_features:
  - name: recommended
    type: binary
hyperopt:
  goal: maximize
  output_feature: recommended
  metric: accuracy
  split: validation
  parameters:
    review_content.embedding_size:
      space: choice
      categories: [128, 256]
    review_content.encoder:
      type:
        space: choice
        categories: [sparse, dense]
```
Or like this:
```yaml
hyperopt:
  goal: maximize
  output_feature: recommended
  metric: accuracy
  split: validation
  parameters:
    review_content.embedding_size:
      space: choice
      categories: [128, 256]
    review_content.encoder.type:
      space: choice
      categories: [sparse, dense]
```
It would be the latter, actually. I was testing this out today, specifically in test_hyperopt_with_shared_params.
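To make the mapping concrete, here is an illustrative sketch of how a dot-separated hyperopt parameter name lands in the nested feature config. Ludwig's actual hyperopt machinery handles this internally; the helper below is purely hypothetical:

```python
# Illustrative only: shows how a dot-separated hyperopt parameter key maps onto
# the nested feature config after this refactor.

def apply_hyperopt_parameter(config: dict, parameter: str, value) -> None:
    """Set `value` at the nested location named by a dot-separated parameter key."""
    feature_name, *path = parameter.split(".")  # e.g. "review_content.encoder.type"
    feature = next(
        f for f in config["input_features"] + config["output_features"]
        if f["name"] == feature_name
    )
    node = feature
    for key in path[:-1]:
        node = node.setdefault(key, {})
    node[path[-1]] = value


config = {
    "input_features": [{"name": "review_content", "type": "text", "encoder": {"type": "embed"}}],
    "output_features": [{"name": "recommended", "type": "binary"}],
}
apply_hyperopt_parameter(config, "review_content.encoder.type", "dense")
print(config["input_features"][0]["encoder"])  # {'type': 'dense'}
```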
Apologies for the delay; I found an issue with backwards compatibility and went back to fix it. Should be good to review now.
Overall this is great - awesome work 👏! I've left a few minor comments on some potential refactoring, but those are mostly nits.
Will we be replacing dictionary accesses with the schema class in a follow-up PR?
@jppgks Yes, in a separate PR I am currently writing the config object, which joins all the schemas together into the internal replacement for the config dict.
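Roughly the idea, as a loose sketch (the class names and fields below are hypothetical, not the actual config object from that follow-up PR):

```python
# Hypothetical sketch: a typed config object with nested encoder settings,
# replacing raw dictionary access into the config.
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class EncoderConfig:
    type: str = "resnet"
    num_channels: Optional[int] = None
    dropout: float = 0.0

@dataclass
class InputFeatureConfig:
    name: str
    type: str
    encoder: EncoderConfig = field(default_factory=EncoderConfig)

@dataclass
class ModelConfig:
    input_features: List[InputFeatureConfig] = field(default_factory=list)

# Access becomes config.input_features[0].encoder.dropout
# instead of config["input_features"][0]["encoder"]["dropout"].
```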
Can this be closed now that #2370 is merged? @connor-mccorm @dantreiman
Indeed it can! @jppgks