splink
[FEAT] Add training rules to settings object
Is your proposal related to a problem?
Currently the settings object does not make a model fully reproducible (i.e. re-trainable). The settings JSON can save out the blocking rules, comparisons, m/u values etc. that make up a model, but not the training steps that produced them.
Describe the solution you'd like
Add sections to the settings object to define lambda, m and u training rules (similar to blocking rules). These could be specified (and run) at the instantiation step. Alternatively, if the user runs model training iteratively, each step should be added to the settings object under the hood.
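To make the "added to the settings object under the hood" idea concrete, here is a minimal sketch of what recording could look like. It is not splink's API: the `TrainingLog` class, the `"training_rules"` key and the step names (`estimate_u`, `estimate_m_em`) are all hypothetical. It appends each step, with every parameter passed to it, to an ordered list in a plain settings dict, so the training history survives a JSON round-trip alongside the rest of the model.

```python
import json
from typing import Any


class TrainingLog:
    """Hypothetical helper (not part of splink): records each training step,
    in call order, into a settings dict under an assumed "training_rules" key."""

    def __init__(self, settings: dict[str, Any]) -> None:
        self.settings = settings
        # A list preserves call order, which matters for training
        self.settings.setdefault("training_rules", [])

    def record(self, step: str, **params: Any) -> None:
        # Store the step name plus every parameter passed to it, so the
        # step could later be re-run with identical inputs
        self.settings["training_rules"].append({"step": step, "params": params})


settings = {"link_type": "dedupe_only", "blocking_rules": ["l.surname = r.surname"]}
log = TrainingLog(settings)
# Hypothetical step names, standing in for whatever training methods are called
log.record("estimate_u", max_pairs=1_000_000)
log.record("estimate_m_em", blocking_rule="l.dob = r.dob", fix_u=True)

# The training steps now survive a JSON round-trip with the rest of the model
restored = json.loads(json.dumps(settings))
print([r["step"] for r in restored["training_rules"]])
```

Recording parameters as keyword arguments keeps each entry declarative, which is the property the settings JSON already has for blocking rules and comparisons.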
Describe alternatives you've considered
Additional context
I’ve wondered about adding the estimation logic to the settings object before, and I think it’s a good idea in principle.
To explain why it's not there already: there are a few challenges, some of which could probably be overcome, but which make things tricky. Overall, it's because model training is to some extent imperative rather than declarative.
- If you train using labels, you couldn’t easily capture that in the settings object
- The order of training matters (not that much, but a bit), so the settings dict would have to preserve order
- There are some subtle settings which could be difficult to encode. For example, an edge case would be a user who decided to fix an m or u value mid-training.
- The settings dict would need to remember all parameters passed into the model, e.g. the starting values of m, whether to train using the new optimised non-tf adjusted EM approach, etc.
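A rough sketch of how some of these challenges play out, assuming training steps were stored as an ordered list of `{"step": ..., "params": ...}` entries. Everything here is hypothetical (`replay`, `is_json_serialisable`, `LabelTable` and the step names are illustrative, not splink functions). Replaying must follow the stored order, a mid-training "fix m" step is just another entry in that order, and label-based training breaks down because the labels themselves are data that cannot be written into a JSON settings file.

```python
import json
from typing import Any, Callable


def is_json_serialisable(params: dict[str, Any]) -> bool:
    """True if a step's params could be stored in a JSON settings file."""
    try:
        json.dumps(params)
        return True
    except TypeError:
        return False


def replay(training_rules: list[dict[str, Any]],
           steps: dict[str, Callable[..., None]]) -> None:
    # Re-run steps strictly in the recorded order, since order affects results
    for rule in training_rules:
        steps[rule["step"]](**rule["params"])


class LabelTable:
    """Hypothetical stand-in for an in-memory table of labelled pairs."""


calls: list[tuple[str, dict[str, Any]]] = []


def make_step(name: str) -> Callable[..., None]:
    # Each fake step just records that it ran and with which params
    return lambda **params: calls.append((name, params))


steps = {name: make_step(name) for name in ("estimate_u", "fix_m", "estimate_m_em")}
rules = [
    {"step": "estimate_u", "params": {"max_pairs": 1000}},
    # Fixing m mid-training is only reproducible if its position is kept
    {"step": "fix_m", "params": {"comparison": "surname"}},
    {"step": "estimate_m_em", "params": {"blocking_rule": "l.dob = r.dob"}},
]
replay(rules, steps)

# Training from labels: the labels are data, not a rule, so they can't be stored
print(is_json_serialisable({"labels": LabelTable()}))  # False
```

Storing a reference to the labels (e.g. a table name) rather than the labels themselves would be one possible workaround, though reproducibility would then depend on that external table not changing.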
Converting to a discussion because I'm still unsure whether this is desirable.