
[FEAT] Add training rules to settings object

Open · RossKen opened this issue 1 year ago · 1 comment

Is your proposal related to a problem?

Currently the settings object does not make a model fully reproducible (i.e. re-trainable). The settings JSON can save out the blocking rules, comparisons, m/u values etc. that make up a model, but not the model training steps that produced them.

Describe the solution you'd like

Add sections to the settings object to define lambda, m and u training rules (similar to blocking rules). These could be specified (and run) at instantiation, or, if the user runs model training iteratively, the steps should be added to the settings object under the hood.
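
As a very rough sketch of what this could look like (the `estimation_rules` key and the shape of its entries are invented here purely for illustration; only `link_type`, `blocking_rules_to_generate_predictions` and `comparisons` are existing settings keys):

```python
# Hypothetical extension of the Splink settings dict. The "estimation_rules"
# section and its step format do not exist today - they illustrate the proposal.
settings = {
    "link_type": "dedupe_only",
    "blocking_rules_to_generate_predictions": [
        "l.surname = r.surname",
    ],
    "comparisons": [
        # ... comparison definitions as today ...
    ],
    # Proposed: an ordered list of training steps, so that re-running them
    # reproduces the lambda, m and u estimates
    "estimation_rules": [
        {"estimate": "u", "method": "random_sampling"},
        {"estimate": "lambda_and_m", "method": "em", "blocking_rule": "l.dob = r.dob"},
        {"estimate": "lambda_and_m", "method": "em", "blocking_rule": "l.surname = r.surname"},
    ],
}
```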

Describe alternatives you've considered

Additional context

RossKen · Jun 05 '23 09:06

I’ve wondered about adding the estimation logic to the settings object before, and I think it’s a good idea in principle.

To explain why it's not there already: there are a few challenges, some of which could probably be overcome, but which make things tricky. Overall it's because model training is to some extent imperative rather than declarative (a rough sketch of what such a training session looks like follows the list below):

  1. If you train using labels, you couldn’t easily capture that in the settings object
  2. The order of training matters (not that much, but a bit), so the settings dict would have to preserve order
  3. There are some subtle settings which could be difficult to encode. For example, an edge case would be a user who decided to fix an m or u value mid-training.
  4. The settings dict would need to remember all parameters passed in during training, e.g. the starting values of m, whether to train using the new optimised non-tf-adjusted EM approach, etc.
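
To make these concrete, here is a rough sketch of the kind of imperative training session the settings dict would need to capture, with comments mapping back to the numbered points. Method names follow the Splink 3 linker API; exact arguments and the linker construction are glossed over and may differ between versions.

```python
# Assumes `linker` is an already-constructed Splink linker (construction and
# import paths vary between versions, so they are omitted here).

# (4) options such as the sampling target are passed as arguments, so a
#     reproducible settings dict would have to remember them too
linker.estimate_u_using_random_sampling(max_pairs=1e6)

# (1) label-based m estimation is hard to express declaratively in the settings
linker.estimate_m_from_label_column("cluster")

# (2) the order of these EM passes matters a bit, so any stored representation
#     must be an ordered list, not an unordered mapping
linker.estimate_parameters_using_expectation_maximisation("l.dob = r.dob")

# (3) edge case: a user might manually fix an m or u value between steps,
#     which a declarative record of the session would also need to capture
linker.estimate_parameters_using_expectation_maximisation("l.surname = r.surname")
```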

RobinL · Jun 30 '23 14:06

Converting to a discussion because I'm still unsure whether this is desirable.

RobinL · Jul 24 '24 18:07