splink icon indicating copy to clipboard operation
splink copied to clipboard

[MAINT] Settings restructuring - bringing everything into a centralised settings object

Open ThomasHepworth opened this issue 1 year ago • 1 comments

Is your proposal related to a problem?

This proposal builds upon the discussions in issue https://github.com/moj-analytical-services/splink/issues/1055.

Describe the solution you'd like

As Splink has expanded, some of our earlier decisions regarding the settings object have become problematic. Nick's comment in the issue above highlights some of these issues.

Additionally, code related to the settings object within the linker has gradually led to software bloat, making it challenging to manage our linker.

It would feel more natural if everything related to the settings object, aside from explicitly loading a new settings dictionary, were offloaded to and managed by (a) separate settings class(es).

These separate classes should handle validation, exceptions, reading, writing, while also housing all the information from our settings dictionary.

To summarise some of the main changes necessary:

  • Both the Settings dictionary and settings object currently reside within the linker class, primarily because of the UID. However, it would be more natural for the settings object to encompass the settings dictionary. Most uses of the settings dictionary could (and in my opinion should) be handled through a series of settings classes.
  • Exclusive management of the cache UID should also be within our settings object. This would reduce complexity by eliminating the need for two distinct setters and properties within our settings object and linker.
  • Error handling and settings read/write functionality should be contained within another class. It's acceptable to call these methods from within the linker, but they should ideally exist as methods within the main settings object class.
  • Validation should also occur within our SettingsObj class. This would make the validation steps more explicit and provide easier access to the information we need without concerns about missing specific information.
  • Currently, both the _settings_obj_ and _settings_dict attributes exist on the main linker class. This redundancy feels unnecessary.

Additional context

I'll provide an initial structure idea when I have the time to step back, think, and set up a small test environment.

ThomasHepworth avatar Oct 24 '23 19:10 ThomasHepworth

Since we are writing back to the settings object through Splink, we may want to consider using __slots__ to prevent accidentally writing to new attributes, instead of existing ones.

ThomasHepworth avatar Oct 25 '23 16:10 ThomasHepworth

Settings has been split apart a bit in Splink4. We may wish to review this again (probably more to be done), but i think we'd want a separate issue for this

RobinL avatar Jul 24 '24 18:07 RobinL