splink
splink copied to clipboard
[MAINT] Settings restructuring - bringing everything into a centralised settings object
Is your proposal related to a problem?
This proposal builds upon the discussions in issue https://github.com/moj-analytical-services/splink/issues/1055.
Describe the solution you'd like
As Splink has expanded, some of our earlier decisions regarding the settings object have become problematic. Nick's comment in the issue above highlights some of these issues.
Additionally, code related to the settings object within the linker has gradually led to software bloat, making it challenging to manage our linker.
It would feel more natural if everything related to the settings object, aside from explicitly loading a new settings dictionary, were offloaded to and managed by (a) separate settings class(es).
These separate classes should handle validation, exceptions, reading, writing, while also housing all the information from our settings dictionary.
To summarise some of the main changes necessary:
- Both the Settings dictionary and settings object currently reside within the linker class, primarily because of the UID. However, it would be more natural for the settings object to encompass the settings dictionary. Most uses of the settings dictionary could (and in my opinion should) be handled through a series of settings classes.
- Exclusive management of the cache UID should also be within our settings object. This would reduce complexity by eliminating the need for two distinct setters and properties within our settings object and linker.
- Error handling and settings read/write functionality should be contained within another class. It's acceptable to call these methods from within the linker, but they should ideally exist as methods within the main settings object class.
- Validation should also occur within our SettingsObj class. This would make the validation steps more explicit and provide easier access to the information we need without concerns about missing specific information.
- Currently, both the
_settings_obj_
and_settings_dict
attributes exist on the main linker class. This redundancy feels unnecessary.
Additional context
I'll provide an initial structure idea when I have the time to step back, think, and set up a small test environment.
Since we are writing back to the settings object through Splink, we may want to consider using __slots__
to prevent accidentally writing to new attributes, instead of existing ones.
Settings has been split apart a bit in Splink4. We may wish to review this again (probably more to be done), but i think we'd want a separate issue for this