signac-flow
signac-flow copied to clipboard
Refactor Operation and Group registration with FlowProject
Feature description
The internals of FlowProject's registation of BaseFlowOperations and FlowGroups have multiple inconsistencies. Directives are stored as object attributes that are added to the callable operation. However, pre and post conditions are stored as list under dicts with callables as keys in the class attribute _OPERATION_PRE_CONDITIONS. The internal _FlowProjectClass metaclass says that operations are stored in dictionaries, but the actual storage is an associated list. Multiple class attributes provided by the metaclass are also unused throughout the project.
Refactoring this logic into a coherent structure that allows for extending the capabilities easily while reducing the complexity or at least centralizing it would be worth the effort.
Proposed solution
I believe that the creation of Registry objects that hold lists of RegistryItems would be an abstract and extendible way to complete this task (the names are just for the example). Doing this would prevent us from having to add attributes to external functions, classes, and objects and allow a single point for new features and finding bugs. Ways of handling aggregation instances from the class hierarchy and converting RegistryItems to the final type upon instantiation a FlowProject object could also be put into the Registry class.
Another more modest solution would just be to create the idea of RegistryItems contained in standard python containers (specifically dicts). The aggregation and conversion would then still be handled by the FlowProject.
Part of this (namely the duplication in registry objects) is handled by #251.
@b-butler I am having a bit trouble understanding your proposed solution. Could you maybe break this down a bit further and/or create some visual like a class diagram or so? While the current approach is certainly inconsistent, one thing to keep in mind is that certain attributes are attached to the function on purpose (like directives) because they were (at least used to be) tightly coupled to the operation (function) itself, while others only make sense in the context of a FlowProject class.
@csadorf I was thinking of something like this structure where the only relationship is ownership (not inheritance). Names are just placeholders, and not meant to be the final names necessarily.
_FlowProjectClass ----- FlowOperationRegistry ------ FlowOperationRegistryItem
|
----- FlowGroupRegistry ----- FlowGroupRegistryItem
We could have abstract classes Registry and RegistryItem to define the structure and form of these classes. I believe the RegistryItem idea here is more important. The Registry concept can be done (and currently) is using base Python types (or those found in the standard library). The one of the nicety that the Registry classes could provide is the aggregation of items from the __mro__ which would abstract that logic outside of FlowProject. RegistryItem's would handle all the information necessary to create the object they register; they also could provide convenience methods that determine if another RegistryItem is conflicting (e.g. two operations with the same name).
That is the idea as it currently stands. I do not think it is necessary that this approach is taken if another one that addresses the issues is preferred.
Edit: Fixed typo in metaclass name.
So you would replace the _FlowProjectClass metaclass with registry attributes that collect across the MRO?
I would keep the metaclass (we need it for the class level attribute automatic creation). I would just have store these registry attributes. I realized I made a mistake in the meta class name and just called it the FlowProject. I have corrected that.